Merchant Segmentation

Part 1
The dataset has three fields:
  1. Merchant ID as 'merchant'
  2. Time of transaction as 'time'
  3. Transaction amount in cents as 'amount_usd_in_cents'
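Throughout, the code assumes the data has been loaded into a pandas DataFrame named payments. A minimal loading sketch (the file name 'payments.csv' is an assumption; only the three columns above are given by the dataset):

import numpy as np
import pandas as pd

# hypothetical file name; the dataset guarantees only the three columns listed above
payments = pd.read_csv('payments.csv')
payments['time'] = pd.to_datetime(payments['time'])
# sorting by merchant and time matters for the diff()-based features below
payments = payments.sort_values(['merchant', 'time']).reset_index(drop=True)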
Sample:

[Table: sample rows of the payments data]
Payments data EDA

(The EDA code is covered in more detail in the work repo.)
Let's explore the payments data to get an idea of the kinds of transactions happening.

[Figure: number of payments per merchant]

[Figure: amount in cents per merchant]
We see that payments activity has a long tail and isn't normally distributed.
Since we have to segment merchants, the payment activity has to be rolled up to the merchant level.

Merchant Level Rollup

Since most of our analysis is at the merchant level, let's roll up the data to calculate the different attributes that might separate one merchant from another, using just the transaction data. Unsurprisingly, given both payment timestamp and payment amount, we can generate more than a few features at a merchant level.
Some of those features:
  1. First payment date
  2. Last payment date
  3. Total amount paid
  4. Number of payments in total
  5. Average time between payments
  6. Q1, Q2, Q3 and Q4 payments and transaction amounts - since seasonality is sometimes a good indicator of behavior
  7. Lifespan: time between first and last payment
  8. Acquisition: time since first payment
  9. Recency: time since last payment
  10. Acquisition quarter and acquisition year
  11. Number of payments per day over the lifespan
  12. Average order value per payment

# Some basic features
# (pandas/numpy imports and the sorted `payments` frame come from the loading step above)

# first payment date
merchant_first_payment = (payments.groupby('merchant')['time'].min().dt.date
                          .reset_index().rename(columns={'time': 'first_payment'}))

# last payment date
merchant_last_payment = (payments.groupby('merchant')['time'].max().dt.date
                         .reset_index().rename(columns={'time': 'last_payment'}))

# total payment amount (the column keeps its original name, 'amount_usd_in_cents',
# which is what the later rollup code refers to)
merchant_total_payment = payments.groupby('merchant')['amount_usd_in_cents'].sum().reset_index()

# number of payments
merchant_num_payments = (payments.groupby('merchant')['time'].count()
                         .reset_index().rename(columns={'time': 'num_payments'}))

# days between consecutive payments; diff() within each group is already NaT for a
# merchant's first payment, so no manual masking at merchant boundaries is needed
payments['time_from_last_order'] = payments.groupby('merchant')['time'].diff().dt.days

# average time between payments per merchant
merchant_time_bw_payments = (payments.groupby('merchant')['time_from_last_order'].mean()
                             .reset_index()
                             .rename(columns={'time_from_last_order': 'avg_time_bw_payments'}))
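The merge below also pulls in merchant_q1 through merchant_q4, which are built in the work repo. Here's a minimal sketch of what one such quarterly rollup could look like; the helper function is an assumption, with the column names q1_amount / q1_purchases chosen to match what the fillna step further down expects:

# Quarterly rollups (sketch): payment count and amount per merchant per calendar quarter
payments['quarter'] = payments['time'].dt.quarter

def quarterly_rollup(q):
    subset = payments[payments['quarter'] == q]
    return (subset.groupby('merchant')
            .agg(**{f'q{q}_purchases': ('time', 'count'),
                    f'q{q}_amount': ('amount_usd_in_cents', 'sum')})
            .reset_index())

merchant_q1, merchant_q2, merchant_q3, merchant_q4 = [quarterly_rollup(q) for q in (1, 2, 3, 4)]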

Some more rollup:
# merging all the features generated above into a single merchant-level frame
merchant_level = (merchant_first_payment
                  .merge(merchant_last_payment, how='inner', on='merchant')
                  .merge(merchant_total_payment, how='inner', on='merchant')
                  .merge(merchant_num_payments, how='inner', on='merchant')
                  .merge(merchant_time_bw_payments, how='inner', on='merchant')
                  .merge(merchant_q1, how='left', on='merchant')
                  .merge(merchant_q2, how='left', on='merchant')
                  .merge(merchant_q3, how='left', on='merchant')
                  .merge(merchant_q4, how='left', on='merchant'))
Summarize the rollups for relevant features:
# first/last payment are stored as dates, so convert before subtracting
merchant_level['time_between_first_and_last_payment'] = (pd.to_datetime(merchant_level['last_payment'])
                                                         - pd.to_datetime(merchant_level['first_payment'])).dt.days + 1

# quarters with no activity come through the left joins as NaN; treat them as zero
quarter_cols = ['q1_amount', 'q1_purchases', 'q2_amount', 'q2_purchases',
                'q3_amount', 'q3_purchases', 'q4_amount', 'q4_purchases']
merchant_level[quarter_cols] = merchant_level[quarter_cols].fillna(0)

# '2035-01-01' is the reference "today" used for recency and acquisition features
# (the dataset's timestamps evidently run into the future)
merchant_level['time_since_first_payment'] = (pd.to_datetime('2035-01-01') - pd.to_datetime(merchant_level['first_payment'])).dt.days

merchant_level['time_since_last_payment'] = (pd.to_datetime('2035-01-01') - pd.to_datetime(merchant_level['last_payment'])).dt.days

merchant_level['acquisition_month'] = pd.to_datetime(merchant_level['first_payment']).dt.to_period('M')

merchant_level['acquisition_quarter'] = pd.to_datetime(merchant_level['first_payment']).dt.quarter

merchant_level = pd.concat([merchant_level,
pd.get_dummies(merchant_level['acquisition_quarter'],
prefix = 'acquisition_quarter')],
axis = 1)

merchant_level['acquisition_year'] = pd.to_datetime(merchant_level['first_payment']).dt.year

merchant_level = pd.concat([merchant_level,
pd.get_dummies(merchant_level['acquisition_year'],
prefix = 'acquisition_year')],
axis = 1)

merchant_level['num_payments_per_day_in_life'] = merchant_level['num_payments'] / merchant_level['time_between_first_and_last_payment']

merchant_level['av_order_value'] = merchant_level['amount_usd_in_cents'] / merchant_level['num_payments']
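Before moving on, a quick sanity check on the rollup can't hurt; these particular checks are illustrative additions, not part of the original analysis:

# one row per merchant, and the derived columns should look sane
assert merchant_level['merchant'].is_unique
print(merchant_level[['num_payments', 'amount_usd_in_cents',
                      'av_order_value', 'num_payments_per_day_in_life']].describe())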

This gives us rolled-up merchant-level data:

[Table: sample of the rolled-up merchant-level frame]
Now that we've rolled up the payments data to the merchant level, it's important to do an EDA on this data to make sure we understand how each of these features might affect the segmentation.

Some merchant level EDA

(The EDA code is covered in detail in the work repo.)
While segmentations based on many different feature sets were tried, a few crucial features that capture overall merchant behavior are generally good enough to segment the merchant base.
Some of those selected features are:
  1. Monetary value - i.e. average order value
  2. Frequency - i.e. number of orders / payments per day of lifespan
  3. Lifespan - i.e. time from first to last purchase indicating longevity
Total Monetary Value = AOV * Frequency * Lifespan
(This is partially inspired by this blogpost on clustering based on recency, frequency and monetary value)
These three features together should capture the value of a customer.
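Note that the identity above holds exactly on the rolled-up frame, since frequency is just num_payments divided by lifespan; a quick check (an illustrative addition, not from the original analysis):

# (amount / num_payments) * (num_payments / lifespan_days) * lifespan_days = amount
implied_total = (merchant_level['av_order_value']
                 * merchant_level['num_payments_per_day_in_life']
                 * merchant_level['time_between_first_and_last_payment'])
assert np.allclose(implied_total, merchant_level['amount_usd_in_cents'])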

1 - AOV (Monetary value)

[Figure: distribution of average order value across merchants]

2 - Num Payments per day over lifespan

[Figure: distribution of payments per day over lifespan]

3 - Time between first and last payment (in days)

[Figure: distribution of lifespan in days]
Since we know merchants differ from each other along these three dimensions, we'll try to cluster them accordingly.

Plotting all merchants along these dimensions:
from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize = (20, 10))
ax = plt.axes(projection='3d')

zdata = merchant_level['num_payments_per_day_in_life']
ydata = merchant_level['time_between_first_and_last_payment']
xdata = merchant_level['av_order_value']

ax.scatter3D(xdata, ydata, zdata, s = 20);

ax.set_zlabel('Num Payments Per Day over Lifespan')
ax.set_ylabel('Lifespan in days')
ax.set_xlabel('Average Order Value')

We see two outliers that should be removed for better clustering.
Removing those outliers (sketched below) and replotting:
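The exact filter isn't shown here; a minimal sketch, assuming we cap each clustering dimension at its 99.9th percentile (the threshold is an assumption), which also defines the columns_to_use list the clustering code below relies on:

# the three clustering dimensions discussed above
columns_to_use = ['av_order_value',
                  'num_payments_per_day_in_life',
                  'time_between_first_and_last_payment']

# drop the extreme outliers (the 0.999 quantile threshold is a guess at the original filter)
caps = merchant_level[columns_to_use].quantile(0.999)
merchant_level_smaller = merchant_level[(merchant_level[columns_to_use] <= caps).all(axis=1)].copy()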
[Figure: 3D scatter of merchants after removing the outliers]
Clustering to find Segments

X_unscaled = merchant_level_smaller[columns_to_use]

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(X_unscaled)

from sklearn.cluster import KMeans
import wandb  # assumes wandb.init() was called earlier in the notebook

inertia = []

for i in range(1, 21):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(X)
    wandb.log({'inertia': kmeans.inertia_,
               'clusters': kmeans.n_clusters})
    inertia.append(kmeans.inertia_)

# plot the elbow curve
plt.figure(figsize=(20, 10))
plt.plot(range(1, 21), inertia, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
We plotted inertia (the within-cluster sum of squared distances to each cluster center, i.e. a measure of how compact the clusters are):

We see an inflection in the inertia reduction trend around 4 clusters.
That looks like an optimal number of clusters for business interpretation and actionability.
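As a cross-check on the elbow reading (not part of the original analysis), the silhouette score offers a second opinion on how well separated the clusters are:

from sklearn.metrics import silhouette_score

# higher silhouette = tighter, better-separated clusters
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))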
Let's find what those clusters actually are and what they mean:
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=42)
kmeans.fit(X)
pred = kmeans.predict(X)
merchant_level_smaller['cluster'] = pred

plt.figure(figsize = (20, 10))
ax = plt.axes(projection='3d')

zdata = merchant_level_smaller['num_payments_per_day_in_life']
ydata = merchant_level_smaller['time_between_first_and_last_payment']
xdata = merchant_level_smaller['av_order_value']

ax.scatter3D(xdata, ydata, zdata, c=merchant_level_smaller['cluster'], s = 200);

ax.set_zlabel('Num Payments Per Day over Lifespan')
ax.set_ylabel('Lifespan in days')
ax.set_xlabel('Average Order Value')
[Figure: 3D scatter of merchants colored by cluster]
How do the clusters differ from each other?
cluster_summary = (merchant_level_smaller.groupby('cluster')
                   .agg({'cluster': 'count',
                         'num_payments': 'sum',
                         'num_payments_per_day_in_life': 'mean',
                         'amount_usd_in_cents': 'sum',
                         'avg_time_bw_payments': 'mean',
                         'time_since_first_payment': 'mean',
                         'time_since_last_payment': 'mean',
                         'time_between_first_and_last_payment': 'mean'})
                   .rename(columns={'cluster': 'cluster_count'})
                   .sort_values(by='cluster_count', ascending=False))

# note: amount_usd_in_cents is in cents, so aov and ltv_per_person below are too
cluster_summary['aov'] = cluster_summary['amount_usd_in_cents'] / cluster_summary['num_payments']

cluster_summary['ltv_per_person'] = cluster_summary['amount_usd_in_cents'] / cluster_summary['cluster_count']

cluster_summary[['cluster_count',
                 'num_payments_per_day_in_life',
                 'time_between_first_and_last_payment',
                 'aov',
                 'ltv_per_person']].round({'num_payments_per_day_in_life': 2,
                                           'time_between_first_and_last_payment': 2,
                                           'aov': 0,
                                           'ltv_per_person': 0})

[Table: cluster summary showing count, frequency, lifespan, AOV and LTV per cluster]
Key numeric differences between clusters:

  1. Clusters 0 (purple) and 3 (yellow) are similar in terms of 'Num payments per day in lifespan', but Cluster 3 has a much shorter average lifespan of 195 days, compared to Cluster 0 with 460 days. This isn't skewed by acquisition time period, since the average number of days since acquisition for Cluster 0 is higher than that of Cluster 3.
  2. Cluster 1 (blue) has a higher number of payments per day per merchant, even though it under-indexes on lifespan.
  3. Cluster 2 has shorter lifespans and lower AOVs but over-indexes on the number of payments per day by orders of magnitude.

Merchant Segments Identified

Here's a general interpretation (with not-so-catchy names) of the four clusters we identified:
(We did not include the outliers in this segmentation; they can be dealt with individually.)
  1. High Frequency Merchants (Green cluster)
    • Merchants with a high frequency of transactions per day (20+)
    • Tend to still have relatively high order values
    • But their average lifespan is also very short, around 5 days
    • These merchants are unusual, but can be flagged easily thanks to their high transaction frequency. Their higher AOVs suggest they should be incentivized to stay longer, especially if they can keep transacting like merchants in the other clusters over a longer lifespan
  2. Loyalists (Purple cluster)
    • These merchants have long lifespans, with more than a year between their first and last purchase (upwards of 450 days on average)
    • They average around a transaction a day
    • They have an AOV of 140 dollars per transaction
    • But their longevity makes them incredibly valuable, at an average of 34k dollars per merchant
    • High customer satisfaction and good service should keep their churn low, keep them making recurring transactions, and sustain their high value
  3. Core Customers (Blue cluster)
    • These merchants make up 60% of the merchant base
    • Although they under-index against the 'High Frequency' and 'Loyalists' groups on long-term value, they are still incredibly valuable
    • They bring in an average of 4600 dollars per merchant over their lifetime
    • Their relatively high AOV of 140 dollars is still valuable. If we find out what motivates this merchant group, some of them might transform into 'Loyalists' easily
  4. Whales (Yellow cluster)
    • This group is small, at less than 3% of the overall base
    • But their high AOV (~3500 dollars) still makes them incredibly valuable
    • Their average lifespan is a lot more variable, though

What do these segments of customers mean?

Every business wants to understand the underlying differences between its segments: it helps plan retention, acquisition, and customer satisfaction, shape product strategy, and model long-term revenue and growth based on each segment's observed performance.

Appendix

Ways to improve the clustering:
  1. We only tried the most prominent clustering method, KMeans, to segment the merchants.
  2. There are a dozen other clustering methods on sklearn's clustering homepage that could be explored to cluster the merchants better; a sketch of one alternative follows this list.
  3. We also went with the most obvious features to segment the merchants on. Trying a higher-dimensional feature set, more or fewer segments, etc. might surface more interesting patterns with much lower inertia and useful business takeaways.
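For example, a Gaussian mixture relaxes KMeans' spherical-cluster assumption; a minimal sketch (the choice of 4 components simply mirrors the KMeans result above):

from sklearn.mixture import GaussianMixture

# soft clustering alternative: same number of segments, but per-cluster covariances
gmm = GaussianMixture(n_components=4, covariance_type='full', random_state=42)
merchant_level_smaller['gmm_cluster'] = gmm.fit_predict(X)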