Predict Churn
Part 2
The given dataset has three fields:
- Merchant ID as 'merchant'
- Time of transaction as 'time'
- Transaction amount in cents as 'amount_usd_in_cents'
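For reference, loading the raw data might look like the following sketch (the file name 'payments.csv' is an assumption; it isn't given in this report):

import pandas as pd

# Hypothetical load - parse 'time' as datetimes so the date comparisons below work
payments = pd.read_csv('payments.csv', parse_dates = ['time'])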
We use the rolled-up data from previous work to decide when a customer should be considered churned:

Working on the merchant-level rollup data we created previously to identify churn:
# Fill missing gaps with 1 day (merchants with a single payment have no gap to average)
merchant_data['avg_time_bw_payments'].fillna(1, inplace = True)

# Cumulative share of merchants by average days between successive payments, in 30-day bins
(pd.cut(merchant_data['avg_time_bw_payments'], bins = list(range(0, 750, 30)))
   .value_counts(normalize = True)
   .sort_index()          # sort by bin order so the cumulative sum forms a proper distribution
   .cumsum()
   .plot(kind = 'bar'));
Chart: cumulative share of merchants by average number of days between successive payments.

Churn Identification
We could consider anyone who hasn't returned within 90 days of their last payment as churned, because the chart above shows that fewer than 5% of customers take more than 90 days on average between successive purchases.
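As a quick sanity check on that figure, the share can be computed directly from the rollup frame used above:

# Fraction of merchants whose average gap between payments exceeds 90 days
share_over_90 = (merchant_data['avg_time_bw_payments'] > 90).mean()
print(f"{share_over_90:.1%} of merchants average more than 90 days between payments")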
To model churn, we need:
- Merchant behavior - i.e. transaction history for a period of time
- Merchant churn - i.e. in the days / months following the transaction history, information on whether the merchant churned or not.
Since we have data for two years in total, 2033 and 2034, we can split it into a pre period and a post period. The pre period is used to build merchant features from transaction data, and the post period is used to determine whether a merchant has churned.
- Pre period: 2033-01-01 to 2034-10-31
- Post period: 2034-11-01 to 2034-12-31
Build the pre-period data, and use each merchant's presence in the post period to mark their churn / active status:
# Pre period: all transactions up to 2034-10-31
payments_pre = payments[payments['time'] <= pd.to_datetime("2034-10-31")]

# Post period: the remaining transactions; merchants seen here are still active
payments_post = payments[payments['time'] > pd.to_datetime("2034-10-31")]
merchant_post = payments_post['merchant'].unique()
Similar to the previous exercise where we rolled up the data on the entire dataset, here we are rolling up the pre-period data to merchant level:
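The rollup code itself isn't shown here; a minimal sketch of what it might look like, assuming the feature names used later in this report (the exact aggregations are assumptions):

snapshot_date = pd.to_datetime("2034-10-31")   # end of the pre period

merchant_level = payments_pre.groupby('merchant').agg(
    amount_usd_in_cents = ('amount_usd_in_cents', 'sum'),
    num_payments = ('time', 'count'),
    first_payment = ('time', 'min'),
    last_payment = ('time', 'max')).reset_index()

merchant_level['time_between_first_and_last_payment'] = (merchant_level['last_payment'] - merchant_level['first_payment']).dt.days
merchant_level['time_since_first_payment'] = (snapshot_date - merchant_level['first_payment']).dt.days
merchant_level['time_since_last_payment'] = (snapshot_date - merchant_level['last_payment']).dt.days
# 0/0 for single-payment merchants gives NaN; fill with 1 day as in the earlier rollup
merchant_level['avg_time_bw_payments'] = (merchant_level['time_between_first_and_last_payment'] / (merchant_level['num_payments'] - 1)).fillna(1)
merchant_level['num_payments_per_day_in_life'] = merchant_level['num_payments'] / merchant_level['time_since_first_payment'].clip(lower = 1)
merchant_level['av_order_value'] = merchant_level['amount_usd_in_cents'] / merchant_level['num_payments']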

Merchant's activity / churn status: if a merchant has a transaction in the post period (the final two months), they haven't churned; otherwise they have.
# A merchant churned if they have NO transaction in the post period
merchant_level['churn'] = ~merchant_level['merchant'].isin(merchant_post)
Class Balance
merchant_level['churn'].value_counts()
(True = churned)

We see that the classes are fairly well balanced.
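The proportions can be checked directly with the same API:

merchant_level['churn'].value_counts(normalize = True)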
Pre-model Considerations
What is a good metric to assess if a model has performed well?
1. Since the classes are slightly imbalanced, a model that always predicts the dominant class would already score 50%+ accuracy
2. But since we are trying to identify churn ahead of time, we would like to bias the model towards predicting churn, even if that means falsely classifying some non-churn customers as churning
3. This means we need to catch as many true-churn customers as possible, i.e., a high Recall (TP / (TP + FN))
4. We shall keep ***Precision, Recall, ROC-AUC score, and Accuracy*** all in mind while choosing the final model; quick definitions are sketched below
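For reference, the confusion-matrix metrics written out directly (a small sketch; sklearn.metrics computes the same quantities):

# Given tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
def precision(tp, fp):
    return tp / (tp + fp)   # of predicted churners, how many actually churned

def recall(tp, fn):
    return tp / (tp + fn)   # of actual churners, how many we caught

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)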
Baseline model 1 performance:
As a baseline, flag any merchant with more than 90 days since their last payment as churned, and compare this against the observed churn labels:
merchant_level['baseline_churn_flag'] = (merchant_level['time_since_last_payment'] > 90)

# Cross-tab of observed churn vs. the baseline flag
merchant_level.pivot_table(index = 'churn',
                           columns = 'baseline_churn_flag',
                           values = 'merchant',
                           aggfunc = 'count')
Confusion matrix:

This baseline performs poorly on all metrics.
Test Train Split
We train the model on a training dataset and evaluate it on the test dataset.
For simplicity, we aren't using a separate cross-validation set here (see the sketch after the split code below).
Using the most relevant columns that might help us predict churn from transaction data, let's split the data into training and test sets using random sampling.
Since each merchant's behavior should be independent of the others and this is not time-series data, randomly sampling merchants into train and test sets works fine. Here, 20% of the dataset is held out as the test set and the rest is used for training.
from sklearn.model_selection import train_test_split

columns_to_use = ['merchant', 'amount_usd_in_cents', 'num_payments',
                  'avg_time_bw_payments', 'time_between_first_and_last_payment',
                  'time_since_first_payment', 'time_since_last_payment',
                  'num_payments_per_day_in_life', 'av_order_value', 'churn']
features_data = merchant_level[columns_to_use]

y = features_data['churn']
X = features_data.drop(columns = 'churn')
X_id = X['merchant']

# 80/20 random split; set the merchant IDs aside so they aren't used as features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
X_train_id = X_train['merchant']
X_test_id = X_test['merchant']
X_train = X_train.drop(columns = 'merchant')
X_test = X_test.drop(columns = 'merchant')
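If we did want a cross-validated estimate rather than a single split, a sketch using sklearn's cross_val_score (not part of the original analysis):

from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# 5-fold cross-validated recall, purely illustrative
cv_recall = cross_val_score(RandomForestClassifier(random_state = 42),
                            X.drop(columns = 'merchant'), y,
                            cv = 5, scoring = 'recall')
print(cv_recall.mean())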
Let's train some models
1 - Baseline model performance
Assign every merchant a churn flag of False, since that was the dominant class.
This sets up a naive baseline that any model we build should beat, serving as a sanity check on the modeling workflow.
from sklearn import metrics

# Predict the dominant class (False / not churned) for every merchant
y_pred_test = [False] * len(y_test)
y_pred_train = [False] * len(y_train)
metrics.confusion_matrix(y_test, y_pred_test)

Model results tracked:
import numpy as np

# Tracker for all model results (initialized here; not defined elsewhere in the snippets)
model_results = pd.DataFrame(columns = ['Algorithm', 'ROC AUC Score', 'Accuracy', 'Precision', 'Recall'])

results = {'Algorithm': 'Baseline_False',
           'ROC AUC Score': np.nan,
           'Accuracy': metrics.accuracy_score(y_test, y_pred_test),
           # an all-False prediction has no positive predictions, so report precision as 0
           'Precision': metrics.precision_score(y_test, y_pred_test, zero_division = 0),
           'Recall': metrics.recall_score(y_test, y_pred_test)}
# DataFrame.append was removed in pandas 2.x; pd.concat is the equivalent
model_results = pd.concat([model_results, pd.DataFrame([results])], ignore_index = True)

2 - Simple Logistic Regression
Scale the features (fitting the scaler on the training set only):
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train),
                              columns = X_train.columns.values,
                              index = X_train.index.values)
# Use transform (not fit_transform) so the test set is scaled with the training fit
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns = X_test.columns.values,
                             index = X_test.index.values)
Fit a logistic regressor
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

model = LogisticRegression(random_state = 42)
model.fit(X_train_scaled, y_train)
prediction_test = model.predict(X_test_scaled)
metrics.confusion_matrix(y_test, prediction_test)

results = {'Algorithm': 'Simple Logistic Regression',
           'ROC AUC Score': metrics.roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1]),
           'Accuracy': metrics.accuracy_score(y_test, prediction_test),
           'Precision': metrics.precision_score(y_test, prediction_test),
           'Recall': metrics.recall_score(y_test, prediction_test)}
model_results = pd.concat([model_results, pd.DataFrame([results])], ignore_index = True)

3 - Simple Logistic Regression with weight balancing
# class_weight = 'balanced' reweights classes inversely to their frequency in the training data
model = LogisticRegression(random_state = 42, class_weight = 'balanced')
model.fit(X_train_scaled, y_train)
prediction_test = model.predict(X_test_scaled)
metrics.confusion_matrix(y_test, prediction_test)

results = {'Algorithm': 'Simple Logistic Regression with weight balancing',
           'ROC AUC Score': metrics.roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1]),
           'Accuracy': metrics.accuracy_score(y_test, prediction_test),
           'Precision': metrics.precision_score(y_test, prediction_test),
           'Recall': metrics.recall_score(y_test, prediction_test)}
model_results = pd.concat([model_results, pd.DataFrame([results])], ignore_index = True)

4 - Simple Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier

# Train model (on unscaled features; tree-based models don't require scaling)
clf_rf = RandomForestClassifier()
clf_rf.fit(X_train, y_train)

# Predict on test set
pred_y_rf = clf_rf.predict(X_test)
metrics.confusion_matrix(y_test, pred_y_rf)

results = {'Algorithm': 'Simple Random Forest Classifier',
           'ROC AUC Score': metrics.roc_auc_score(y_test, clf_rf.predict_proba(X_test)[:, 1]),
           'Accuracy': metrics.accuracy_score(y_test, pred_y_rf),
           'Precision': metrics.precision_score(y_test, pred_y_rf),
           'Recall': metrics.recall_score(y_test, pred_y_rf)}
model_results = pd.concat([model_results, pd.DataFrame([results])], ignore_index = True)

Recall fell slightly, but precision, ROC-AUC score, and accuracy all went up.
Considering all four metrics, this seems to be the best performer so far.
How much better can the model be?
- We could train many other models on the dataset to see if they achieve better accuracy / recall / precision
- We could also fine-tune the classifiers we have trained with better hyperparameters (a sketch follows this list)
- But the general structure / workflow is similar
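As an illustration of that tuning step, a minimal grid search over the random forest, optimizing for recall (the parameter grid is an assumption, not tuned values from this analysis):

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

param_grid = {'n_estimators': [100, 300],
              'max_depth': [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state = 42),
                    param_grid, cv = 5, scoring = 'recall')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)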
Conclusion
- Our model was fairly simplistic
- It didn't have any complicated features and only used the transaction dataset
- We didn't have any other information about the different merchants
- But based on just this information, our model was able to:
- Correctly distinguish, about 80% of the time, between customers who would likely churn and those who would stay active
- Achieve a very low false-negative rate (thereby lowering the chances of misclassifying churning merchants as healthy)
- Putting this model into production could be a good start to proactively identifying potentially churning customers