
Direct Marketing with XGBoost and Amazon SageMaker

Analysis of experiments from SageMaker Immersion Day direct marketing notebook.
Created on May 8 | Last edited on November 30

Introduction

This post presents metrics from the SageMaker Immersion Day direct marketing notebook, which describes itself as follows:
Direct marketing, whether through mail, email, phone, or other channels, is a common tactic to acquire customers. Because resources and a customer's attention are limited, the goal is to target only the subset of prospects who are likely to engage with a specific offer. Predicting those potential customers based on readily available information like demographics, past interactions, and environmental factors is a common machine learning problem. This notebook presents an example problem: predicting whether a customer will enroll for a term deposit at a bank after one or more phone calls. The steps include:
  • Preparing your Amazon SageMaker notebook
  • Downloading data from the internet into Amazon SageMaker
  • Investigating and transforming the data so that it can be fed to Amazon SageMaker algorithms
  • Estimating a model using the Gradient Boosting algorithm
  • Evaluating the effectiveness of the model
  • Setting the model up to make ongoing predictions

I modified the notebook (modded notebook available here) to track all experiments via Weights & Biases. The modification consists of essentially two steps:
  • Replace the use of Estimator(<xgboost-image>) with XGBoost() and a custom training script
  • Pass WandbCallback() to the booster in the custom script
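The two steps above might look like the following sketch of a custom training script. Hyperparameter names and defaults, file names, and the W&B project name are illustrative assumptions, not taken verbatim from the modded notebook:

```python
# train.py -- sketch of a custom entry-point script for the SageMaker
# XGBoost framework estimator, with W&B logging. Hyperparameter names,
# defaults, file names, and the project name are illustrative assumptions.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator arrive as command-line arguments.
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--gamma", type=float, default=4.0)
    parser.add_argument("--num_round", type=int, default=100)
    # SageMaker mounts the input channels at these paths.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "train"))
    parser.add_argument("--validation", default=os.environ.get("SM_CHANNEL_VALIDATION", "validation"))
    return parser.parse_args(argv)

def train(args):
    # Heavy imports live inside the function so the module can be inspected
    # without xgboost/wandb installed.
    import wandb
    import xgboost as xgb
    from wandb.integration.xgboost import WandbCallback

    wandb.init(project="sagemaker-direct-marketing", config=vars(args))

    # Assumes the data is stored as CSV with the label in the first column.
    dtrain = xgb.DMatrix(os.path.join(args.train, "train.csv") + "?format=csv&label_column=0")
    dval = xgb.DMatrix(os.path.join(args.validation, "validation.csv") + "?format=csv&label_column=0")

    booster = xgb.train(
        params={
            "objective": "binary:logistic",
            "eval_metric": "error",
            "max_depth": args.max_depth,
            "eta": args.eta,
            "gamma": args.gamma,
        },
        dtrain=dtrain,
        num_boost_round=args.num_round,
        evals=[(dtrain, "train"), (dval, "validation")],
        callbacks=[WandbCallback(log_feature_importance=True)],  # step 2
    )
    booster.save_model(os.path.join(os.environ.get("SM_MODEL_DIR", "."), "xgboost-model"))

if __name__ == "__main__":
    train(parse_args())
```

On the notebook side, step 1 swaps the image-URI `Estimator` for the framework estimator, e.g. `sagemaker.xgboost.XGBoost(entry_point="train.py", framework_version="1.5-1", ...)`, which uploads this script and runs it inside the managed training container.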
Once that is done, model metrics over the course of training are automatically captured and transmitted to Weights & Biases, where they are organized and visualized. This report includes examples of some of W&B's native visualizations, which make it easy to monitor and interpret model performance, analyze the impact of hyperparameter choices, and see how training loads your hardware.
For more details on how to use Weights & Biases with XGBoost and/or SageMaker, check out some of the reports linked below!


Error Leaderboard

Weights & Biases' XGBoost integration automatically captures all metrics over the course of training. We can visualize the evolution of those metrics with W&B's line plot in a report and show the best training and validation error achieved with a scalar chart. Note that these figures update as new experiments are logged, so this report functions as a live dashboard where you and your teammates can watch accuracy improve in real time!

Run set (42 runs)


Feature Importance

Weights & Biases' XGBoost integration also logs feature importance by default. Here we can see how influential each feature in our dataset was on the predictions made by each model. This chart shows us the distribution of feature importance values for each feature across all models we have trained.

Run set (42 runs)


Hyperparameter Analysis

Weights & Biases also provides out-of-the-box visualizations to help you understand the relationship between your chosen hyperparameters and any performance metric that you log. The two panels below are the parameter importance chart and the parallel coordinates chart.
The parameter importance chart actually trains a model in your browser to learn the correlation between parameter values and metrics. For example, the panel below shows that the most influential hyperparameter on our validation error is gamma, and larger values of gamma lead to smaller error rates.
The parallel coordinates chart draws each experiment as a line crossing several vertical axes, each corresponding to a hyperparameter or metric. By hovering over the lines or highlighting regions on the axes, we can interactively explore the hyperparameter space.
PS: Notice that when you highlight an axis-region on the parallel coordinates chart, the runs outside that region fade away and the parameter importance chart updates to only take into account the runs that are still visualized!

Run set (42 runs)


Hardware Monitoring

When we call wandb.init, Weights & Biases starts monitoring hardware utilization over the course of your experiment. Here we can visualize the CPU and disk utilization percentages for each of our experiments to understand not just how hyperparameters influence model accuracy, but also how they affect the computational efficiency of the training process.
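No extra code is needed to capture these system metrics; the init call alone is enough. A minimal sketch (the project name and hyperparameter values below are illustrative assumptions):

```python
# Minimal sketch: wandb.init() starts a background process that periodically
# samples system metrics (CPU, memory, disk, network) alongside whatever
# metrics you log. Project name and hyperparameters here are assumptions.
HYPERPARAMS = {"max_depth": 5, "eta": 0.2, "gamma": 4.0, "num_round": 100}

def main():
    import wandb  # imported inside main so the module loads without wandb installed

    run = wandb.init(project="sagemaker-direct-marketing", config=HYPERPARAMS)
    # ... training happens here; hardware utilization is recorded automatically ...
    run.finish()

if __name__ == "__main__":
    main()
```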

Run set (42 runs)