
Tutorial: Regression and Classification on XGBoost

A short tutorial on how you can use XGBoost with code and interactive visualizations.
Created on March 22 | Last edited on June 7

Introduction

In this report, we'll look at how you can use XGBoost, a well-known Python implementation of gradient-boosted trees, and learn how you can use Weights & Biases to gather insights using Media Panels and Parallel Plots!
We'll look at how to use the algorithm below, though if you'd like to follow along in an executable Colab, check out the link below:




Code

The XGBoost framework provides an extremely simple API to use decision trees for regression and classification tasks:
# Import the library
import xgboost as xgb
from wandb.integration.xgboost import WandbCallback

# Define a model
xg_reg = xgb.XGBRegressor(...)

# Train the model, logging to W&B via the callback
xg_reg.fit(X_train, y_train, ..., callbacks=[WandbCallback()])
That said, there are some key hyperparameters you should consider while defining an XGBoost regression model. For example:
  • Maximum Depth (max_depth): As the name suggests, this parameter controls the depth of each tree. The higher the value, the more complex the model and the higher the risk of overfitting, so it's advisable to pair deeper trees with a good validation strategy and robust evaluation metrics.
  • Number of Estimators (n_estimators): This parameter controls the number of boosting rounds, i.e. the number of trees in the ensemble.
  • Learning Rate (learning_rate): This parameter scales each tree's contribution and is a key lever for optimizing model performance; lower values typically need more estimators.

Experiments




Using the Colab provided, here we can see how various learning rates, maximum depths, and numbers of estimators compare in terms of performance:


The Weights & Biases callback also logs the various parameters and calculates their importance, which we can see below:






Summary

In this article, you saw how to use XGBoost in Python to train models for machine learning tasks such as classification and regression. We also saw how monitoring your metrics with Weights & Biases can surface valuable insights. To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
