XGBoost Model Performance

This report evaluates the XGBoost model used for CRSS prediction, covering its performance metrics and hyperparameter importance. We analyze R2, RMSE, and MAPE to assess model accuracy and fit. The hyperparameters colsample_bytree, learning_rate, max_depth, n_estimators, and subsample are evaluated for their impact on MAPE, which indicates how sensitive the model is to each of them.
Created on April 3 | Last edited on April 4

Section 1: Performance Metrics

R2 (R-squared): This is a statistical measure that represents the proportion of the variance for the dependent variable (in this case, CRSS) that's explained by the independent variables in the model. Higher values indicate a better fit of the model to the data. Each sweep (trial) in the report shows the R2 score for that particular set of hyperparameters.
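RMSE (Root Mean Squared Error): This metric is the square root of the mean squared difference between the predicted values and the actual values, expressed in the same units as CRSS. Because errors are squared before averaging, large errors weigh more heavily. Lower RMSE values indicate better model performance.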
MAPE (Mean Absolute Percentage Error): This is the average of the absolute percentage errors of predictions. It provides a percentage measure of the accuracy of the model's predictions, with lower values indicating better accuracy.
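
The three definitions above can be sketched in plain Python. This is a minimal illustration rather than the code used to produce the report: the function names and the toy values are hypothetical, and MAPE is returned as a fraction on the assumption that the report's figures (for example 0.1823) are fractions rather than percentages.

```python
import math


def r2_score(y_true, y_pred):
    """Proportion of variance in y_true explained by y_pred."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the units of the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if any true value is 0."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only (not CRSS data from the report).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]

print("R2  :", r2_score(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
```

Note that MAPE divides by the true value, so it weights errors on small targets more heavily than RMSE does. This is one reason the two metrics can rank sweeps differently.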

  • breezy-sweep-10 (colsample_bytree = 0.213, learning_rate = 0.02237, max_depth = 8, n_estimators = 1340, subsample = 0.4864, MAPE = 0.1823) shows a very low MAPE, which suggests its predictions are among the most accurate on average. It also ranks relatively high on R2, which means it explains a large share of the variance in CRSS.

Run sets: Sweep 4y84hv6u 1 (10); Sweep 4y84hv6u 2 (0)

Section 2: Hyperparameter Importance

This section shows the importance of different hyperparameters with respect to the MAPE metric. The hyperparameters include:
  • colsample_bytree: The fraction of features (columns) to be randomly sampled for each tree.
  • learning_rate: The step size shrinkage used to prevent overfitting. It's also known as the "eta" value.
  • max_depth: The maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit.
  • n_estimators: The number of trees in the ensemble.
  • subsample: The fraction of observations (rows) to be randomly sampled for each tree.
The importance plot shows which hyperparameters have the most impact on the MAPE metric. In this sweep, n_estimators and subsample rank among the most important hyperparameters.
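
For orientation, a sweep over these five hyperparameters could be described with a configuration dictionary like the one below. This is a sketch under stated assumptions, not the configuration behind sweep 4y84hv6u: the search method, the metric key "MAPE", and every range are hypothetical, chosen only so that the reported breezy-sweep-10 values fall inside them. The helper in_search_space is likewise illustrative.

```python
# Hypothetical sweep configuration; the method and all ranges are assumptions.
sweep_config = {
    "method": "random",  # assumption: the report does not state the search strategy
    "metric": {"name": "MAPE", "goal": "minimize"},  # assumed name of the logged metric
    "parameters": {
        "colsample_bytree": {"min": 0.1, "max": 1.0},   # fraction of columns per tree
        "learning_rate": {"min": 0.001, "max": 0.3},    # step-size shrinkage (eta)
        "max_depth": {"values": [3, 4, 5, 6, 7, 8, 9, 10]},
        "n_estimators": {"min": 100, "max": 2000},      # number of trees
        "subsample": {"min": 0.1, "max": 1.0},          # fraction of rows per tree
    },
}

# Hyperparameter values reported for breezy-sweep-10 in Section 1.
breezy_sweep_10 = {
    "colsample_bytree": 0.213,
    "learning_rate": 0.02237,
    "max_depth": 8,
    "n_estimators": 1340,
    "subsample": 0.4864,
}


def in_search_space(params, config):
    """Return True if every value in params lies inside the configured space."""
    for name, value in params.items():
        spec = config["parameters"][name]
        if "values" in spec:
            if value not in spec["values"]:
                return False
        elif not (spec["min"] <= value <= spec["max"]):
            return False
    return True


print(in_search_space(breezy_sweep_10, sweep_config))
```

A dictionary of this shape is what wandb.sweep accepts, and each run would then read its sampled values and pass them to the XGBoost regressor. The importance panel is computed from the resulting runs, so its ranking depends on the ranges explored as well as on the model itself.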

Run set: Sweep 4y84hv6u (10)