Drift Detection Progress Report
A sample run of all drift detection methods as of 10-27-2022.
For more information on specific drift detection methods, see the design doc for the drift detection benchmarking tool. For information about the novel methods implemented, see this work-in-progress design doc.
Model-Free Drift Detection Techniques
These drift detectors output p-values for the hypothesis that there is no distribution change between two datasets (in this case, the training dataset and the inference dataset). For these detectors, p < .05 (or a user-specified threshold) counts as an alarm. For example, the mixed-type tabular data drift detector applies a feature-wise two-sample Kolmogorov-Smirnov test to continuous features and a Chi-Squared test to categorical features.
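To make this concrete, below is a minimal sketch of such a feature-wise test using scipy. The helper name and the "alarm if any feature's p-value falls below the threshold" rule are illustrative assumptions, not the benchmarking tool's actual implementation.

```python
# Hypothetical helper (not the benchmarking tool's actual implementation):
# feature-wise two-sample tests -- KS test for continuous features,
# Chi-Squared test for categorical features -- with an alarm when any
# per-feature p-value falls below the threshold.
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

def feature_wise_drift(train_df, infer_df, categorical_cols, threshold=0.05):
    p_values = {}
    for col in train_df.columns:
        if col in categorical_cols:
            # Contingency table of category counts in each dataset.
            cats = sorted(set(train_df[col]) | set(infer_df[col]))
            table = np.array([
                [int((train_df[col] == c).sum()) for c in cats],
                [int((infer_df[col] == c).sum()) for c in cats],
            ])
            _, p, _, _ = chi2_contingency(table)
        else:
            p = ks_2samp(train_df[col], infer_df[col]).pvalue
        p_values[col] = p
    alarm = any(p < threshold for p in p_values.values())
    return p_values, alarm
```

In practice, a multiple-testing correction (e.g., Bonferroni) is typically applied across features before deciding whether to alarm.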
In the graph below, we run the drift detectors on the NYC Taxi Cab dataset, comparing week 0 of 2020 against each of weeks 1-25 of 2020. All detectors are in alarm at the beginning of the NYC COVID pandemic, even though model accuracy was actually increasing during that time. Two drift detectors (Maximum Mean Discrepancy, Fisher’s Exact Test) produce values that are not useful, as they are always alarming even when model accuracy is not changing significantly. We also see 3/5 drift detectors alarm prior to March 2020 without a corresponding large decrease in model accuracy.
[Panel: Run set (6 runs)]
Classifier-Based Drift Detection Techniques
These drift detectors employ a classifier to detect changes in the data. The Classifier Drift Model and the Spot the Diff Model both train a binary classifier to predict whether a data point came from the training set or the inference set. Interestingly, the classifier uncertainty drift detector instead uses information from a trained model that the user specifies: it measures the entropy of that classifier's predictive distribution (computed from its logits) and tests whether the distribution of model uncertainty differs significantly between the training set and the inference set. Despite this added information, this detector alarms 4 times in this run, even when model accuracy has increased.
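As a rough illustration of the domain-classifier idea, here is a hypothetical sketch using scikit-learn and scipy (not the actual Classifier Drift Model implementation): train a binary classifier to distinguish the two datasets and test whether its held-out accuracy is significantly better than chance.

```python
# Hypothetical sketch of classifier-based drift detection: if the training
# and inference data are indistinguishable, a "domain" classifier should
# perform no better than chance (accuracy ~ 0.5) on held-out points.
import numpy as np
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classifier_drift_pvalue(x_train, x_infer, threshold=0.05):
    # Label each sample by the dataset it came from: 0 = train, 1 = inference.
    X = np.vstack([x_train, x_infer])
    y = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_infer))])
    X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.25, stratify=y)

    clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    correct = int((clf.predict(X_hold) == y_hold).sum())

    # One-sided binomial test: is held-out accuracy significantly above 0.5?
    p_value = binomtest(correct, len(y_hold), p=0.5, alternative="greater").pvalue
    return p_value, p_value < threshold
```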
[Panel: Run set (6 runs)]
Accuracy-Based Drift Detection
Inspired by classifier uncertainty drift detection, we predict the accuracy of a given model by measuring its performance on the training set and creating performance bins. We then use importance weighting during inference to generate predicted accuracy metrics. The same approach can predict any scoring metric, such as F1 score, recall, or precision. Here we use 4 novel methods that build on model/user-side information (see this work-in-progress design doc for more information about each method).
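One possible reading of this binning-plus-importance-weighting scheme is sketched below. Binning by model confidence and the helper names here are illustrative assumptions, not the actual implementation of the 4 novel methods.

```python
# Hypothetical sketch: bin labeled training data by model confidence, record
# per-bin accuracy, then re-weight those accuracies by the bin frequencies of
# the (unlabeled) inference data to estimate inference-time accuracy.
import numpy as np

def fit_bin_accuracies(train_conf, train_correct, n_bins=10):
    """Per-bin accuracy on the training set, keyed by confidence bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(train_conf, edges) - 1, 0, n_bins - 1)
    acc = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            acc[b] = train_correct[mask].mean()
    return edges, acc

def predict_accuracy(infer_conf, edges, bin_acc):
    """Importance-weight per-bin training accuracy by inference bin frequencies."""
    n_bins = len(bin_acc)
    bin_ids = np.clip(np.digitize(infer_conf, edges) - 1, 0, n_bins - 1)
    weights = np.bincount(bin_ids, minlength=n_bins) / len(infer_conf)
    valid = ~np.isnan(bin_acc)
    return float(np.sum(weights[valid] * bin_acc[valid]) / weights[valid].sum())
```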
Simply predicting accuracy on unlabeled data is also promising from a user perspective, since users may wish to configure hard thresholds on model accuracy. This perspective may be useful in motivating future work.
[Panel: Run set (6 runs)]