Unintended Bias in Toxicity Classification
The Goal - Find Unintended Bias in Toxic Tweets
The Dataset
Toxicity Subtypes Distribution
Toxic Subtypes and Identity Correlation
Lexical Analysis
Toxicity by Identity Tags (Frequency)
Weighted Analysis of Most Frequently Toxic Tags
Correlation between identities - which identities are mentioned together?
Time Series Analysis of Toxicity
Word Clouds
All Identities
Emoji Usage in Toxic Comments
Word Embeddings
Word embeddings take a text corpus as input and output a vector representation for each word. We use t-SNE to draw a scatter plot that places similar words near each other in the embedding space.
Effect of Embeddings on Bias
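The t-SNE projection step can be sketched as follows. This is a minimal sketch with scikit-learn: the word list and the random vectors are made-up stand-ins for a real embedding matrix (e.g. GloVe weights), which is why no clusters should be expected here.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical vocabulary; in the real notebook each word would map to a
# pretrained embedding vector instead of random noise.
words = ["good", "great", "bad", "awful", "toxic", "kind", "hate", "love", "dog", "cat"]
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(words), 100))  # stand-in 100-d embeddings

# t-SNE projects the high-dimensional vectors down to 2-D for plotting.
# perplexity must be smaller than the number of points.
coords = TSNE(n_components=2, perplexity=5, random_state=0,
              init="random").fit_transform(vectors)
# coords[i] is the (x, y) position of words[i]; feed these to a scatter
# plot and label each point with its word to get the visualization.
```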
Bias Benchmarks
- Subgroup AUC: the AUC restricted to comments that mention the identity. A low score here means the model fails to distinguish between toxic and non-toxic comments that mention this identity.
- BPSN AUC (background positive, subgroup negative): a low value here means the model confuses non-toxic examples that mention the identity with toxic examples that do not.
- BNSP AUC (background negative, subgroup positive): a low value here means the model confuses toxic examples that mention the identity with non-toxic examples that do not.
The final score used in this competition is a combination of these bias metrics, which we will also compute.
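The three bias AUCs can be sketched with scikit-learn's `roc_auc_score`; the toy labels, scores, and identity mask below are made up purely for illustration. The `power_mean` helper reflects how the competition combines per-identity AUCs (a generalized mean with p = -5, which pulls the score toward the worst subgroup), per the competition's evaluation description.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: binary toxicity labels, model scores, and a boolean mask
# flagging comments that mention one identity (illustrative only).
y_true   = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred   = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.3, 0.1])
subgroup = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

def subgroup_auc(y_true, y_pred, subgroup):
    # AUC restricted to comments that mention the identity.
    return roc_auc_score(y_true[subgroup], y_pred[subgroup])

def bpsn_auc(y_true, y_pred, subgroup):
    # Background positive, subgroup negative: toxic comments that do NOT
    # mention the identity vs. non-toxic comments that do.
    mask = (subgroup & (y_true == 0)) | (~subgroup & (y_true == 1))
    return roc_auc_score(y_true[mask], y_pred[mask])

def bnsp_auc(y_true, y_pred, subgroup):
    # Background negative, subgroup positive: non-toxic comments that do
    # NOT mention the identity vs. toxic comments that do.
    mask = (subgroup & (y_true == 1)) | (~subgroup & (y_true == 0))
    return roc_auc_score(y_true[mask], y_pred[mask])

def power_mean(values, p=-5):
    # Generalized mean used to aggregate a bias AUC across identities;
    # p = -5 weights the worst-performing subgroups most heavily.
    return np.mean(np.power(values, p)) ** (1 / p)
```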
No Pretrained Embeddings - Final Metric: 0.90
GloVe - Final Metric: 0.9230
FastText - Final Metric: 0.9228
Concatenated GloVe and FastText - Final Metric: 0.9234
Model Interpretation - Named Entity Recognition, ELI5
NER
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
# sentence: a comment string from the dataset
displacy.render(nlp(str(sentence)), jupyter=True, style='ent')
TextExplainer
Let's use ELI5's TextExplainer to see how the model makes predictions.
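TextExplainer is LIME-based: it samples perturbed versions of one comment, queries the black-box model on each variant, and fits an interpretable surrogate to the results. A minimal sketch of that idea using scikit-learn only; the rule-based `black_box_proba` is a made-up stand-in for the real trained classifier.

```python
import numpy as np
from sklearn.linear_model import Ridge

def black_box_proba(texts):
    # Hypothetical stand-in for the trained toxicity model: any comment
    # containing "awful" or "hate" is scored as toxic.
    p = np.array([0.9 if ("awful" in t or "hate" in t) else 0.1 for t in texts])
    return np.column_stack([1 - p, p])

def lime_explain(text, predict_proba, n_samples=500, seed=0):
    # LIME-style local explanation: perturb the text by randomly dropping
    # words, query the black box on each variant, then fit a linear
    # surrogate on word-presence indicators.
    rng = np.random.default_rng(seed)
    words = text.split()
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0] = 1  # keep one copy of the full, unperturbed text
    variants = [" ".join(w for w, keep in zip(words, row) if keep)
                for row in masks]
    probs = predict_proba(variants)[:, 1]  # P(toxic) for each variant
    surrogate = Ridge().fit(masks, probs)
    # Each coefficient is that word's local contribution toward "toxic".
    return dict(zip(words, surrogate.coef_))

weights = lime_explain("you are awful today", black_box_proba)
```

The surrogate's coefficients single out "awful" as the word driving the toxic prediction, which is exactly the kind of per-word highlighting TextExplainer renders.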