Unintended Bias in Toxicity Classification
The Goal - Find Unintended Bias in Toxic Comments
The Dataset

Toxicity Subtypes Distribution

Toxicity Subtypes and Identity Correlation

Lexical Analysis

Toxicity by Identity Tags (Frequency)

Weighted Analysis of Most Frequently Toxic Tags

Correlation between identities - which identities are mentioned together?

Time Series Analysis of Toxicity

Word Clouds


All Identities

Emoji Usage in Toxic Comments


Word Embeddings
Word embeddings take a text corpus as input and output a vector representation for each word. We use t-SNE to draw a scatter plot of similar words in the embedding space.
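Below is a minimal sketch of how such a plot can be produced, assuming `embeddings` is a dict mapping words to NumPy vectors (for example, loaded from a GloVe file); the variable names are illustrative.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Subsample the vocabulary so the scatter plot stays readable
words = list(embeddings.keys())[:500]
vectors = np.stack([embeddings[w] for w in words])

# Project the high-dimensional vectors down to 2-D with t-SNE
coords = TSNE(n_components=2, random_state=42).fit_transform(vectors)

plt.figure(figsize=(12, 8))
plt.scatter(coords[:, 0], coords[:, 1], s=4)
for word, (x, y) in zip(words[:50], coords[:50]):  # label only a few points
    plt.annotate(word, (x, y), fontsize=8)
plt.title('t-SNE projection of the embedding space')
plt.show()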




Effect of Embeddings on Bias
Bias Benchmarks
- Subgroup AUC: The AUC score for the entire subgroup. A low score here means the model fails to distinguish between toxic and non-toxic comments that mention this identity.
- BPSN AUC: Background positive, subgroup negative. A low value here means the model confuses non-toxic examples that mention the identity with toxic examples that do not.
- BNSP AUC: Background negative, subgroup positive. A low value here means the model confuses toxic examples that mention the identity with non-toxic examples that do not.
The final score used in this competition combines the overall AUC with the generalized power mean of these per-identity bias metrics; we compute it as well.
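For reference, the three per-identity AUCs and the combined score can be computed as in the sketch below, modeled on the competition's public benchmark kernel. It assumes a pandas DataFrame `df` with one boolean column per identity, a boolean toxicity label in `target`, and model scores in `pred`; all names are illustrative.

import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, identity, label='target', pred='pred'):
    # AUC restricted to comments that mention the identity
    sub = df[df[identity]]
    return roc_auc_score(sub[label], sub[pred])

def bpsn_auc(df, identity, label='target', pred='pred'):
    # Non-toxic comments that mention the identity vs. toxic comments that do not
    mask = (df[identity] & ~df[label]) | (~df[identity] & df[label])
    return roc_auc_score(df[mask][label], df[mask][pred])

def bnsp_auc(df, identity, label='target', pred='pred'):
    # Toxic comments that mention the identity vs. non-toxic comments that do not
    mask = (df[identity] & df[label]) | (~df[identity] & ~df[label])
    return roc_auc_score(df[mask][label], df[mask][pred])

def power_mean(values, p=-5):
    # Generalized power mean; p = -5 weights the worst-scoring subgroups heavily
    return np.power(np.mean(np.power(values, p)), 1 / p)

def final_metric(df, identities, w=0.25):
    overall = roc_auc_score(df['target'], df['pred'])
    bias = [power_mean([auc(df, i) for i in identities])
            for auc in (subgroup_auc, bpsn_auc, bnsp_auc)]
    return w * overall + w * sum(bias)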
No Pretrained Embeddings - Final Metric: 0.90

GloVe - Final Metric: 0.9230

FastText - Final Metric: 0.9228

Concatenated GloVe and FastText - Final Metric: 0.9234
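The concatenated setup stacks the two embedding vectors (commonly 300-dimensional each) into one wider representation per token. A sketch of how the embedding matrix can be built, assuming `glove` and `fasttext` are word-to-vector dicts and `tokenizer` is a fitted Keras tokenizer (all names illustrative):

import numpy as np

def build_embedding_matrix(word_index, glove, fasttext, dim=300):
    # One row per token id: GloVe fills the first `dim` columns, FastText the rest;
    # words missing from an embedding keep zeros in that half
    matrix = np.zeros((len(word_index) + 1, dim * 2))
    for word, idx in word_index.items():
        if word in glove:
            matrix[idx, :dim] = glove[word]
        if word in fasttext:
            matrix[idx, dim:] = fasttext[word]
    return matrix

embedding_matrix = build_embedding_matrix(tokenizer.word_index, glove, fasttext)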

Model Interpretation - Named Entity Recognition, ELI5
NER
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')  # pretrained English pipeline with an NER component

# Render the named entities found in a comment inline in the notebook
displacy.render(nlp(str(sentence)), jupyter=True, style='ent')

TextExplainer
Let's use ELI5's TextExplainer to see how the model makes its predictions.
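A typical TextExplainer session looks like the sketch below. Here `predict_proba` stands for any callable that maps a list of raw texts to class probabilities (for example, a wrapper around our tokenizer and trained model); the names are illustrative.

from eli5.lime import TextExplainer

# Fit a local, interpretable surrogate model (LIME-style) around one prediction
te = TextExplainer(random_state=42)
te.fit(comment_text, predict_proba)  # comment_text: the single comment to explain
te.show_prediction(target_names=['non-toxic', 'toxic'])
# te.metrics_ reports how faithfully the surrogate mimics the original model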


