
Unintended Bias in Toxicity Classification


Brainstorm (Remove before publishing)

EDA

Part 1 - Embeddings

Bias benchmark - How choice of embedding affects bias

Plots

  • Confusion matrices using different embeddings on the baseline model - look for counterfactuals
  • Plots to compare performance of different embeddings

Part 2 - Model Training

Bias benchmark - How choice of model affects bias

Plots

  • Confusion matrices for different models on one embedding - look for counterfactuals
  • Explain the model. Vega plots to add: ELI5, LIME, NER, POS, attention

Sweeps - optimize model

The Goal - Find Unintended Bias in Toxic Tweets

The Dataset

Toxicity Subtypes Distribution

Toxic Subtypes and Identity Correlation

Lexical Analysis

Toxicity by Identity Tags (Frequency)

Weighted Analysis of Most Frequently Toxic Tags

Correlation between identities - which identities are mentioned together?

Time Series Analysis of Toxicity

Word Clouds

All Identities

Emoji Usage in Toxic Comments

Word Embeddings

Word embeddings take a text corpus as input and output a vector representation for each word. We use t-SNE to draw a scatter plot of similar words in the embedding space.
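As a rough sketch of this visualization step, assuming word_vectors is a gensim KeyedVectors lookup (for example, loaded GloVe vectors) and words is the list of terms to plot (both names are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# `word_vectors` and `words` are assumed to exist in the notebook:
# a gensim KeyedVectors lookup and the vocabulary terms to visualize.
vectors = np.array([word_vectors[w] for w in words])

# Project the high-dimensional embeddings down to 2D
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
coords = tsne.fit_transform(vectors)

plt.figure(figsize=(10, 8))
plt.scatter(coords[:, 0], coords[:, 1], s=8)
for (x, y), w in zip(coords, words):
    plt.annotate(w, (x, y), fontsize=8)
plt.title("t-SNE projection of word embeddings")
plt.show()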

Effect of Embeddings on Bias

Bias Benchmarks

  • Subgroup AUC: The AUC computed only on examples that mention this identity; a low score here means the model fails to distinguish between toxic and non-toxic comments that mention the identity.

  • BPSN AUC: Background positive, subgroup negative. A low value here means the model confuses non-toxic examples that mention the identity with toxic examples that do not.

  • BNSP AUC: Background negative, subgroup positive. A low value here means that the model confuses toxic examples that mention the identity with non-toxic examples that do not.

The final score used in this competition combines the overall AUC with these per-identity bias metrics; we compute it as well.
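A minimal sketch of these metrics, assuming a DataFrame df with a boolean target column, model scores in a pred column, and one boolean column per identity (column names are illustrative); the generalized power mean with p = -5 and equal weights of 0.25 follow the competition's published formula:

import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, identity, label='target', score='pred'):
    # Only examples that mention the identity
    sub = df[df[identity]]
    return roc_auc_score(sub[label], sub[score])

def bpsn_auc(df, identity, label='target', score='pred'):
    # Non-toxic examples that mention the identity + toxic examples that do not
    mask = (df[identity] & ~df[label]) | (~df[identity] & df[label])
    sub = df[mask]
    return roc_auc_score(sub[label], sub[score])

def bnsp_auc(df, identity, label='target', score='pred'):
    # Toxic examples that mention the identity + non-toxic examples that do not
    mask = (df[identity] & df[label]) | (~df[identity] & ~df[label])
    sub = df[mask]
    return roc_auc_score(sub[label], sub[score])

def power_mean(values, p=-5):
    # Generalized mean; p = -5 penalizes the worst-performing subgroups
    return np.power(np.mean(np.power(values, p)), 1 / p)

def final_metric(df, identity_cols, label='target', score='pred', w=0.25):
    overall = roc_auc_score(df[label], df[score])
    bias_aucs = [
        power_mean([fn(df, c, label, score) for c in identity_cols])
        for fn in (subgroup_auc, bpsn_auc, bnsp_auc)
    ]
    return w * overall + w * sum(bias_aucs)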

No Pretrained Embeddings - Final Metric: 0.90

GloVe - Final Metric: 0.9230

FastText - Final Metric: 0.9228

Concatenated GloVe and FastText - Final Metric: 0.9234
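One way such a concatenation can be implemented is to stack the two pretrained vectors for each word into a single, wider embedding matrix. A rough sketch, assuming glove and fasttext are dicts mapping word to vector and word_index comes from a fitted Keras tokenizer (all names are illustrative):

import numpy as np

glove_dim, fasttext_dim = 300, 300

# Row i holds the GloVe vector followed by the FastText vector for word i;
# words missing from either lookup keep zeros in that half.
embedding_matrix = np.zeros((len(word_index) + 1, glove_dim + fasttext_dim))

for word, i in word_index.items():
    g = glove.get(word)
    f = fasttext.get(word)
    if g is not None:
        embedding_matrix[i, :glove_dim] = g
    if f is not None:
        embedding_matrix[i, glove_dim:] = f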

Model Interpretation - Named Entity Recognition, ELI5

NER

from spacy import displacy

# Highlight the named entities in a comment inside the notebook
displacy.render(nlp(str(sentence)), jupyter=True, style='ent')

TextExplainer

Let's use ELI5 to see how the model makes its predictions.
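A minimal sketch of this, assuming doc is a single comment string and predict_proba wraps the trained model, taking a list of texts and returning class probabilities (both names are illustrative):

from eli5.lime import TextExplainer

# Fit a local, interpretable surrogate model around `doc`
te = TextExplainer(random_state=42)
te.fit(doc, predict_proba)

# Highlight which words pushed the prediction toward each class
te.show_prediction(target_names=['non-toxic', 'toxic'])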