
BigBird Base In-Domain Pre-training Results

Using Masked Language Modeling to adapt the model to the domain of high school essays.

Training parameters

Model: google/bigbird-roberta-base
Device: TPU v3-8
Train batch size: 4 per device x 8 devices (32 effective)
Dtype: fp16
Num train epochs: 15
Learning rate: 5e-5
Max sequence length: 1024
Weight decay: 0.0095
MLM probability: 0.15
Scheduler: Linear
Warmup steps: 1000
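
To make the setup concrete, below is a minimal sketch of an MLM domain-adaptation run with these hyperparameters using the Hugging Face Trainer. The corpus file name (essays.txt), the tokenization step, and running on a single accelerator (rather than the TPU v3-8 used here) are assumptions for illustration, not the exact script behind these results.

```python
# Minimal sketch: masked language modeling on an in-domain essay corpus
# with google/bigbird-roberta-base, using the hyperparameters listed above.
# "essays.txt" is a hypothetical file with one essay per line.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BigBirdForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "google/bigbird-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = BigBirdForMaskedLM.from_pretrained(model_name)

# Assumed in-domain corpus of high school essays.
raw = load_dataset("text", data_files={"train": "essays.txt"})

def tokenize(batch):
    # Truncate to the 1024-token maximum sequence length used in training.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bigbird-essays-mlm",
    per_device_train_batch_size=4,   # 4 per device; 8 devices gives 32 effective
    num_train_epochs=15,
    learning_rate=5e-5,
    weight_decay=0.0095,
    warmup_steps=1000,
    lr_scheduler_type="linear",
    fp16=True,                       # fp16 as reported; on TPU a bf16 setup may differ
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```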



Training figures