BigBird Base In-Domain Pre-training Results
Using masked language modeling (MLM) to adapt the model to the domain of high school essays.
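As a quick illustration of what the MLM objective does, the sketch below runs the Hugging Face masking data collator over a single made-up essay sentence. The sentence is invented; the 0.15 masking probability matches the value listed under the training parameters further down.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Minimal sketch of the MLM masking step; the example sentence is invented.
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # same masking probability as in the training run
)

encoding = tokenizer(
    "Students often struggle to structure a persuasive essay.",
    return_tensors="pt",
)
batch = collator([{"input_ids": encoding["input_ids"][0]}])
print(batch["input_ids"])  # ~15% of positions corrupted (mostly replaced by [MASK])
print(batch["labels"])     # original ids at masked positions, -100 everywhere else
```

The model is then trained to recover the original tokens at the masked positions, which is how it picks up the vocabulary and style of the essay corpus.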
Training parameters
Model: google/bigbird-roberta-base
Device: TPU v3-8
Train batch size: 4 per device × 8 devices (32 effective)
Dtype: fp16
Num train epochs: 15
Learning rate: 5e-5
Max sequence length: 1024
Weight decay: 0.0095
MLM probability: 0.15
Scheduler: Linear
Warmup steps: 1000
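Put together, these settings map onto a Hugging Face Trainer run roughly like the sketch below. This is not the exact training script: the two-sentence dataset, the tokenization helper, and the output directory are placeholders, fp16 as listed requires GPU/TPU support, and a TPU v3-8 run would normally be launched through the xla_spawn or accelerate launcher rather than invoked directly.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "google/bigbird-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder corpus standing in for the high school essay dataset.
essays = Dataset.from_dict({"text": [
    "Students often struggle to structure a persuasive essay.",
    "A strong thesis statement anchors the rest of the argument.",
]})

def tokenize(batch):
    # Truncate to the 1024-token maximum sequence length used in training.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

train_dataset = essays.map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="bigbird-essays-mlm",   # hypothetical output path
    num_train_epochs=15,
    per_device_train_batch_size=4,     # x 8 TPU cores = 32 sequences per step
    learning_rate=5e-5,
    weight_decay=0.0095,
    warmup_steps=1000,
    lr_scheduler_type="linear",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```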
Training figures