Task-Adaptive Pretraining
This report visualizes the continued pretraining of RoBERTa.
This report illustrates the adaptation of the RoBERTa encoder to the English texts from the BiPaR dataset. More precisely, RoBERTa was further pretrained on these texts with the masked language modeling (MLM) objective. Two masking strategies were used (a training sketch follows the list):
- roberta-large_40-tokens was trained by masking 40% of tokens.
- roberta-large_20-words was trained by masking 20% of words.
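For context, here is a minimal sketch of the token-level variant (roberta-large_40-tokens) using the Hugging Face transformers Trainer. The corpus file name, sequence length, and batch size are illustrative assumptions, not the exact settings behind these runs; only the masking rate and the 20-epoch budget come from this report.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Hypothetical text file with one English BiPaR passage per line.
dataset = load_dataset("text", data_files={"train": "bipar_passages_en.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# roberta-large_40-tokens: mask 40% of tokens for the MLM objective.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.4
)

args = TrainingArguments(
    output_dir="roberta-large_40-tokens",
    num_train_epochs=20,
    per_device_train_batch_size=8,  # assumed value for illustration
    report_to="wandb",              # log the training curves to W&B
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

The 20%-of-words variant (roberta-large_20-words) requires whole-word masking instead of independent token masking. The stock DataCollatorForWholeWordMask detects subwords via WordPiece-style "##" prefixes, so with RoBERTa's byte-level BPE tokenizer a custom collator that groups subword tokens back into words is typically needed.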
Training
- The perplexity and loss of the model trained with 40% of tokens masked were higher than those of the model trained with 20% of words masked (perplexity here is the exponential of the MLM loss; see the note after this list).
- Since RoBERTa had already been pretrained on stories, its perplexity changed only slightly during continued pretraining on the corpus generated from BiPaR passages.
- Over 20 training epochs, both models overfit only slightly to the BiPaR passages, as the loss curves show.
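The perplexity values follow directly from the MLM loss: perplexity is the exponential of the mean cross-entropy over the masked positions. A tiny sketch of the conversion (the loss value below is illustrative, not a result from these runs):

```python
import math

def mlm_perplexity(mean_masked_lm_loss: float) -> float:
    """Perplexity = exp(mean cross-entropy over masked positions)."""
    return math.exp(mean_masked_lm_loss)

# Illustrative value only, not a measurement from the roberta-large runs above.
print(mlm_perplexity(1.6))  # ~4.95
```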