Warm Start for Low-Data Tasks
Comparison of performance on the RTE and STS-B tasks when starting from an MNLI checkpoint vs. vanilla DistilRoBERTa
Created on May 7 | Last edited on May 7
In this report we compare performance on two low-data tasks when fine-tuning a regular pretrained model vs. a model already fine-tuned on a similar task.
Specifically, we consider two tasks from the GLUE benchmark: Recognizing Textual Entailment (RTE, 2.5k training samples) and the Semantic Textual Similarity Benchmark (STS-B, 7k). For the warm start, we use a checkpoint fine-tuned on the MNLI task. This technique is known as intermediate task training [Phang et al.] and was used, for example, in RoBERTa and ELECTRA when reporting GLUE results.
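The core mechanic of the warm start is simple: reuse the encoder weights from the MNLI-fine-tuned checkpoint, but reinitialize the classification head, since MNLI is a 3-way classification task while RTE has 2 labels and STS-B is a regression with a single output. Below is a minimal, purely illustrative sketch of that weight-transfer step, using a toy dict-based "checkpoint" with hypothetical parameter names (real frameworks store state dicts with a similar structure):

```python
import random

HIDDEN = 8  # toy hidden size, just for illustration


def init_head(num_labels, hidden=HIDDEN, seed=0):
    """Randomly initialize a fresh linear classification head."""
    rng = random.Random(seed)
    return {
        "classifier.weight": [
            [rng.gauss(0.0, 0.02) for _ in range(hidden)]
            for _ in range(num_labels)
        ],
        "classifier.bias": [0.0] * num_labels,
    }


def warm_start(mnli_state, num_labels):
    """Reuse the MNLI encoder weights; reinitialize the task head.

    The MNLI head predicts 3 classes, so it cannot be reused directly
    for RTE (2 labels) or STS-B (1 regression output).
    """
    state = {
        name: weights
        for name, weights in mnli_state.items()
        if not name.startswith("classifier.")  # drop the 3-way MNLI head
    }
    state.update(init_head(num_labels))  # fresh head for the target task
    return state


# Toy "MNLI checkpoint": one encoder weight matrix plus a 3-way head.
mnli_state = {
    "encoder.weight": [[0.1] * HIDDEN for _ in range(HIDDEN)],
    **init_head(num_labels=3, seed=42),
}

rte_state = warm_start(mnli_state, num_labels=2)   # RTE: entailment / not
stsb_state = warm_start(mnli_state, num_labels=1)  # STS-B: one score
```

After this step, fine-tuning on the target task proceeds exactly as it would from a vanilla pretrained checkpoint; only the starting encoder weights differ.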
RTE
(Chart: run set, 6 runs)
STS-B
(Chart: run set, 6 runs)