Warm Start for Low-Data Tasks

Comparison of performance on the RTE and STS-B tasks when starting from an MNLI checkpoint vs. vanilla DistilRoBERTa
Created on May 7|Last edited on May 7
In this report we compare performance on two low-data tasks when fine-tuning a regular pretrained model versus a model first fine-tuned on a similar task.
Specifically, we consider two tasks from the GLUE benchmark: Recognizing Textual Entailment (RTE, 2.5k training samples) and the Semantic Textual Similarity Benchmark (STS-B, 7k). For the warm start, a checkpoint fine-tuned on the MNLI task is used. This technique is known as intermediate task training [Phang et al.] and was used, for example, by RoBERTa and ELECTRA when reporting GLUE results.
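The core of the warm-start setup is simple: copy the encoder weights from the MNLI checkpoint into the new model, but re-initialize the classification head, since MNLI is a 3-way classification task while RTE has 2 labels (and STS-B is a regression). Below is a minimal, library-free sketch of that idea, operating on a plain state dict; the key names (`encoder.*`, `classifier.*`) and the init scale are illustrative assumptions, not the exact checkpoint layout.

```python
import random

def warm_start(mnli_checkpoint: dict, num_target_labels: int, hidden: int = 768) -> dict:
    """Build a target-task init from an MNLI checkpoint: reuse the encoder,
    drop and re-initialize the task-specific head (illustrative key names)."""
    target = {}
    for name, weight in mnli_checkpoint.items():
        if name.startswith("classifier"):
            continue  # MNLI head has 3 outputs; it cannot be reused for RTE/STS-B
        target[name] = weight  # encoder weights are copied as-is
    # Fresh, small-random-init head for the target task (assumed 0.02 std, as is common)
    target["classifier.weight"] = [
        [random.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(num_target_labels)
    ]
    target["classifier.bias"] = [0.0] * num_target_labels
    return target

# Toy example: a fake MNLI checkpoint with one encoder tensor and a 3-label head
mnli = {
    "encoder.layer0.weight": [1.0, 2.0],
    "classifier.weight": [[0.1] * 768] * 3,
    "classifier.bias": [0.0] * 3,
}
rte_init = warm_start(mnli, num_target_labels=2)  # RTE: 2 labels
```

With a real library such as Hugging Face Transformers, the same effect is typically achieved by loading the MNLI checkpoint into a sequence-classification model with the new label count and letting the mismatched head be re-initialized.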

RTE



[Chart: RTE evaluation score (0.50–0.75) vs. training step (100–700) for run groups glue-rte-distilroberta-base-2e-05 and glue-rte-distilroberta-base-2e-05-warm-start; run set of 6.]
STSB


[Chart: STS-B results; run set of 6.]