Warm Start for Low-Data Tasks

Comparison of performance on the RTE and STS-B tasks when starting from an MNLI checkpoint vs. vanilla DistilRoBERTa
Created on May 7|Last edited on May 7
In this report we compare performance on two low-data tasks when fine-tuning a regular pretrained model versus a model first fine-tuned on a similar task.
Specifically, we consider two tasks from the GLUE benchmark: Recognizing Textual Entailment (RTE, 2.5k training samples) and the Semantic Textual Similarity Benchmark (STS-B, 7k). For the warm start, a checkpoint fine-tuned on the MNLI task is used. This technique is known as intermediate task training [Phang et al.] and was used, for example, by RoBERTa and ELECTRA when reporting GLUE results.
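The core of the warm-start setup is simple: copy the encoder weights from the MNLI checkpoint into the new model, but re-initialize the classification head, since MNLI is a 3-way classification task while RTE has 2 labels (and STS-B is a regression). Below is a minimal, library-free sketch of that idea, operating on a plain state dict; the key names (`encoder.*`, `classifier.*`) and the init scale are illustrative assumptions, not the exact checkpoint layout.

```python
import random

def warm_start(mnli_checkpoint: dict, num_target_labels: int, hidden: int = 768) -> dict:
    """Build a target-task init from an MNLI checkpoint: reuse the encoder,
    drop and re-initialize the task-specific head (illustrative key names)."""
    target = {}
    for name, weight in mnli_checkpoint.items():
        if name.startswith("classifier"):
            continue  # MNLI head has 3 outputs; it cannot be reused for RTE/STS-B
        target[name] = weight  # encoder weights are copied as-is
    # Fresh, small-random-init head for the target task (assumed 0.02 std, as is common)
    target["classifier.weight"] = [
        [random.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(num_target_labels)
    ]
    target["classifier.bias"] = [0.0] * num_target_labels
    return target

# Toy example: a fake MNLI checkpoint with one encoder tensor and a 3-label head
mnli = {
    "encoder.layer0.weight": [1.0, 2.0],
    "classifier.weight": [[0.1] * 768] * 3,
    "classifier.bias": [0.0] * 3,
}
rte_init = warm_start(mnli, num_target_labels=2)  # RTE: 2 labels
```

With a real library such as Hugging Face Transformers, the same effect is typically achieved by loading the MNLI checkpoint into a sequence-classification model with the new label count and letting the mismatched head be re-initialized.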

RTE



[Chart: RTE evaluation score (0.50–0.75) vs. training step (100–700) for run groups glue-rte-distilroberta-base-2e-05 and glue-rte-distilroberta-base-2e-05-warm-start; run set of 6.]
STSB


[Chart: STS-B results; run set of 6.]