
neftune for summarization - longt5

Small experiment to see if NEFTune improves summarization model generalization.
Created on November 13 | Last edited on November 13

about

Relevant paper: NEFTune: Noisy Embeddings Improve Instruction Finetuning (abstract quoted below)
We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.

what

  • there is a new training technique called NEFTune that adds noise to the embedding vectors during training, which results in better generalization (of autoregressive LMs on instruct tasks); see the sketch after this list
  • summarization models have a very big generalization problem
  • idea: let's try this technique for summarization fine-tunes and see if it helps!
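
A minimal sketch of the technique (my paraphrase of the paper, assuming a standard PyTorch setup): register a forward hook on the model's input embedding layer that adds uniform noise scaled by alpha / sqrt(L * d), during training only. The alpha value and the stand-in nn.Embedding below are illustrative, not the exact settings from these runs.

```python
import torch
import torch.nn as nn

NEFT_ALPHA = 5.0  # noise scale; illustrative (the paper sweeps 5/10/15 for instruct LMs)

def neftune_hook(module, inputs, output):
    """Add uniform noise U(-mag, mag) with mag = alpha / sqrt(L * d)
    to the embedding output -- only while training, never at eval."""
    if module.training:
        seq_len, dim = output.shape[1], output.shape[2]
        mag = NEFT_ALPHA / (seq_len * dim) ** 0.5
        output = output + torch.empty_like(output).uniform_(-mag, mag)
    return output

# Stand-in embedding layer; for a real model, hook model.get_input_embeddings().
emb = nn.Embedding(32128, 768)
handle = emb.register_forward_hook(neftune_hook)

tokens = torch.randint(0, 32128, (2, 512))
emb.train()
noisy = emb(tokens)  # noise applied
emb.eval()
clean = emb(tokens)  # no noise at eval time
```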

tldr

Hard to tell; it needs more experimentation. It's unclear whether NEFTune a) helps or b) doesn't: either the noise scale needs to be much higher for summarization models, or longt5-base is too small for it to make a difference.


dataset

initial version of my 'summary souffle' mixture; per-subset counts for the train split:
lay_plos 20789
multi_news 11708
big_patent 4164
gov_report 3514
summ_screen_fd 3449
billsum 2541
lay_elife 2528
booksum 2383
cnn_dailymail 1705
stacksmol 450
qmsum 396
squality 200
xlsum_en 118
worldbank 90
narrativeqa 49
dialogsum 3
Name: subset, dtype: int64
split sizes:
DatasetDict({
    train: Dataset({
        features: ['text', 'summary', 'subset'],
        num_rows: 54087
    })
    validation: Dataset({
        features: ['text', 'summary', 'subset'],
        num_rows: 4262
    })
    test: Dataset({
        features: ['text', 'summary', 'subset'],
        num_rows: 4202
    })
})
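
For reference, a sketch of how the counts above can be reproduced, assuming the mix is stored on disk as a HF DatasetDict with a 'subset' column (the path is a placeholder):

```python
import pandas as pd
from datasets import load_from_disk

# Placeholder path for the dataset mix; substitute wherever it actually lives.
ds = load_from_disk("summary_souffle")

# Per-subset counts for the train split (the value_counts output above).
print(pd.Series(ds["train"]["subset"]).value_counts())

# Split sizes (the DatasetDict repr above).
print(ds)
```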


eval


[W&B panels: eval metrics for the 13-run set]


train

Note: the paper (for autoregressive LMs) reports that training loss goes up with NEFTune, but I don't think that is really observable in the plots below, even at six times the recommended value (for autoregressive LMs) --> 0.6 ??
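
For context, a sketch of how the noise scale can be plugged in, assuming the runs go through transformers' Seq2SeqTrainer; recent transformers releases expose NEFTune via the neftune_noise_alpha training argument. The checkpoint name and the other hyperparameters are placeholders, not the exact run config; only the 0.6 comes from the note above.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/long-t5-tglobal-base"  # assumed longt5-base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = Seq2SeqTrainingArguments(
    output_dir="longt5-neftune",        # placeholder
    learning_rate=1e-4,                 # placeholder
    per_device_train_batch_size=1,      # placeholder
    neftune_noise_alpha=0.6,            # the 6x value mentioned in the note above
    report_to="wandb",
)

# trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
#                          train_dataset=..., eval_dataset=...)
# trainer.train()
```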

[W&B panels: training loss curves for the 13-run set]


Run Comparer

Aside from the one run without NEFTune, the other runs were me trying every idea I could think of to train faster (torch compile, etc.)... nothing worked.
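
One of those speed attempts, sketched under the same assumptions (checkpoint name and output dir are placeholders): torch.compile can be applied to the model directly, or via the trainer's torch_compile flag.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

model = AutoModelForSeq2SeqLM.from_pretrained("google/long-t5-tglobal-base")

# Option 1: compile the model directly (PyTorch >= 2.0).
compiled_model = torch.compile(model)

# Option 2: let the HF Trainer wrap the model via the torch_compile flag.
args = Seq2SeqTrainingArguments(output_dir="longt5-compile", torch_compile=True)
```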

[W&B run comparer: 13-run set]