A New Method For LLM Regularization
A simple and effective way to boost generalization?
Researchers have introduced NEFTune, a new approach that enhances the fine-tuning of language models using noisy embeddings. The method injects controlled noise into the model's embedding layer during instruction fine-tuning, with the aim of improving generalization and the quality of the resulting model's responses.
Methodology
NEFTune adds controlled uniform noise to the token embeddings during the fine-tuning process. This is in contrast to traditional fine-tuning, which leaves the forward pass unchanged and simply adjusts the weights of the pre-trained model. The noise acts as a regularizer, designed to prevent the model from overfitting to the training data and to improve generalization.
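Below is a minimal PyTorch sketch of the idea. The α/√(L·d) scaling of the uniform noise (where L is the sequence length and d the embedding dimension) follows the description in the paper, but the forward hook, the `add_neftune_hook` helper name, and the default `neftune_alpha` value are illustrative assumptions rather than the authors' implementation.

```python
import torch

def add_neftune_hook(model, neftune_alpha: float = 5.0):
    """Register a forward hook that adds scaled uniform noise to the
    input embeddings while the model is in training mode."""
    embed_layer = model.get_input_embeddings()  # token embedding module

    def neftune_hook(module, inputs, output):
        if module.training:
            # output shape: (batch, seq_len, hidden_dim)
            seq_len, hidden_dim = output.shape[1], output.shape[2]
            # Noise magnitude shrinks with sequence length and embedding width.
            scale = neftune_alpha / (seq_len * hidden_dim) ** 0.5
            # Real implementations typically also mask out padding positions.
            output = output + torch.zeros_like(output).uniform_(-scale, scale)
        return output

    return embed_layer.register_forward_hook(neftune_hook)

# Usage with a hypothetical Hugging Face causal LM:
# model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# handle = add_neftune_hook(model, neftune_alpha=5.0)
# ... run the usual fine-tuning loop; noise is added only while model.training is True ...
# handle.remove()  # restore clean embeddings for evaluation
```

Because the noise is sampled afresh on every forward pass, no two passes see exactly the same embedded inputs, which discourages the model from memorizing the training responses.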
Results

In the analysis, LLaMA-2-7B models trained on the Alpaca dataset were evaluated both with and without NEFTune. The findings showed that models using NEFTune had a higher training loss but lower testing loss, indicating better generalization and less overfitting compared to models trained without NEFTune.
When comparing generated responses to ground-truth data, models trained with NEFTune scored lower on ROUGE-L and BLEU, suggesting they were less likely to reproduce the exact wording of the ground truth, which is further evidence that they were less overfit than their conventionally trained counterparts.
In terms of text length and diversity, NEFTune models produced longer outputs but did not sacrifice diversity, as evidenced by nearly identical 2-gram repetition rates and token log-diversity scores.
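As a rough illustration of how such checks can be run, the sketch below computes a ROUGE-L score against a reference answer and a 2-gram repetition rate for a generated response. It uses the `rouge_score` package; the example strings and the helper function are made up for illustration, not taken from the paper.

```python
from collections import Counter
from rouge_score import rouge_scorer  # pip install rouge-score

def two_gram_repetition_rate(text: str) -> float:
    """Fraction of 2-grams in the text that repeat an earlier 2-gram."""
    tokens = text.split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0
    counts = Counter(bigrams)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(bigrams)

# Hypothetical ground-truth / model-output pair.
reference = "Paris is the capital of France."
generation = "The capital of France is Paris, a city famous for its museums and cafes."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generation)["rougeL"].fmeasure

print(f"ROUGE-L F1:        {rouge_l:.3f}")
print(f"2-gram repetition: {two_gram_repetition_rate(generation):.3f}")
```

A lower ROUGE-L or BLEU score against the ground truth indicates less verbatim copying of the reference wording, while a stable repetition rate indicates that longer outputs are not simply looping over the same phrases.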
The study also examined if simply generating longer outputs could replicate NEFTune's performance. While longer outputs did yield some performance gains, none of the strategies for lengthening the outputs came close to the performance of models trained with NEFTune.
Experiments show that NEFTune outperforms standard instruction fine-tuning across several instruction datasets and evaluation benchmarks, with notably higher win rates on conversational evaluations such as AlpacaEval, and the improved generalization held consistently across the settings tested. Key findings include:
Improvement in Text Quality: Using NEFTune for training significantly improves text quality and conversational abilities across all datasets tested. The win rate of a 70B parameter model trained on Evol-Instruct increased from 75.03% to 88.81%.
Model-Specific Benefits: Even for LLaMA-2-Chat, a model that has already undergone extensive fine-tuning, NEFTune offers an additional performance increase of roughly 10% on Evol-Instruct.
Preservation of Capabilities: A natural concern is that NEFTune might improve conversational ability at the expense of other capabilities such as knowledge and reasoning. However, benchmark evaluations show that this is not the case.
Compatibility with QLoRA: NEFTune also works when fine-tuning with Quantized Low-Rank Adapters (QLoRA), although the performance gains were smaller than with full fine-tuning; a sketch of the combined setup follows below.
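For readers who want to try the combination, the sketch below shows one possible setup: load a 4-bit quantized base model, attach LoRA adapters with `peft`, and reuse the `add_neftune_hook` helper from the earlier sketch. The model name, LoRA hyperparameters, and noise scale are illustrative assumptions, not the configuration used in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 4-bit (QLoRA-style) quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical choice of base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach the NEFTune-style noise hook (add_neftune_hook is defined in the
# earlier sketch) to the frozen embedding layer before wrapping with PEFT.
handle = add_neftune_hook(model, neftune_alpha=5.0)

# Add LoRA adapters; only these low-rank weights are trained.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# ... fine-tune as usual, then call handle.remove() before evaluation ...
```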
Conclusion
NEFTune represents a significant step forward in the fine-tuning of language models. Its innovative approach of using noisy embeddings has been shown to enhance performance across a range of tasks, and it paves the way for future advancements in the field.
The paper: NEFTune: Noisy Embeddings Improve Instruction Finetuning (https://arxiv.org/abs/2310.05914)
Tags: ML News