
The Future of Model Fine-Tuning May Have Just Been Discovered

MIT researchers unveil an elegant solution to a huge problem
Created on February 14 | Last edited on February 14
The dominant trend in deep learning today is the development of huge models with billions of parameters. These foundation models are extremely powerful and often cost millions of dollars to develop and train. Developers can use one as a backbone: starting from a pre-trained foundation model, a developer fine-tunes it on their own proprietary data to produce a model specialized for their task. The problem is that with traditional training methods, either the full model must be shared with the developer, or the developer's data must be shared with the model owner.
One existing method, federated learning, protects the developer's data: each user trains a full copy of the model on their own data using local hardware, and only the updated weights are sent to a central server. The drawback is that federated learning does not protect the original weights of the model, leaving it vulnerable to misuse. Another approach, decoupled learning, breaks neural network training into smaller subproblems, but it has mostly been used for training models from scratch rather than for fine-tuning large models.
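The federated workflow described above can be sketched in plain Python. This is a toy illustration, not a real implementation: the "model" is just a list of scalar weights, and `local_update` is a hypothetical stand-in for a client's local gradient steps. The key point it shows is that each client's data never leaves the client; only updated weights travel back to the server, which averages them (the FedAvg pattern).

```python
import copy

def local_update(weights, data, lr=0.1):
    # Hypothetical local training step: nudge each weight toward the
    # mean of the client's (scalar) data. Stands in for local SGD.
    target = sum(data) / len(data)
    return [w + lr * (target - w) for w in weights]

def federated_round(global_weights, client_datasets):
    # Each client fine-tunes a full copy of the model locally;
    # only the updated weights are sent back, never the raw data.
    client_weights = [
        local_update(copy.deepcopy(global_weights), data)
        for data in client_datasets
    ]
    # The server averages the client updates into a new global model.
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

global_weights = [0.0, 0.0]
clients = [[1.0, 3.0], [5.0, 7.0]]  # each client's private data stays local
new_weights = federated_round(global_weights, clients)
```

Note that the server never sees `clients`, but every client receives the full `global_weights` — which is exactly the model-privacy gap described above.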


The Solution

Researchers from the MIT HAN Lab have recently proposed a solution to this problem, which they call Offsite Tuning. With Offsite Tuning, the original model is split into two parts: an adapter and an emulator. The adapter is a small subset of the model's layers, and it is the only part the user actually fine-tunes. The emulator is a compressed stand-in for the rest of the model: layer drop-based compression followed by knowledge distillation is applied to it, which protects the model owner's weights while still letting the user estimate useful gradients. The user fine-tunes the adapter with the help of the frozen emulator, then returns the tuned adapter to the model owner, who plugs it back into the original, uncompressed layers to produce a fine-tuned version of the full foundation model.
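The split-tune-plug-in cycle can be sketched in plain Python. Everything here is illustrative, not the paper's actual method: layers are scalar multipliers, the adapter is a single layer, "layer drop" keeps every other middle layer, and the knowledge-distillation step that follows compression in the paper is omitted.

```python
# Toy model: a stack of layers, each reduced to a single scale factor.
full_model = [1.0, 0.9, 1.1, 0.95, 1.05, 1.2]

# The model owner splits the stack: a small trainable adapter (here
# just the last layer) plus the middle layers that will be emulated.
adapter = full_model[-1:]   # shared with the user, trainable
middle = full_model[:-1]    # stays with the model owner

# Layer-drop compression: keep every other middle layer. The user
# receives this lossy emulator instead of the owner's real weights.
emulator = middle[::2]

def forward(layers, x):
    for w in layers:
        x = w * x
    return x

def finetune_adapter(emulator, adapter, x, target, lr=0.01, steps=200):
    # The user fine-tunes only the adapter, running inputs through the
    # frozen emulator to approximate gradients of the full middle stack.
    a = adapter[0]
    for _ in range(steps):
        h = forward(emulator, x)        # frozen emulator forward pass
        pred = a * h
        grad = 2 * (pred - target) * h  # d(loss)/d(a) for squared error
        a -= lr * grad
    return [a]

tuned_adapter = finetune_adapter(emulator, adapter, x=1.0, target=2.0)

# Plug-in: the tuned adapter goes back to the model owner and is
# combined with the original, uncompressed middle layers.
plugged_model = middle + tuned_adapter
```

The user never sees `middle` (the owner's real weights), and the owner never sees the user's training data — only the small `tuned_adapter` crosses the boundary.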


The Future

Overall, Offsite Tuning delivers strong fine-tuning performance without sacrificing either data privacy or model privacy. As models grow in size and value, so do the stakes around privacy and intellectual property. Offsite Tuning is an elegant, valuable solution to both problems, and it could well shape how much of model fine-tuning is done in the future.