
Researchers Use LLMs for Code Efficiency Improvement

Google and collaborators introduce an open source approach to improving code efficiency
Created on February 22 | Last edited on February 22

Google Brain and Inspired Cognition, along with researchers from several universities, including Carnegie Mellon University (and its Language Technologies Institute) and the University of Pennsylvania, recently used large language models (LLMs) to improve the efficiency of code. Code efficiency improvement is an extremely valuable task: even small gains in efficiency can translate into major cost savings and environmental benefits.
Existing code-generation models such as ChatGPT and OpenAI’s Codex perform well at tasks like single-shot code performance improvement. However, these models are closed source, which prevents most researchers from building on the existing implementations.
The authors chose CodeGen as their backbone network: a billion-parameter, open-source LLM that is also much smaller than the GPT-3-based ChatGPT.
The paper’s main contribution is a new kind of dataset built primarily from “performance improving edits”: trajectories of code changes within a repository in which a programmer gradually improves the original version. By training on this data, and in particular by following a single programmer over the course of a trajectory, the model learns a consistent editing style while still achieving efficiency gains.
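To make the idea concrete, a single performance improving edit pairs a slow program with a functionally equivalent, faster revision written later by the programmer. The sketch below is illustrative only: the function names and code are invented for this post, not drawn from the paper’s dataset.

```python
# A hypothetical "performance improving edit" pair: both versions
# compute the same value, but the edited one does so far faster.

def sum_of_squares_slow(n):
    # Original version: builds an intermediate list, then sums it.
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)

def sum_of_squares_fast(n):
    # Edited version: closed-form formula for 0^2 + 1^2 + ... + (n-1)^2,
    # turning an O(n) loop into an O(1) expression.
    return (n - 1) * n * (2 * n - 1) // 6
```

A model trained on many such before/after pairs can learn to propose the second version when shown the first.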
The authors saw performance improvements on par with OpenAI’s Codex, all with a model ten times smaller than Codex. Overall, they report that their method provides a 2.5x speedup on over 25% of the code it is applied to.
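The 2.5x figure refers to wall-clock speedup on programs where the edit applies. A minimal sketch of how one might check such a claim for a single candidate edit, first verifying that behavior is preserved and then timing both versions, could look like this (the functions here are hypothetical stand-ins, not the paper’s evaluation harness):

```python
import timeit

def original_version(n):
    # Candidate slow program: O(n) accumulation loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

def edited_version(n):
    # Proposed faster edit: closed-form, O(1).
    return (n - 1) * n * (2 * n - 1) // 6

# An edit only counts if it preserves behavior...
assert original_version(10_000) == edited_version(10_000)

# ...and actually runs faster; estimate the wall-clock speedup.
t_orig = timeit.timeit(lambda: original_version(10_000), number=100)
t_edit = timeit.timeit(lambda: edited_version(10_000), number=100)
print(f"measured speedup: {t_orig / t_edit:.1f}x")
```

In practice an evaluation like this would be run across a whole benchmark suite, counting the fraction of programs whose speedup exceeds a threshold.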

Beginning of a Trend

This research is a big step forward for AI-powered code, and may mark the beginning of a trend in which closed-source LLMs become available in open-source form. However, much work remains to improve existing models’ ability to reason logically and to reduce the chance of catastrophic code changes, which are currently the major drawback of existing AI code generators.
As LLMs move from closed to open source, it’s likely that the pace of innovation will increase and many of the current drawbacks will be resolved.

The Paper:

Tags: ML News