
InCoder: A Generative Model for Code Infilling and Synthesis

On April 12, 2022, the paper "InCoder: A Generative Model for Code Infilling and Synthesis" was submitted, complete with model, demo, and examples.
On April 12, 2022, the paper "InCoder: A Generative Model for Code Infilling and Synthesis" was submitted to arXiv by Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis, in a collaboration between Facebook AI Research, the University of Washington, UC Berkeley, TTI-Chicago, and Carnegie Mellon.
The paper is accompanied by model weights and code, a demo and some great examples.

What Is InCoder?

InCoder, as the authors describe it:
"We introduce INCODER, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale."
What the authors have created is loosely analogous to what the BERT authors accomplished in NLP: inspired by the way humans approach problems, they use that approach to improve the way machines do.
Rather than treating code creation as a pure left-to-right problem, they add bi-directionality, enabling a host of new capabilities including debugging, re-naming variables, adding comments, etc. As they note in their abstract, "Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined."
They describe the mechanics:
"More specifically, we learn to infill by randomly replacing spans of code with a sentinel token and moving them to the end of the sequence (Figure 1, top). The model is trained to predict all tokens in the complete sequence in this permuted ordering. During inference, we can edit code by replacing spans with sentinel tokens, prompting the model with the new sequence, and having it generate new tokens to replace the masked spans (Figure 1, bottom). Because the model can also trivially generate without sentinel tokens, the result is a unified approach for both program synthesis (via left-to-right generation) and editing (via infilling)."
Figure 1: masking spans and moving them to the end of the sequence during training (top); replacing spans with sentinels and infilling them at inference (bottom).
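At inference time, infilling amounts to prompting: replace a span with a sentinel, append that sentinel again at the end, and let the model generate the missing code. Below is a minimal sketch, assuming the publicly released facebook/incoder-1B checkpoint loaded through Hugging Face transformers; the sentinel handling and stopping logic are simplified relative to the authors' released inference code, which trims generation at <|endofmask|>.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/incoder-1B")
model = AutoModelForCausalLM.from_pretrained("facebook/incoder-1B")

# Mask out a function body, then repeat the sentinel to trigger infilling.
prompt = (
    'def count_words(path):\n'
    '    """Count the words in the file at path."""\n'
    '<|mask:0|>\n'
    '<|mask:0|>'
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(outputs[0]))
```

Because the same model generates with or without sentinels, the identical checkpoint also handles ordinary left-to-right synthesis: prompt it with a plain code prefix and it simply continues.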

Why Do We Care?

InCoder marks an evolution in code generation and, apart from providing a significant step forward on the task itself, gives us a demo through which to try the system in question ourselves.
This advancement stands not just to make coding easier, but also to advance the art of letting machines write their own code from natural language instructions.