Refact LLM: A Compact Yet Powerful Coding Language Model
Language models are getting smaller and smarter!
Today, Refact introduced Refact LLM, a 1.6 billion parameter language model tailored specifically for coding tasks. This compact model offers impressive capabilities, including real-time code completion with fill-in-the-middle (FIM) and chat. Notably, it achieves top-tier performance on the HumanEval benchmark while being substantially smaller than its competitors.
Key Specs
- 1.6 billion parameters
- Supports 20 programming languages
- 4,096-token context window
- Code completion and chat capabilities
- Pre-trained on permissively licensed code; available for commercial use
- Outperforms much larger models on HumanEval
Smaller Size, Bigger Impact
The decision to keep the model at 1.6 billion parameters is a strategic one. In the AI community, there has been a tendency to produce ever-larger models. However, Refact LLM bucks this trend, emphasizing efficiency and accessibility. Its smaller size allows more researchers and developers to experiment, even with limited hardware. The model requires only 3GB of RAM and can run on most modern GPUs, making real-time code completion faster and more affordable than ever before.
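For a sense of what that looks like in practice, here is a minimal sketch of running fill-in-the-middle completion on a single GPU with Hugging Face transformers. The model id, the trust_remote_code flag, and the StarCoder-style FIM tokens are assumptions based on the public release; check the model card for the exact repository name and prompt format.

```python
# Minimal sketch of fill-in-the-middle completion with Refact LLM.
# The model id and FIM special tokens below are assumptions; verify them
# against the Hugging Face model card before relying on this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "smallcloudai/Refact-1_6B-fim"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # fp16 keeps the 1.6B model around 3 GB
    trust_remote_code=True,
).to("cuda")

# FIM prompt: the model fills the gap between the prefix and the suffix.
prompt = (
    "<fim_prefix>def median(numbers):\n    "
    "<fim_suffix>\n    return result<fim_middle>"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, temperature=0.2, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```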

Technicals
Diving into the technical details, the base model was trained on 1.2 trillion tokens of code combined with open text datasets. Special attention was given to fine-tuning, which used open code instruction-following datasets along with a synthetic dataset and significantly boosted performance over the base model.
In terms of architecture, the model incorporates elements inspired by recent models like LLaMA and MPT-7B. For instance, it uses optimization techniques such as the Lion optimizer, with hyperparameters that include a batch size of 2 million tokens. The context size is set at 4,096 tokens, with a dropout rate of 0.1 that gradually decays to zero over the first 20% of training.
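As a rough picture of that dropout schedule, here is a small sketch that assumes a linear decay from 0.1 to zero over the first 20% of steps; the announcement does not specify the exact decay shape, so the linear form is an assumption.

```python
# Illustrative sketch of the described dropout schedule: start at 0.1 and
# decay to zero over the first 20% of training. Linear decay is assumed.
def dropout_at_step(step: int, total_steps: int,
                    initial: float = 0.1, decay_frac: float = 0.2) -> float:
    """Return the dropout probability to use at a given training step."""
    cutoff = int(total_steps * decay_frac)
    if step >= cutoff:
        return 0.0
    return initial * (1.0 - step / cutoff)

# With 100k total steps: 0.1 at step 0, 0.05 at step 10k, 0.0 from step 20k on.
print([round(dropout_at_step(s, 100_000), 3) for s in (0, 10_000, 20_000, 50_000)])
```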
Architectural Elements
LLaMA's Unique Architecture and Hyperparameters
Meta's LLaMA is best known for its long training run of over 1 trillion tokens, but its architecture and hyperparameters are equally compelling. Unlike traditional transformer models, it omits bias terms in the self-attention and MLP layers, a choice that may interact better with weight decay. It also deviates from the sequential computation of self-attention and MLP, running the two branches independently to speed up each layer. Its batch size of 4 million tokens is unusually large, aimed at providing a broad spectrum of data throughout the long training run.
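To make those two choices concrete, here is a simplified PyTorch sketch of a bias-free transformer block that computes self-attention and the MLP in parallel from the same normalized input. It illustrates the idea described above and is not LLaMA's actual implementation.

```python
# Simplified sketch: no bias terms in attention/MLP, and the attention and
# MLP branches are computed in parallel rather than sequentially.
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # bias=False drops the additive bias terms in attention and MLP.
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        # Parallel formulation: both branches read h; their outputs are summed.
        return x + attn_out + self.mlp(h)

x = torch.randn(2, 16, 512)       # (batch, sequence, d_model)
print(ParallelBlock()(x).shape)   # torch.Size([2, 16, 512])
```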
Position Encoding in MPT-7B
MosaicML's MPT-7B uses the ALiBi position encoding method, a significant departure from traditional absolute position embeddings. Instead of adding position information to the input, ALiBi biases the attention scores by token distance, which makes context sizes extendable and lets the model generalize to longer sequences than it saw during training.
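A minimal sketch of the idea, simplified from the ALiBi formulation: a distance-proportional penalty with a per-head slope is added to the attention scores before the softmax, in place of position embeddings.

```python
# Minimal sketch of ALiBi linear attention biases (illustrative only).
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (n_heads, seq_len, seq_len) tensor of linear attention biases."""
    # Geometric sequence of slopes, one per head (power-of-two head counts).
    slopes = torch.tensor([2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)])
    # distance[i, j] <= 0: how far key j lies behind query i (0 on the diagonal).
    positions = torch.arange(seq_len)
    distance = (positions[None, :] - positions[:, None]).clamp(max=0)
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=8, seq_len=16)
# Added to the raw attention scores before softmax:
#   scores = (q @ k.transpose(-2, -1)) * scale + bias
print(bias.shape)  # torch.Size([8, 16, 16])
```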
StarCoder’s Multi-Query Attention
BigCode's StarCoder employs Multi-Query Attention, a resource-efficient adaptation of traditional Multi-Head Attention. The technique keeps many query heads but shares a single key/value head, which shrinks the KV cache and lets StarCoder handle long contexts (up to 8,192 tokens) with less memory and compute overhead.
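Here is a minimal PyTorch sketch of the idea: the query projection keeps all heads, while the key and value projections produce a single shared head, so the KV cache shrinks by roughly the head count. This is illustrative only, not StarCoder's actual implementation.

```python
# Minimal sketch of Multi-Query Attention: many query heads, one shared
# key/value head (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        # A single head's worth of keys and values, shared across query heads.
        self.k_proj = nn.Linear(d_model, self.d_head, bias=False)
        self.v_proj = nn.Linear(d_model, self.d_head, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, t, d) — broadcast over heads
        v = self.v_proj(x).unsqueeze(1)  # (b, 1, t, d)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 512)
print(MultiQueryAttention()(x).shape)  # torch.Size([2, 16, 512])
```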
Replit-code-v1-3b’s Specialization
Replit's replit-code-v1-3b opts for a specialized dataset containing only code, setting aside other types of data. This focus yields strong performance on coding tasks, even though the model lacks features like fill-in-the-middle completion.
Huge Upside
The smaller size of these models is a game-changer. Researchers with limited hardware can now experiment more freely, test innovative ideas, and contribute to the field's exponential growth. It's exhilarating to see how quickly advancements are being made, propelled by the momentum and excitement around AI.
Tags: ML News