Meta Releases Code Llama
Meta continues to release LLMs to the public, this time with Code Llama, an LLM designed specifically for programming!
Created on August 24 | Last edited on August 24
Code Llama, a large language model (LLM) designed to generate code from text prompts, is now available. It stands out as a leading publicly available LLM for coding tasks.
The model is being offered in three distinct versions:
- Code Llama, the foundational code model
- Code Llama - Python, tailored for Python coding
- Code Llama - Instruct, which has been fine-tuned to comprehend natural language instructions
It's released free of charge for both research and commercial applications, and the models are built on the foundation of Llama 2.
Several Versions
Code Llama showcases enhanced coding abilities: beyond code generation, it supports code completion and debugging, and it covers several popular programming languages, including Python, C++, Java, and PHP. Three sizes of Code Llama, with 7B, 13B, and 34B parameters, are available to cater to various latency requirements. All of them provide stable generations with an extensive context of up to 100,000 tokens, which is particularly beneficial in scenarios like debugging and crafting longer programs.
Specialized variants have been fine-tuned for specific needs: Code Llama - Python was further trained on 100B tokens of Python code, while Code Llama - Instruct improves the model's understanding of human instructions. Notably, Code Llama is not recommended for general natural language tasks; it is highly specialized for coding-specific work.
Performance
The performance of Code Llama has been assessed on coding benchmarks like HumanEval and Mostly Basic Python Programming (MBPP), where it outperforms other open-source, code-specific LLMs. The 34B variant achieves a 53.7% HumanEval score (pass@1), which, though lower than GPT-4's 67%, beats GPT-3.5's 48.1%. The Code Llama models also offer a notable advantage in context length, providing stable generations with up to 100,000 tokens despite being trained on sequences of 16,000 tokens. In contrast, GPT-4 in ChatGPT accommodates only 8K tokens (or 32K via the paid API). This difference matters in programming, where longer context lengths enable more detailed analysis, for example conditioning questions on an entire GitHub repository instead of just a few files. This extended context may lead to better understanding of complex coding scenarios, making Code Llama a valuable tool for developers.
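The pass@1 figures quoted above come from functional-correctness evaluation: sample n completions per problem, count how many pass the unit tests, and estimate the chance that at least one of k samples would pass. A minimal sketch of the standard unbiased pass@k estimator used in this style of benchmark (the exact sampling settings behind the reported numbers are not given here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total completions sampled for a problem
    c: number of those completions that pass the tests
    k: budget of samples considered
    Returns the estimated probability that at least one of k
    randomly drawn samples (out of the n generated) is correct:
    1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer failures than the budget: some draw must be correct.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 4 of which pass the tests
print(pass_at_k(10, 4, 1))  # 0.4 — pass@1 is just the fraction correct
```

Averaging this quantity over all benchmark problems yields the headline pass@1 percentage.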


Open Source
The model's training details and weights are available on GitHub. A Responsible Use Guide has been released as well, detailing various facets of Code Llama's development, including its limitations, challenges, and mitigations.
Tags: ML News