
AMD Introduces Instella: A Fully Open 3B Parameter Language Model

Created on March 6 | Last edited on March 6
AMD has announced Instella, a new family of fully open 3-billion-parameter language models trained entirely on AMD Instinct MI300X GPUs. These models outperform other fully open models of similar sizes and compete strongly with leading open-weight models like Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B. Instella represents a major step in AMD’s efforts to establish itself in AI model development while demonstrating the power of its GPU hardware for large-scale training.

Scaling Model Training on AMD Hardware

Building on the success of AMD’s previous 1-billion-parameter OLMo models, Instella significantly scales up both model size and training data. Instella was trained on 4.15 trillion tokens using 128 AMD Instinct MI300X GPUs, double the hardware used for OLMo’s training. The larger dataset and improved training infrastructure contribute to Instella's strong performance. By training entirely on AMD hardware, Instella also demonstrates the viability of AMD’s ROCm software stack for large-scale AI training, positioning AMD as a competitive alternative to NVIDIA.
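To put that scale in perspective, a rough compute estimate can be derived with the widely used approximation of ~6 FLOPs per parameter per training token. The figure below is a back-of-the-envelope sketch, not a number AMD has published:

```python
# Rough training-compute estimate for Instella-3B using the common
# ~6 * parameters * tokens FLOPs approximation (an assumption here;
# AMD has not published an official FLOPs figure for this run).
params = 3e9       # 3 billion parameters
tokens = 4.15e12   # 4.15 trillion training tokens

total_flops = 6 * params * tokens
print(f"~{total_flops:.2e} FLOPs")  # on the order of 7.5e22 FLOPs
```

Dividing that total across 128 GPUs gives a sense of why doubling the hardware over the earlier OLMo run was necessary to keep training time practical.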

Fully Open-Source Release for Collaboration

One of Instella’s defining features is its complete open-source release. AMD is providing not only the model weights but also detailed training configurations, datasets, and code. This allows researchers and developers to replicate, modify, and improve upon the models, fostering open collaboration in AI development. Instella’s training was optimized using advanced techniques such as FlashAttention-2, Torch Compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding, improving computational efficiency and reducing memory usage.
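The intuition behind hybrid sharding is that parameters are sharded across GPUs within a node (where interconnect bandwidth is high) and replicated across nodes, trading some memory for cheaper cross-node communication. The pure-Python toy below illustrates only the memory arithmetic; the node size is an illustrative assumption, not a detail from AMD's write-up:

```python
# Toy illustration of FSDP hybrid sharding memory savings (no PyTorch).
# With hybrid sharding, each GPU in a node holds 1/gpus_per_node of the
# parameters, and each node holds a full replica of the model.
def params_per_gpu(num_params, gpus_per_node):
    """Per-GPU parameter count when sharding within a single node."""
    return num_params // gpus_per_node

num_params = 3_000_000_000   # 3B parameters
gpus_per_node = 8            # illustrative assumption for an MI300X node

replicated = num_params                              # plain data parallel: full copy per GPU
sharded = params_per_gpu(num_params, gpus_per_node)  # hybrid shard: 1/8 per GPU

print(replicated, sharded)  # 3_000_000_000 vs 375_000_000 params per GPU
```

In real PyTorch FSDP this corresponds to the hybrid sharding strategy, which combines intra-node sharding with inter-node replication; the toy above just shows why the per-GPU memory footprint drops by the node's GPU count.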

Breakdown of the Instella Model Family

Instella consists of multiple models, each trained in stages to improve capabilities progressively. The base model, Instella-3B-Stage1, was pre-trained on 4.065 trillion tokens to establish a foundational understanding of natural language. Instella-3B was further refined with an additional 57.575 billion tokens, improving performance in tasks such as mathematical reasoning and question-answering.
The instruction-tuned variants, Instella-3B-SFT and Instella-3B-Instruct, enhance the model’s ability to follow user instructions and align its responses with human preferences. Instella-3B-SFT underwent supervised fine-tuning on high-quality instruction-response pairs, while Instella-3B-Instruct was further optimized using Direct Preference Optimization (DPO) to improve the model’s helpfulness and accuracy in conversations.
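DPO trains the policy directly on preference pairs, pushing up the log-probability of chosen responses relative to rejected ones, measured against a frozen reference model. A minimal sketch of the per-pair loss (standard DPO formulation; the toy log-probability values are made up for illustration):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the frozen reference (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)), written in the numerically stable
    # softplus form log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))

# Toy values: the policy prefers the chosen response more than the
# reference does, so the loss falls below log(2) ~= 0.693, its value
# when the policy and reference agree exactly.
print(dpo_loss(-10.0, -14.0, -12.0, -13.0))
```

The `beta` parameter controls how far the policy may drift from the reference; larger values penalize disagreement with the preference data more sharply.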

Benchmark Performance: Instella vs. Other Models

Instella models perform exceptionally well across multiple benchmarks, outperforming other fully open models and narrowing the gap with top open-weight models. Instella-3B surpasses fully open models like StableLM-3B-4E1T and OpenELM-3B by large margins on knowledge and reasoning benchmarks such as MMLU, ARC, and GSM8k. It also competes closely with models like Llama-3.2-3B and Qwen-2.5-3B, despite being trained on significantly fewer tokens.

The instruction-tuned Instella-3B-Instruct model excels in instruction-following tasks, outperforming other fully open models on benchmarks like TruthfulQA, GPQA, and AlpacaEval 2. It also competes well with leading open-weight models, demonstrating strong chat and reasoning capabilities.

Future Directions and Open-Source Commitment

The launch of Instella highlights AMD’s commitment to open-source AI development. By making the model weights, datasets, and training configurations fully available, AMD encourages collaboration and further innovation in AI research. Future work on Instella will explore improvements in context length, reasoning ability, and multimodal capabilities, as well as scaling up the model size and dataset diversity.
AMD's open-source approach, combined with the performance of Instella, positions the company as a strong competitor in the AI hardware and model development space. As AI research continues to evolve, Instella provides an accessible and powerful foundation for researchers and developers to build upon.
Tags: ML News