Adept Open-Sources Persimmon-8B: A Permissively-Licensed 8 Billion Parameter Language Model
A new open-source alternative to GPT-3 and LLaMA
Created on September 8 | Last edited on September 8
In a recent announcement, Adept revealed the open-sourcing of its language model Persimmon-8B, released under a permissive Apache license. Unlike many language models that focus on scale, Persimmon-8B is optimized for a balance between performance and accessibility, making it suitable for single-GPU inference and even, potentially, for mobile devices.
What Sets Persimmon-8B Apart?
With approximately 9.3 billion parameters, Persimmon-8B uses a standard decoder-only transformer architecture: a hidden size of 4096, 64 attention heads, and 36 layers, trained with a sequence length of 16K on a mixed dataset of text and code.
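The reported hyperparameters can be collected into a minimal configuration sketch. The dataclass and field names below are illustrative, not Adept's actual code; only the numeric values come from the announcement:

```python
from dataclasses import dataclass

@dataclass
class PersimmonConfig:
    # Values as reported for Persimmon-8B; names are hypothetical.
    hidden_size: int = 4096      # model width
    num_heads: int = 64          # attention heads per layer
    num_layers: int = 36         # decoder blocks
    max_seq_len: int = 16_384    # 16K training sequence length

cfg = PersimmonConfig()
head_dim = cfg.hidden_size // cfg.num_heads  # dimension per attention head
print(head_dim)
```

The per-head dimension of 64 follows directly from the reported hidden size and head count.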
Performance on Limited Data: Despite being trained on only about 37% of the data used for comparable models such as LLaMA2, Persimmon-8B matches and even exceeds their performance metrics.
Extended Context Size: One standout feature is its 16K context size, four times that of LLaMA2 and eight times that of GPT-3, making it better suited to handling longer inputs.
Unused Embeddings for Extensions: The model ships with 70k unused embeddings that could be repurposed for multimodal extensions.
Speed and Flexibility: Adept has also released inference code that combines the speed of C++ with the flexibility of Python, enabling faster sampling while keeping the codebase manageable.
Squared ReLU Activations: Persimmon-8B uses squared ReLU activations, which offer potential benefits for inference-time optimization.
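Squared ReLU simply squares the output of a standard ReLU, i.e. f(x) = max(0, x)²; a minimal NumPy sketch:

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    """Squared ReLU: max(0, x) ** 2. Negative inputs map to exact zeros."""
    return np.maximum(x, 0.0) ** 2

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(squared_relu(x).tolist())  # [0.0, 0.0, 0.0, 0.25, 4.0]
```

Because all negative activations become exact zeros, the output tends to be sparse, which is one property that can be exploited when optimizing inference.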
Evaluation Methodology
Rather than scoring the model solely on answer likelihoods, Adept evaluates Persimmon-8B on the text it actually generates in interaction with humans, which more closely mirrors real usage.
On the Massive Multitask Language Understanding (MMLU) benchmark, Persimmon-8B performs on par with LLaMA2. On the ARC Challenge, which assesses reasoning ability, the fine-tuned version (Persimmon-8B-FT) significantly outperforms other models, including GPT-3. The model also shows strong code generation and understanding on the HumanEval benchmark.
For inference efficiency, Persimmon-8B's released code incorporates techniques like operator fusion and CUDA graphs, closing the gap between flexible Python-based implementations and highly efficient C++ ones. Overall, its open-source nature, robust architecture, and impressive benchmarks make it a formidable competitor in the realm of language models.
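Operator fusion means computing several chained elementwise operations in a single pass over the data, rather than materializing an intermediate array for each op. A toy pure-Python illustration of the idea (not Adept's actual CUDA kernels), here fusing a bias-add with a squared-ReLU activation:

```python
def unfused(xs, bias):
    # Two passes over the data: the intermediate list is materialized.
    tmp = [x + bias for x in xs]             # pass 1: bias-add
    return [max(t, 0.0) ** 2 for t in tmp]   # pass 2: squared ReLU

def fused(xs, bias):
    # One pass: both ops applied per element, no intermediate storage.
    return [max(x + bias, 0.0) ** 2 for x in xs]

xs = [-1.5, -0.2, 0.3, 2.0]
assert unfused(xs, 0.5) == fused(xs, 0.5)  # same result, fewer memory passes
```

On a GPU the same idea saves kernel launches and global-memory round trips; CUDA graphs push further by recording a whole sequence of kernel launches once and replaying it with minimal per-step CPU overhead.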
Overall
Adept’s Persimmon-8B offers a unique balance of performance and usability, providing a valuable resource for developers. Its release is part of Adept's broader mission to develop an AI agent that can assist people in various computer tasks. As the company continues to build upon this foundational technology, the AI community looks forward to future updates.
Tags: ML News