Phind unveils Phind-405B and Phind Instant
The code answer engine unveils new models!
Phind, an AI answer engine designed for developers, has unveiled its latest models, Phind-405B and Phind Instant, which aim to deliver faster, higher-quality answers to both technical and general queries. The new models mark a significant upgrade in Phind's ability to assist developers, researchers, and curious minds.
Phind-405B: The New Flagship Model
Phind-405B is Phind's new flagship model, built on Meta's Llama 3.1 405B. It is designed for programming and technical tasks and supports a 128K-token context, with a 32K-token window available at launch. The model is accessible to Phind Pro users and excels at real-world tasks such as designing and implementing web applications: when asked to create landing pages for Paul Graham's Founder Mode essay, Phind-405B autonomously performed multiple searches and generated several design options.
The model scores 92% on HumanEval (0-shot), matching Claude 3.5 Sonnet. Phind-405B was trained on 256 H100 GPUs using FP8 mixed precision, managed through DeepSpeed and Microsoft's MS-AMP library; this approach maintains training quality while reducing memory usage by 40%.
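To give a flavor of what FP8 mixed-precision training looks like in practice, here is a minimal sketch using the open-source MS-AMP library's `initialize` API. This is illustrative only: the toy model, optimizer, and `opt_level` are placeholder assumptions, not Phind's actual training setup.

```python
# Minimal sketch of FP8 mixed-precision training with MS-AMP.
# Illustrative only: the toy model, optimizer, and opt_level are
# placeholders, not Phind's actual configuration.
import torch
import msamp

model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# msamp.initialize wraps the model and optimizer so that weights,
# gradients, and optimizer state can be kept in low-precision (FP8)
# formats where safe -- this is where the memory savings come from.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

for step in range(10):
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).float().pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a large-scale run like Phind's, a wrapper like this would sit inside a DeepSpeed-managed distributed training loop; the single-GPU loop above only shows the precision-handling piece.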
Phind Instant Model
Addressing the speed limitations commonly associated with AI-powered search, Phind has introduced Phind Instant, which runs at up to 350 tokens per second. Based on Meta's Llama 3.1 8B, the model is served by a Phind-customized NVIDIA TensorRT-LLM server on H100 GPUs, where techniques such as FP8 training, flash decoding, and fused CUDA kernels keep answers fast without sacrificing quality.
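For readers who want to try TensorRT-LLM themselves, the library ships a high-level `LLM` API. The sketch below is a generic example of serving a Llama 3.1 8B checkpoint with it, not Phind's customized server; the checkpoint name and sampling settings are assumptions for illustration.

```python
# Generic TensorRT-LLM inference sketch (not Phind's customized server).
# Assumes tensorrt_llm is installed on a CUDA machine and that the
# Hugging Face checkpoint below is accessible -- both are assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(max_tokens=128, temperature=0.2)

outputs = llm.generate(["Explain flash decoding in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```

Under the hood, this API builds an optimized TensorRT engine for the model; the FP8 and fused-kernel optimizations the post mentions are the kinds of knobs a customized deployment would tune further.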
Phind's runtime improvements also include predictive web result fetching, which cuts search latency by up to 800 milliseconds (a sketch of this overlap pattern follows below), and a new, larger embedding model that better judges which retrieved text is relevant, improving the overall search experience without compromising speed.
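Predictive fetching is essentially overlapping the web search with other work instead of running the steps one after another. Here is a hypothetical sketch of that pattern; the function names and timings are made up for illustration and do not reflect Phind's internals.

```python
# Hypothetical sketch of predictive result fetching: start the web
# search as soon as the query arrives, in parallel with other
# preprocessing, so search latency is hidden rather than added.
import asyncio

async def fetch_web_results(query: str) -> list[str]:
    await asyncio.sleep(0.8)  # stand-in for a ~800 ms search round trip
    return [f"result for {query!r}"]

async def preprocess(query: str) -> str:
    await asyncio.sleep(0.3)  # stand-in for embedding/reranking prep
    return query.strip()

async def answer(query: str) -> None:
    # Launch the search immediately and overlap it with preprocessing;
    # total latency is max(search, prep) instead of their sum.
    search_task = asyncio.create_task(fetch_web_results(query))
    cleaned = await preprocess(query)
    results = await search_task
    print(cleaned, results)

asyncio.run(answer("what is flash decoding?"))
```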
Enhanced Search Experience
The updated Phind models bring several improvements to the search process, aiming to make it as efficient and responsive as possible. By integrating advanced latency reduction techniques and scaling up the embedding model, Phind has achieved a balance of speed and precision that sets it apart from traditional search engines. These advancements are part of Phind’s broader mission to help developers and innovators quickly experiment and bring new ideas to life.
Phind continues to evolve as an answer engine, bridging the gap between traditional search and advanced AI interaction. By integrating the latest advancements in AI and working closely with partners such as Meta, NVIDIA, and AWS, Phind is positioned to set new standards in AI-assisted search and discovery.
Tags: ML News