
Nvidia's New AI Chip

Nvidia rolls out a big upgrade to its GH200 chip, one that could lead to major improvements in products like ChatGPT and beyond!
Created on August 9 | Last edited on August 9
In a recent announcement, Nvidia unveiled an upgrade to its GH200 chip, which is designed to run complex artificial intelligence models. The launch is significant news for any company that relies on GPUs to train and deploy neural networks.

Big Software Requires Big Hardware

There's a refrain in the AI community that bigger is often better, and Nvidia's new chip is no exception. Built on the same GPU as Nvidia's H100, the GH200 pairs it with 141 gigabytes of cutting-edge memory and a 72-core ARM central processor. These enhancements matter in an era when neural network models, like Llama 2 with its 70 billion parameters, keep growing larger to capture intricate patterns and complexities.

The Beauty of Parallelism

While GPUs were originally designed for computer game graphics, their role has been redefined in the modern AI age: they are now the backbone of neural network training. The GH200, a testament to that evolution, addresses the substantial computational demands of models like Llama 2. Nvidia's chip is optimized for parallel processing, which makes it ideal for neural network operations. CPUs, despite their power, lack the thousands of cores present in GPUs, so chips like the GH200 handle these workloads far more efficiently.
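To make the parallelism point concrete, here is a minimal PyTorch sketch (our own illustration, not code from Nvidia or the article) that times the same large matrix multiplication, the core operation inside neural network layers, on a CPU and, if one is available, on a CUDA GPU:

```python
import time
import torch

# Two large matrices whose product involves billions of multiply-adds --
# exactly the kind of embarrassingly parallel work GPUs are built for.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# CPU: a handful of cores work through the multiply-adds.
start = time.time()
_ = a @ b
print(f"CPU matmul: {time.time() - start:.3f}s")

# GPU: thousands of cores attack the same operation at once.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for transfers before timing
    start = time.time()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the kernel to finish
    print(f"GPU matmul: {time.time() - start:.3f}s")
```

On typical hardware the GPU finishes this workload orders of magnitude faster, which is the whole reason GPUs became the default substrate for training and serving neural networks.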

The Bottleneck

VRAM (Video Random Access Memory) is becoming more important as LLMs grow. While popular consumer GPUs like the Nvidia RTX 3090 offer VRAM capacities of 16 to 24 GB, the GH200's 141 GB makes deploying large models on a single chip far more practical. This capacity is especially relevant given that the largest models, depending on their specifications, can require hundreds of gigabytes of VRAM; for reference, the Llama 2 70B model needs close to 60 GB of VRAM to run inference.

Dividing a neural network across multiple GPUs, known as model parallelism, has long been the standard answer for models that need more VRAM than any single GPU can offer. But it brings a plethora of challenges: communication overhead, synchronization issues, and distribution complexity. The GH200, with its ample VRAM, alleviates some of these challenges by letting larger models reside on a single GPU, reducing the need for model parallelism.
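As a rough back-of-the-envelope check on those numbers, the sketch below (our own estimate, not from the article or any vendor spec) computes the memory needed just to hold a model's weights at different precisions. Full fp16 weights for a 70-billion-parameter model alone occupy roughly 140 GB, so the article's ~60 GB figure presumably reflects a reduced-precision deployment, and either way the real footprint also includes the KV cache and activation buffers on top of the weights.

```python
def weights_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in gigabytes."""
    return n_params * bytes_per_param / 1e9

# Llama 2 70B at common precisions (weights only; the KV cache and
# activation buffers add more on top of these figures).
for precision, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{precision}: ~{weights_vram_gb(70e9, nbytes):.0f} GB")
```

By this arithmetic, a single 141 GB GH200 can hold a 70B-parameter model in fp16 with room to spare, whereas a 24 GB consumer card cannot fit the weights even at 4-bit precision without splitting the model across devices.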

Cheaper Models

Beyond training, the real-world deployment of AI, especially large language models (LLMs), depends heavily on inference costs. Jensen Huang, Nvidia's CEO, highlighted this aspect, noting the significant drop in the inference cost of LLMs with the new chip. For organizations, cheaper, more efficient inference paves the way for broader AI application deployment; with the GH200, businesses can leverage advanced neural networks at a substantially lower cost. Nvidia has slated the launch of the more advanced GH200 with HBM3e for Q2 2024 [1].

Reference: [1] CNBC, August 8, 2023. https://www.cnbc.com/2023/08/08/nvidia-reveals-new-ai-chip-says-cost-of-running-large-language-models-will-drop-significantly-.html
Tags: ML News