NVIDIA GB300 NVL72: A leap in AI performance and efficiency
The NVIDIA GB300 NVL72 is delivered as a fully liquid-cooled, rack-scale system (CoreWeave's first GB300 rack shown above). Its dense design unifies cutting-edge GPUs, CPUs, and networking to push AI performance to new heights.
The NVIDIA GB300 NVL72 represents a groundbreaking advancement in AI computing, purpose-built for the new era of AI reasoning and inference. This platform packs an unprecedented 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace CPUs into a single fully liquid-cooled rack, creating a powerhouse for data centers.
The GB300 NVL72 is optimized for large-scale inference (or "test-time" AI workloads), delivering massive improvements in speed and efficiency over previous-generation systems. In fact, compared to NVIDIA’s prior Hopper-based platform, GB300 NVL72 provides up to 10× higher user-level responsiveness and 5× greater throughput per watt, culminating in a 50× leap in overall AI output for reasoning models. This dramatic gain means AI services can be vastly more responsive and cost-efficient, enabling more complex models and real-time applications than ever before.

CoreWeave – a close NVIDIA partner – has become the first hyperscale cloud provider to deploy the GB300 NVL72, highlighting the strategic collaboration that gave CoreWeave early access to this cutting-edge platform. In the sections below, we’ll explore the GB300 NVL72’s key features, its performance breakthroughs, and how platforms like CoreWeave are harnessing it to accelerate the next generation of AI.
Key features and technological breakthroughs of the NVIDIA GB300 NVL72
The GB300 NVL72 introduces several key features and technological breakthroughs that set it apart from previous AI platforms:
- Rack-Scale “AI Factory” Design: The GB300 NVL72 is a single 72-GPU, 36-CPU platform in a rack-scale form factor, meaning all components are integrated within one rack for maximum density. The entire system is 100% liquid-cooled, allowing it to safely dissipate the heat of a ~120–130 kW rack and enabling much higher power density than air-cooled setups. This liquid cooling is crucial for modern AI data centers – it supports more GPUs per rack, improves energy efficiency, and prevents thermal throttling, ultimately yielding greater performance in a smaller footprint. In practical terms, liquid cooling lets the GB300 NVL72’s high-wattage GPUs run at full tilt continuously, which is essential for AI factories and cloud deployments.
- Blackwell Ultra GPUs with Enhanced AI Compute: At the heart of the platform are 72 NVIDIA Blackwell Ultra GPUs, representing NVIDIA’s latest GPU architecture (Blackwell). The “Ultra” variant is an especially powerful model: each Blackwell Ultra chip combines two reticle-sized GPU dies and is loaded with 288 GB of HBM3e high-bandwidth memory, delivering roughly 15 petaflops of FP4 AI performance per GPU. Compared to standard Blackwell GPUs, the Ultra GPUs feature 1.5× more AI compute FLOPS and specialized optimizations for inference. Notably, the Tensor Cores in Blackwell Ultra are “supercharged” with 2× faster Transformer attention processing and new low-precision modes like FP4, which significantly boost throughput for large language models and reasoning tasks. These GPUs enable the platform to achieve exaflop-class performance – over 1.1 exaFLOPS (1,100 petaflops) of tensor processing at FP4 precision (or up to 1.4 exaFLOPS with sparsity) across the whole rack. This represents a massive compute leap, empowering larger and more complex AI models than ever before.
- Grace CPUs and Unified Memory: Complementing the GPUs are 36 NVIDIA Grace CPUs, based on Arm Neoverse V2 cores, which provide powerful CPU throughput and memory bandwidth to feed the GPUs. Each Grace CPU features 72 cores, and together the 36 CPUs contribute 2,592 cores in the system, all optimized for AI and data center workloads. The Grace CPUs bring high memory capacity and efficiency – up to 18 TB of CPU memory (LPDDR5X) is available in the rack, with exceptional bandwidth up to 14.3 TB/s aggregate. Critically, the GB300 NVL72’s design tightly couples CPUs and GPUs via 5th-generation NVLink connectivity, creating a unified memory architecture. NVIDIA NVLink provides an enormous 130 TB/s of total bandwidth between the chips, allowing GPUs to communicate with each other and with CPU memory at high speed. This means large datasets and models can be shared seamlessly across the 72 GPUs, and the CPUs can directly assist in memory-intensive tasks without bottlenecks. The platform supports a total of up to 40 TB of fast memory (GPU + CPU memory combined), making it possible to hold extremely large model parameters and context windows in memory for inference. This memory and networking integration is a breakthrough for AI at scale – GPUs can effectively act in unison on giant models that previously had to be sharded across slower network links. (A quick arithmetic check of these rack-level totals follows this list.)
- Integrated High-Speed Networking (ConnectX-8 SuperNIC): Each GPU in the GB300 NVL72 is directly equipped with high-bandwidth network connectivity. NVIDIA introduced a novel ConnectX-8 “SuperNIC” I/O module that hosts dual ConnectX-8 NIC devices, providing an aggregate 800 Gb/s of network bandwidth to each GPU. In other words, every GPU can communicate out to other racks or storage at 800 Gb/s, an enormous per-node bandwidth that ensures data isn’t the bottleneck for distributed workloads. These NICs support both the latest NVIDIA Quantum-X800 InfiniBand and Spectrum-X Ethernet networks, with full RDMA (remote direct memory access) capabilities for ultra-low latency data transfers. This flexible networking means the GB300 NVL72 can be deployed in either InfiniBand-based AI supercomputers or Ethernet-based cloud data centers while still achieving “in-network computing” performance levels. The result is that multi-rack AI jobs (spanning hundreds or thousands of GPUs) can be scaled efficiently, since each GPU can send/receive data at 800 Gb/s to its peers without saturating the network. This networking prowess is especially vital for large inference pipelines and model-parallel workloads.
- BlueField-3 DPUs and Secure Multi-Tenancy: In addition to CPUs and GPUs, the GB300 NVL72 platform also integrates 18 NVIDIA BlueField-3 DPUs (data processing units). These specialized accelerator cards offload networking, storage, and security tasks from the CPUs, enabling smarter and more secure data handling. With the BlueField-3 DPUs running NVIDIA’s DOCA software framework, the system can provide line-rate networking for cloud tenants up to 200 Gb/s per connection, with features like isolated virtual networks and encryption handled in hardware. This is particularly important in multi-tenant AI cloud environments – it means that even when the GB300 NVL72 is offered as a service (as on CoreWeave’s cloud), different users’ jobs can each get secure high-speed network throughput without interfering with one another. The BlueField DPUs essentially act as intelligent traffic cops and security guards for the data flowing in and out of the AI rack, all while the GPUs remain focused on crunching AI computations. By integrating DPUs, NVIDIA ensures the GB300 NVL72 is cloud-ready and enterprise-grade, with no compromises on performance or security when sharing this powerful resource.
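To make the scale of these figures concrete, here is a quick back-of-envelope check in Python. It only multiplies out the per-GPU and per-CPU numbers quoted in the list above, so the outputs are approximations of the rack-level totals rather than measured values.

```python
# Back-of-envelope check of the rack-level totals quoted in the feature list.
# All inputs are the per-unit figures from this article; treat the outputs as
# approximations, not measured values.

GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

FP4_PFLOPS_PER_GPU = 15     # ~15 petaflops of dense FP4 per Blackwell Ultra GPU
HBM_GB_PER_GPU = 288        # HBM3e capacity per GPU
CORES_PER_GRACE_CPU = 72
CPU_MEMORY_TB = 18          # LPDDR5X across the 36 Grace CPUs

rack_fp4_exaflops = GPUS_PER_RACK * FP4_PFLOPS_PER_GPU / 1000
rack_hbm_tb = GPUS_PER_RACK * HBM_GB_PER_GPU / 1000
rack_cpu_cores = CPUS_PER_RACK * CORES_PER_GRACE_CPU
rack_fast_memory_tb = rack_hbm_tb + CPU_MEMORY_TB

print(f"Dense FP4 compute : ~{rack_fp4_exaflops:.2f} exaFLOPS")   # ~1.08 EF (quoted: ~1.1 EF)
print(f"GPU HBM3e memory  : ~{rack_hbm_tb:.1f} TB")               # ~20.7 TB (quoted: ~21 TB)
print(f"Grace CPU cores   : {rack_cpu_cores}")                    # 2,592
print(f"Fast memory total : ~{rack_fast_memory_tb:.1f} TB")       # ~38.7 TB (quoted: ~40 TB)
```

The small gaps against the quoted ~1.1 exaFLOPS and ~21 TB simply reflect rounding in the per-GPU figures.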
Overall, these features make the NVIDIA GB300 NVL72 a game-changing platform for AI. It combines unprecedented GPU horsepower, massive unified memory, and bleeding-edge networking in one ready-to-deploy rack. Each of these elements builds on NVIDIA’s prior innovations (Hopper GPUs, NVLink, InfiniBand, etc.) but pushes them to new levels – more GPUs, more memory, faster interconnects – all aimed squarely at maximizing AI inference performance.
The GB300 NVL72 is so advanced that it outclasses its predecessor (the NVIDIA GB200 NVL72 platform) on every front. The new system delivers roughly 1.5× higher AI performance than the GB200 NVL72 (thanks to the upgraded GPUs), offers 1.5× greater GPU memory capacity (21 TB vs ~14 TB) for larger models, and doubles the network bandwidth per GPU (800 Gb/s vs ~400 Gb/s on the previous generation). These generational improvements underscore how the GB300 NVL72 is truly a leap forward – enabling AI models and deployments that simply weren’t feasible before.
Performance comparison: NVIDIA GB300 NVL72 vs. NVIDIA Hopper platform
One of the most impressive aspects of the GB300 NVL72 is how dramatically it outperforms NVIDIA’s previous-generation architecture (based on the Hopper GPU platform, e.g. the HGX H100 systems). NVIDIA has quantified these gains in terms of AI inference at scale, and the numbers are striking. Compared to a Hopper-based deployment, the GB300 NVL72 can deliver about a 10× boost in user responsiveness and 5× improvement in throughput for AI inference workloads.

In practical terms, “user responsiveness” refers to how quickly an AI system can respond to each individual query or task (measured in transactions per second per user), while throughput measures how many total queries or tasks can be handled per amount of power (transactions per second per megawatt, which reflects energy efficiency). A 10× increase in per-user responsiveness means that applications like chatbots, search engines, or recommendation systems built on GB300 NVL72 hardware will feel far more instantaneous, even under heavy loads – reducing latencies from, say, one second down to a tenth of a second in ideal cases. Meanwhile, a 5× throughput-per-power gain means that an AI data center could handle five times the workload at the same power budget, or equivalently, achieve the same workload with one-fifth the energy, a huge win for operational cost and sustainability.
Taken together, these improvements compound into a roughly 50× leap in overall output for reasoning model inference when using the GB300 NVL72 platform. NVIDIA describes this in terms of “AI factory output” – essentially the total work (inferences) done by a cluster. A 50× jump is transformative: models that took an hour to run responses on Hopper might complete in a little over a minute on GB300, or a cluster that could serve 1 million queries per day before might handle 50 million now. This opens the door to deploying much larger AI models in production and serving far more users or more complex requests within the same time frame. For instance, massive language models with greater than a trillion parameters could be deployed with reasonable latency on this platform, whereas previously their response times would be impractical for real-time use.
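The arithmetic behind these headline factors is simple enough to show directly. The sketch below multiplies NVIDIA’s quoted 10× and 5× gains and applies them to an illustrative deployment; the query volume and latency are assumptions chosen only to echo the examples in the paragraphs above, not benchmark results.

```python
# Illustration of how the headline factors in this section compose.
# The 10x and 5x figures are NVIDIA's quoted per-user responsiveness and
# throughput-per-megawatt gains; everything derived below is arithmetic on
# those quoted numbers, not an independent benchmark.

per_user_speedup = 10      # transactions/s per user vs. Hopper
per_megawatt_speedup = 5   # transactions/s per MW vs. Hopper

factory_output_gain = per_user_speedup * per_megawatt_speedup
print(f"Combined 'AI factory output' gain: ~{factory_output_gain}x")          # ~50x

# Applied to a hypothetical deployment (illustrative numbers only):
hopper_queries_per_day = 1_000_000
print(f"{hopper_queries_per_day:,} queries/day on Hopper -> "
      f"~{hopper_queries_per_day * factory_output_gain:,} queries/day on GB300 NVL72")

hopper_latency_s = 1.0
print(f"~{hopper_latency_s:.1f} s response on Hopper -> "
      f"~{hopper_latency_s / per_user_speedup:.1f} s on GB300 NVL72 (ideal case)")
```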
It’s important to note that these performance gains come not just from raw hardware speed, but also from architectural optimizations targeting inference workloads. The Hopper generation (e.g. NVIDIA H100 GPUs) was already a training and inference workhorse, but GB300 NVL72’s Blackwell Ultra GPUs are even more finely tuned for inference tasks like LLM “reasoning.” They introduce features such as the FP4 data type and twice-accelerated attention mechanisms, which specifically speed up large Transformer model inference. Additionally, the increased memory per GPU means models don’t need to be as heavily sharded or quantized to fit, avoiding memory-bound slowdowns. The improved interconnect (NVLink and networking) also means that scaling out to multiple GPUs has less overhead, so the 72 GPUs can work together more efficiently than 72 Hopper GPUs could. All these factors contribute to the headline 50× number in different ways – it’s not simply a clock speed or FLOPS increase, but a full-system optimization for inference throughput.
Another angle of comparison is power and cooling. The Hopper-based systems (like NVIDIA DGX H100) typically used air or hybrid cooling and had lower per-rack power density. The GB300 NVL72’s liquid-cooled design and higher density allow it to achieve those performance gains within a single rack, whereas achieving equivalent performance with older tech might require a many-rack cluster (with more networking overhead and energy loss). NVIDIA has indicated that an AI factory built on GB300 NVL72 can achieve far superior performance-per-watt – aligning with that 5× throughput per MW metric – which translates to significant operational savings for large-scale deployments.
When pitted against the previous NVIDIA Hopper platform, the GB300 NVL72 is in a different league for inference-heavy workloads. It responds faster, serves more users, and uses power more efficiently than its predecessor. This step-change in performance is akin to a generational leap one might expect after several years of hardware and software advances; yet NVIDIA has delivered it in a single generation, underscoring the acceleration of AI hardware development. For organizations running AI services, this means upgrading to GB300 NVL72 isn’t just a marginal improvement – it could be the difference between needing a whole warehouse of servers for a given workload versus a single rack, or the difference between a laggy user experience and an instantaneous one. The GB300 NVL72 thus sets a new baseline for AI performance, and it will likely define the standard that future platforms are measured against.
Specifications of the NVIDIA GB300 NVL72
Having discussed the features and performance holistically, let’s dive into the key specifications of the NVIDIA GB300 NVL72 platform. These specs highlight the sheer scale and power of the system’s hardware:
- GPU Configuration: 72× NVIDIA Blackwell Ultra GPUs, fully interconnected via NVLink and NVSwitch. Each GPU provides roughly 15 petaflops of peak FP4-precision compute and contains 288 GB of HBM3e memory on board. The GPUs are arranged to work in concert as one giant accelerator, with NVLink connecting all 72 GPUs into a single high-speed cluster.
- CPU Configuration: 36× NVIDIA Grace CPUs (Arm Neoverse V2 architecture). In total, the platform has 2,592 CPU cores (72 cores per Grace CPU) to handle preprocessing, data loading, and CPU-bound portions of workloads. The Grace CPUs are paired with the GPUs via NVIDIA’s NVLink-C2C (chip-to-chip) interconnect, ensuring tight coupling between CPU and GPU memory systems. The Grace chips also contribute significant memory capacity and are highly energy-efficient (they offer 2× the energy efficiency of traditional server CPUs for data center workloads).
- Memory Capacity: GPU Memory: Up to 21 TB of HBM3e memory across the 72 GPUs (each GPU’s 288 GB adds up to ~20.7 TB total). This memory is extremely fast, with an aggregate bandwidth of ~576 TB/s (each GPU providing on the order of 8 TB/s of bandwidth from its HBM stacks). CPU Memory: Up to 18 TB of system memory (LPDDR5X ECC memory across the Grace CPUs, via SOCAMM modules) with around 14.3 TB/s total bandwidth. The platform’s architecture treats these memories in a complementary way – the GPU memory is ultra-fast for active data, while the CPU memory can hold additional data and be accessed via NVLink when needed. Combined, the GB300 NVL72 offers nearly 40 TB of addressable memory for AI workloads, enabling it to host extremely large models or datasets entirely in-memory for inference.
- Interconnect (NVLink and NVSwitch): NVIDIA’s 5th-generation NVLink provides the backbone of intra-rack communication. The aggregate NVLink bandwidth is 130 TB/s across the system. In essence, all 72 GPUs are connected in a seamless mesh through NVSwitches, so any GPU can talk to any other GPU at high speed. This allows the GPUs to share data and model parameters as if they were on a single super-GPU with 21 TB of HBM. The ultra-high NVLink bandwidth is critical for scaling deep learning models across dozens of GPUs without hitting communication bottlenecks.
- External Networking: As mentioned earlier, each GPU is equipped with a dual-ConnectX-8 “SuperNIC” module providing 800 Gb/s of network I/O per GPU. In terms of port configuration, this could be seen as, for example, two 400 Gb/s ports per GPU, or eight 100 Gb/s links – but from a spec perspective, 800 Gb/s per GPU is the headline. Across 72 GPUs, that’s a staggering potential bandwidth (57.6 Tb/s total) out of the rack, though in practice it will be used for redundancy and east-west clustering rather than all at once. The networking supports both Ethernet (Spectrum-X) and InfiniBand (Quantum-X800) fabrics at full rate, with RDMA and GPUDirect capabilities for direct GPU-to-GPU communication between racks or to storage. In addition, the system integrates 18 NVIDIA BlueField-3 DPUs on the networking side. While not always listed in “speeds and feeds,” these DPUs offload network processing and can provide up to 200 Gb/s of secure throughput per host/tenant connection, as well as enhanced capabilities like packet processing and storage acceleration.
- Power and Cooling: Each Blackwell Ultra GPU in the GB300 NVL72 has a high TDP (Thermal Design Power) to achieve its performance – on the order of several hundreds of watts per GPU (exact figures are not publicly stated, but estimates range around 700–1000 W per GPU given the total rack power). The entire 72-GPU rack is designed for ~120 kW+ of power draw. Cooling this requires direct liquid cooling to each component. The GB300 NVL72 rack is fully liquid-cooled, often with warm-water cooling loops that can handle >90% of the heat load. This enables deployment in AI data centers without exotic cooling infrastructure – a properly equipped facility with coolant distribution can accommodate these racks that dissipate the heat equivalent of dozens of household ovens running simultaneously. The benefit is a much higher compute density per square foot and lower cooling power overhead compared to massive air-cooling setups. (For context, the previous GB200 NVL72 rack consumed roughly 130 kW and was also liquid-cooled; the GB300 is likely in a similar or slightly higher power class, but performing 1.5× more work in that envelope.) A rough arithmetic check of the rack’s aggregate bandwidth and power figures follows this list.
- Other Specs: The platform supports all major AI frameworks and comes with NVIDIA’s software stack (CUDA, AI Enterprise, etc.) optimized for Grace+Blackwell. It also supports Multi-Instance GPU (MIG) partitioning on the GPUs, though in inference scenarios the full GPUs are often used together. Storage is not part of the GB300 spec itself, but such a rack would be connected to high-speed storage (e.g., NVMe over Fabrics) to stream model data; the 800 Gb/s NICs and BlueField DPUs facilitate that with minimal latency.
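A few of the aggregate bandwidth and power figures in this list can likewise be sanity-checked with simple arithmetic. In the sketch below, the per-GPU bandwidth numbers are the ones quoted above, while the rack power budget and the household-oven wattage are loosely stated assumptions used only for scale.

```python
# Rough bandwidth and power arithmetic for the spec list above.
# Inputs marked "quoted" come from this article; the rack power budget and the
# oven wattage are illustrative assumptions, not NVIDIA figures.

GPUS = 72
HBM_BW_TBPS_PER_GPU = 8          # quoted: ~8 TB/s of HBM3e bandwidth per GPU
NIC_GBPS_PER_GPU = 800           # quoted: 800 Gb/s of external network I/O per GPU
RACK_POWER_KW = 130              # assumption: ~120-130 kW rack budget discussed above
OVEN_KW = 2.5                    # assumption: one household oven element, for scale

print(f"Aggregate HBM bandwidth   : ~{GPUS * HBM_BW_TBPS_PER_GPU} TB/s")      # ~576 TB/s
print(f"Aggregate external network: ~{GPUS * NIC_GBPS_PER_GPU / 1000} Tb/s")  # ~57.6 Tb/s
print(f"Power per GPU 'slot'      : ~{RACK_POWER_KW * 1000 / GPUS:,.0f} W "
      f"(includes its share of CPUs, NICs, and switches)")                    # ~1,800 W
print(f"Heat load in oven units   : ~{RACK_POWER_KW / OVEN_KW:.0f} ovens")    # ~52
```

The per-slot figure of roughly 1.8 kW is consistent with the hedged 700–1000 W per-GPU estimate above once the Grace CPUs, NVSwitches, NICs, and DPUs sharing the rack budget are accounted for.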
In short, the NVIDIA GB300 NVL72’s specs read like those of a small supercomputer concentrated into a single rack. Key figures like “72 GPUs,” “40 TB memory,” “130 TB/s NVLink,” and “800 Gb/s per GPU networking” underscore just how far the envelope has been pushed for a unified AI system. Each of these numbers is a record-setter in its category, reflecting the fact that this platform is built for frontier-scale AI.
It’s also evident how well-balanced the design is: the enormous GPU compute is matched with enormous memory and I/O, so no one subsystem starves the others. For anyone interested in deploying or utilizing this platform, these specs translate to the ability to run very large models or many models concurrently with maximum efficiency. Whether it’s a single gigantic model that needs all 72 GPUs or dozens of smaller inference jobs assigned across the GPUs, the GB300 NVL72 has the hardware resources to handle it with ease.
Enhancing AI reasoning inference and throughput
A primary goal of the GB300 NVL72 is to supercharge AI reasoning and inference workloads – that is, the “thinking” tasks AI models perform once they’ve been trained. Recently, the demands of inference, especially for large language models and other reasoning-centric AI, have skyrocketed. These tasks involve evaluating massive neural networks with long context inputs, which requires both intensive computation (e.g. large matrix multiplications and attention operations) and the ability to handle large amounts of data (model weights and inputs) quickly.
The GB300 NVL72 introduces several advancements specifically to meet these needs and dramatically boost inference throughput.
- First, the Blackwell Ultra GPUs in this platform come with architectural enhancements targeting inference performance. As mentioned, the Tensor Cores now support FP4 (4-bit floating point) precision and offer 2× faster attention processing than previous GPUs. The attention mechanism is critical in transformer-based models (like GPT-style LLMs) for handling long text and reasoning over it. By doubling the speed of attention calculations, the GB300 NVL72 can handle much larger context windows or more simultaneous sequences without latency blowups. Additionally, FP4 support means the GPUs can process twice as many operations in the same time (compared to 8-bit precision) when models are optimized to use 4-bit weights/activations. Many large inference models can be quantized to 4-bit with minimal accuracy loss, immediately yielding higher throughput. The Blackwell Ultra’s 1.5× increase in raw compute FLOPS (versus even baseline Blackwell GPUs) also contributes to speeding up inference computations across the board.
- Second, memory capacity and throughput have been elevated to enhance inference on large models. Each GPU having 288 GB of HBM3e means that extremely large models (potentially hundreds of billions of parameters per GPU in 4-bit form) can reside in memory local to the GPU. For example, a model with 200 billion parameters in 4-bit would be ~100 GB – which fits well within 288 GB, allowing room for activations and overhead (the sketch after this list walks through this memory math). The 1.5× larger memory per GPU (compared to the prior gen) directly translates to the ability to handle larger batch sizes and longer sequence lengths, which are key for maximizing throughput in reasoning tasks. Moreover, the high memory bandwidth (each GPU ~8 TB/s) ensures that these massive models can be fed to the compute units without stalling. The overall effect is that even the largest-context AI models (such as GPT-4 style models with very long prompts) can be run at full speed, utilizing the GPUs efficiently. NVIDIA specifically notes that GB300 NVL72 boosts throughput for “the largest context lengths,” meaning scenarios like summarizing book-length texts or doing multi-step reasoning over long dialogs become much more feasible.
- Third, the scale of the GPU cluster (72 GPUs with NVLink) enhances inference through parallelism. In many inference workloads, especially for huge models, you might distribute different parts of a model across multiple GPUs (model parallelism) or serve many queries in parallel (data parallelism). The GB300’s internal NVLink fabric allows these 72 GPUs to work together on a single model as if they were one unit with shared memory. This is crucial for “test-time scaling” – if a model doesn’t fit in one GPU, it can be partitioned across dozens of GPUs with minimal communication overhead, so the inference still runs efficiently. Alternatively, if you have a flood of inference requests (think of millions of users querying an AI assistant), the 72 GPUs can be carved into groups or operate independently to serve many requests concurrently. The fifth-gen NVLink and NVSwitch ensure that whichever approach is used, the GPUs remain fed with data and intermediate results without slowdowns. In essence, seamless communication between every GPU means the platform can achieve near-linear scaling in throughput as you use more GPUs for inference – a feat not easily achievable in loosely connected clusters.
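To ground the memory argument in the second bullet, here is a minimal sketch of the weight-footprint math, assuming the 288 GB and ~8 TB/s per-GPU figures quoted in this article. The bandwidth-bound decode ceiling at the end is an added illustration of why memory bandwidth matters for inference, not an NVIDIA figure, and it ignores KV-cache traffic, batching, and compute overlap.

```python
# Sketch of the memory math behind the inference claims above.
# Parameter counts and precisions are illustrative; the 288 GB and ~8 TB/s
# per-GPU figures are the article's quoted Blackwell Ultra numbers.

HBM_GB_PER_GPU = 288
HBM_BW_TBPS_PER_GPU = 8

def weight_footprint_gb(params_billion: float, bits: int) -> float:
    """Approximate size of the model weights alone (no KV cache or activations)."""
    return params_billion * 1e9 * bits / 8 / 1e9

for params_b, bits in [(200, 4), (200, 8), (1000, 4)]:
    gb = weight_footprint_gb(params_b, bits)
    fits_one_gpu = gb < HBM_GB_PER_GPU
    print(f"{params_b}B params @ {bits}-bit -> ~{gb:.0f} GB of weights "
          f"({'fits in' if fits_one_gpu else 'needs more than'} one GPU's 288 GB)")

# Very rough upper bound on per-sequence decode speed when generation is
# memory-bandwidth bound (each generated token re-reads the weights from HBM):
weights_gb = weight_footprint_gb(200, 4)                        # ~100 GB
ceiling = HBM_BW_TBPS_PER_GPU * 1000 / weights_gb               # ~80 tokens/s
print(f"Bandwidth-bound decode ceiling: ~{ceiling:.0f} tokens/s per sequence "
      f"on one GPU (ignores KV cache, batching, and overlap)")
```

A trillion-parameter model at 4-bit clearly exceeds a single GPU, which is exactly where the NVLink-coupled 72-GPU domain described in the third bullet comes into play.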
Additionally, we should mention how these advancements translate to real-world improvements. NVIDIA has quantified that the GB300 NVL72 yields a 50× increase in reasoning model inference output over the previous generation. This metric combines the effect of all enhancements – faster compute, more memory, better efficiency – and it directly benefits AI applications like conversational agents, recommendation systems, real-time analytics, and more. For example, an AI reasoning task that might have taken 50 servers full of GPUs in the past could potentially be done with a single GB300 NVL72 rack now. This orders-of-magnitude boost means AI researchers and engineers can iterate faster (getting inference results in seconds that used to take minutes or hours) and deploy far more ambitious services (like assistants that consider huge knowledge contexts or perform complex multi-step reasoning in real time).
One specific area of impact is user-facing AI services. The 10× higher per-user throughput means that an AI application can support many more simultaneous users or requests before response times degrade. This is especially important for interactive AI (chatbots, interactive agents) where maintaining low latency is key to a good experience. Another area is energy efficiency – the 5× throughput-per-watt improvement implies that even if you aren’t pushing the system to its absolute limits, it can do the same work as older systems using a fraction of the power. This not only lowers running costs but also reduces the carbon footprint of AI inference, which is a growing concern as usage scales up globally.
The NVIDIA GB300 NVL72 is engineered to excel at AI reasoning and inference in a way that no previous platform has. By augmenting compute, memory, and interconnect specifically for inference demands, it breaks through previous bottlenecks. Whether it’s a matter of cranking out answers from a gigantic LLM, performing complex reasoning for autonomous systems, or serving endless personalized recommendations, the GB300 NVL72 dramatically increases how much can be done in a given time (and at what cost). This marks a significant step towards AI systems that are not only powerful in theory but also highly scalable and practical in production settings.
Networking capabilities with the NVIDIA ConnectX-8 SuperNIC
Networking is a critical component of any large-scale AI platform, and the NVIDIA GB300 NVL72 introduces state-of-the-art networking capabilities to ensure that data flows as freely as the compute allows. Central to this is the NVIDIA ConnectX-8 SuperNIC architecture. Each GB300 NVL72 includes specialized I/O modules, each hosting two ConnectX-8 network interface devices, effectively bonding them to provide an 800 Gb/s network pipe for each GPU in the system. This is a staggering amount of bandwidth at the disposal of each GPU – for perspective, 800 Gb/s is roughly 100 GB of data per second. In practical terms, a single GPU in the GB300 could read or write the equivalent of the entire content of a Blu-ray disc in just a fraction of a second over the network. This level of bandwidth means that even if the GPUs are processing enormous datasets or model weights that reside on external storage or across multiple racks, the network won’t be a limiting factor.
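The unit conversion in the paragraph above is easy to verify. The payload sizes in this sketch (a Blu-ray disc’s worth of data and a 4-bit 200-billion-parameter model) are illustrative assumptions; the only quoted figure is the 800 Gb/s per-GPU line rate.

```python
# Unit-conversion check for the 800 Gb/s per-GPU figure discussed above.
# Payload sizes are illustrative assumptions used only for scale.

NIC_GBPS = 800                          # quoted: 800 Gb/s per GPU
gbytes_per_second = NIC_GBPS / 8        # -> ~100 GB/s

for name, payload_gb in [("Blu-ray disc (~50 GB)", 50),
                         ("200B-parameter model at 4-bit (~100 GB)", 100)]:
    print(f"{name}: ~{payload_gb / gbytes_per_second:.2f} s at line rate")
```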
The ConnectX-8 SuperNIC not only provides raw bandwidth but also advanced features for distributed computing. It supports Remote Direct Memory Access (RDMA), which allows GPUs to exchange data directly between their memories across the network without involving the CPUs or operating system in the data path. RDMA significantly lowers latency and CPU overhead for inter-GPU communication across servers. The GB300 NVL72, with ConnectX-8 and RDMA, can thus extend its high-speed NVLink network beyond a single rack – multiple GB300 racks can be linked into an even larger cluster where GPUs in different racks communicate almost as efficiently as if they were in one system. This is essential for “AI factories” where you might have several racks of GB300 working together on training or inference for ultra-large models.
Another key aspect is the flexibility between InfiniBand and Ethernet. The ConnectX-8 SuperNIC and NVIDIA’s networking stack make it possible to use either NVIDIA Quantum-X800 InfiniBand or Spectrum-X Ethernet as the underlying network technology. InfiniBand is often preferred in supercomputing and tightly-coupled AI clusters for its low latency and offload capabilities, whereas Ethernet (especially NVIDIA’s Spectrum-X with enhancements for AI) is common in cloud and enterprise data centers. The GB300 NVL72 supports both, at full 800 Gb/s rates, giving deployers the choice to integrate the system into different network environments without losing performance. In an InfiniBand deployment, the Quantum-X800 switch infrastructure would provide features like in-network computing (accelerating reductions, broadcasts, etc., in hardware), which can further speed up multi-node AI workflows. In an Ethernet deployment, Spectrum-X switches paired with ConnectX-8 still offer advanced congestion control and AI-friendly scheduling to achieve excellent performance. Essentially, NVIDIA has ensured that no matter what fabric you plug the GB300 into, it will operate at peak bandwidth and efficiency.
One more piece of the networking puzzle in GB300 NVL72 is the inclusion of BlueField-3 DPUs mentioned earlier. These DPUs sit on the network path and manage a lot of the heavy lifting for network and storage tasks. From a capabilities standpoint, the BlueField-3 can handle things like packet routing, firewalling, encryption/decryption, and even data parsing at line rate. For cloud providers, this means they can offer the GB300’s power to multiple clients securely – the DPUs enforce isolation between different users’ traffic and keep performance high by offloading overhead from the CPUs. NVIDIA’s DOCA framework running on BlueField enables “software-defined” networking functions and tenant isolation at up to 200 Gb/s speeds per DPU. So, if a GB300 NVL72 rack is partitioned among several users or jobs, each can be given a dedicated slice of network bandwidth that the DPU will manage with QoS guarantees. This is a huge boon for cloud-based AI services, where you want many customers to share a big resource without interfering with each other.
In simpler terms, the GB300 NVL72’s networking means data can be fetched from anywhere and delivered to the GPUs extremely fast, and multiple systems can be tied together to act as one larger system with minimal network penalty. For an AI inference scenario, imagine a cluster of GB300 racks behind a load balancer serving requests – the 800 Gb/s pipes ensure each rack can ingest request data and return results to users at high rates, and if the racks need to talk to each other (for example, to ensemble multiple models or share a large database of vectors), they can do so swiftly. For training scenarios, if one rack isn’t enough for a gargantuan model, multiple racks can share the training load with InfiniBand-level latency. The ConnectX-8 SuperNIC and the networking backbone effectively remove the network as a bottleneck.
To illustrate the benefit: previously, one of the challenges in scaling AI was that if you tried to use dozens of GPUs across servers, the interconnect (like 100 Gb or 200 Gb Ethernet/InfiniBand) could become a chokepoint, limiting speedup. With 800 Gb/s per GPU, the interconnect bandwidth has leaped ahead of typical workload requirements, meaning the GPUs can stay busy with data always available. This best-in-class networking throughput and smart networking (RDMA, in-network compute, DPU offloads) together enable what NVIDIA calls “AI factories” – essentially data centers that can operate like assembly lines for AI tasks, with minimal idle time and maximum efficiency.
The NVIDIA ConnectX-8 SuperNIC and networking stack in the GB300 NVL72 ensure that this powerful AI platform is not confined by data movement constraints. It provides the speed and smarts for data to move at the pace of computation, whether within the rack or across the globe. This means faster results, higher utilization of the GPUs, and the ability to scale out AI deployments easily. For anyone building large AI systems or services, the networking capabilities of GB300 NVL72 are as much a selling point as the raw compute – they guarantee that the compute can be fully utilized in real-world workflows.
CoreWeave: Early access deployment and partnership with NVIDIA
One of the notable storylines around the NVIDIA GB300 NVL72 is how quickly it has moved from announcement to actual deployment, thanks in large part to NVIDIA’s collaboration with key partners. CoreWeave, a specialized cloud provider focusing on AI, has emerged as the first hyperscaler to deploy the GB300 NVL72 platform in the wild. In July 2025, CoreWeave announced the industry’s first bring-up of the NVIDIA GB300 NVL72, indicating that they had integrated this new platform into their cloud offering even before general availability. This early access was made possible by CoreWeave’s close relationship with NVIDIA (and with hardware vendors like Dell), positioning CoreWeave at the forefront of AI infrastructure innovation.

CoreWeave’s deployment consists of the Dell PowerEdge XE9712-based GB300 NVL72 rack – essentially a turnkey rack-scale system provided by Dell Technologies and NVIDIA – installed in one of CoreWeave’s data centers. Visuals of the installation (in partnership with data center provider Switch) show a full rack with liquid-cooling hookups ready to go. According to Dell, this initial rack was shipped to CoreWeave to be fully assembled and tested in the U.S., marking the first time the GB300 NVL72 was operational outside of NVIDIA’s labs. CoreWeave’s ability to get this running quickly speaks to a deep engineering partnership: their team worked alongside Dell and NVIDIA to power on the system, integrate it with CoreWeave’s cloud management, and validate its performance.
The significance of CoreWeave having early access to GB300 NVL72 is multi-fold. For one, it means that CoreWeave’s customers get a first-mover advantage – AI companies and researchers using CoreWeave’s cloud can experiment with and leverage the GB300’s unparalleled performance right away, ahead of their competitors. In a field as fast-moving as AI, having access to 50× faster inference can accelerate development cycles and enable new applications (for example, startups can deploy more advanced AI services without procuring their own supercomputer-class hardware). It also underscores CoreWeave’s positioning as a cutting-edge AI cloud; they similarly were first to offer previous top-end systems like NVIDIA H100s and the prior GB200 NVL72 in the cloud. This consistent access to NVIDIA’s latest and greatest has made CoreWeave an attractive platform for AI teams that don’t want to invest in on-premise infrastructure but still want state-of-the-art performance.
From NVIDIA’s perspective, working with CoreWeave as a launch partner for GB300 NVL72 shows confidence in CoreWeave’s capability to handle and scale new AI technology quickly. CoreWeave has built out an infrastructure (power, cooling, cluster management) that can accommodate something as demanding as the GB300 rack from day one. Their data centers are equipped for liquid cooling and high power density, which many traditional clouds or enterprises might not yet be ready for. This means NVIDIA can rely on CoreWeave to showcase what GB300 can do in a real data center serving real workloads, essentially providing a reference deployment for others to follow. In press releases, CoreWeave and NVIDIA together highlighted how this platform enables “frontier-scale AI” and allows users to train and deploy multi-trillion-parameter models with ease on the CoreWeave cloud.
CoreWeave’s integration of the GB300 NVL72 into its cloud also involved more than just racking the hardware. The company optimized its software stack to fully exploit the platform. They extended their Kubernetes-based scheduling and Slurm-on-Kubernetes systems to be aware of the 72-GPU units, ensuring jobs can request and utilize those resources. They also improved their observability and management tools — for instance, CoreWeave’s Rack Lifecycle Controller (RLCC) and new “Cabinet Wrangler” dashboards were updated to monitor these liquid-cooled racks at fine granularity. CoreWeave even streams hardware-level telemetry (temperatures, utilization, errors) directly into user-facing interfaces via Weights & Biases, so that AI developers can see how the GB300 infrastructure is behaving during their runs. This kind of deep integration is crucial; it ensures that users can treat the GB300 NVL72 not as an exotic machine requiring manual fiddling, but as a seamless part of the cloud where they can spin up instances, run code, and get insights just like on any other resource.
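As an illustration of the last point, the sketch below shows one way a user-level job could stream per-GPU telemetry into Weights & Biases using NVML and the wandb client. This is a minimal, hypothetical example and not CoreWeave’s actual RLCC or Cabinet Wrangler integration; the project name, sampling interval, and duration are arbitrary assumptions.

```python
# Minimal sketch: log per-GPU telemetry to Weights & Biases from inside a job.
# Not CoreWeave's rack-level pipeline; it only shows what a user process can
# capture with NVML (via pynvml) and stream with the wandb client.

import time

import pynvml
import wandb

run = wandb.init(project="gb300-telemetry-demo")        # hypothetical project name
pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

try:
    for step in range(60):                               # ~5 minutes at 5 s intervals
        metrics = {}
        for i, h in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(h)
            metrics[f"gpu{i}/utilization_pct"] = util.gpu
            metrics[f"gpu{i}/memory_util_pct"] = util.memory
            metrics[f"gpu{i}/temperature_c"] = pynvml.nvmlDeviceGetTemperature(
                h, pynvml.NVML_TEMPERATURE_GPU)
            metrics[f"gpu{i}/power_w"] = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0
        run.log(metrics, step=step)
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
    run.finish()
```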
Another point CoreWeave emphasized is that their earlier deployment of NVIDIA GB200 NVL72 instances yielded impressive performance, and that experience paved the way for GB300. They published benchmarks showing, for example, nearly 3× better per-GPU inference performance on GB200 (Grace Blackwell-based) instances compared to standard H100 instances. With GB300’s arrival, CoreWeave expects even larger leaps. They’ve stated that the GB300 will “define the next generation of AI applications” by enabling unprecedented speed for inference workloads. For industries like film (generative filmmaking, as quoted by a CoreWeave client) or any AI-driven field, having CoreWeave deploy this platform means those industries can start exploring new possibilities right now, not years later.
CoreWeave’s early deployment of the NVIDIA GB300 NVL72 demonstrates the importance of cloud partnerships in advancing AI infrastructure. It showcases how a nimble, focused company can collaborate with a tech giant to bring cutting-edge hardware to market faster than traditional routes. For the broader community, it means that the GB300 NVL72’s capabilities are not just theoretical – they’re already available via CoreWeave’s cloud. This tight-knit relationship with NVIDIA (and OEMs like Dell) effectively gave CoreWeave a head start, and in turn gives AI developers a head start, in leveraging the most powerful AI platform currently available. As more providers and enterprises eventually get their hands on GB300 NVL72, CoreWeave’s early success will serve as a blueprint for how to integrate and operate such high-end systems at scale.
Conclusion
The NVIDIA GB300 NVL72 platform stands out as a monumental leap in AI infrastructure, combining record-breaking performance with thoughtful design for real-world deployment. Over the course of this article, we explored how its 72 Blackwell Ultra GPUs, 36 Grace CPUs, and high-speed fabric work in harmony to deliver unprecedented throughput for AI reasoning tasks. Key innovations – from fully liquid-cooled rack integration, to 800 Gb/s per-GPU networking, to massive unified memory – all serve a singular purpose: to accelerate AI workloads to levels previously unattainable. The performance comparisons make it plain: with up to 50× more inference output than the prior generation, the GB300 NVL72 enables AI models and services that would have been impractical due to latency or cost just a year ago.
This platform isn’t just impressive on paper; through partnerships with firms like CoreWeave, it’s already powering real applications and proving its worth in the field. CoreWeave’s early deployment illustrated how quickly the GB300 NVL72 can be brought online and the immediate benefits it offers to AI practitioners – from researchers building giant models to startups delivering AI-driven features to users. The collaboration between NVIDIA and CoreWeave exemplifies the synergy needed to push the boundaries of technology: cutting-edge hardware meets agile, specialized cloud service, resulting in broader and faster access to these AI capabilities for everyone.
For businesses and tech enthusiasts, understanding the GB300 NVL72 is more than an academic exercise; it’s a glimpse into the future of AI data centers. We see a trend toward high-density, liquid-cooled AI “factories” where compute, memory, and networking are abundant and tightly integrated. This will reduce the friction in scaling AI – both scaling up (to more complex intelligence) and scaling out (to more users and scenarios). The GB300 NVL72, with its Grace-Blackwell architecture and Spectrum/InfiniBand networking, is a blueprint of that future: where every component is optimized for AI throughput and nothing is wasted on general-purpose overhead. It even accounts for operational needs like multi-tenant security and observability, showing a maturity in design for cloud and enterprise use.
The NVIDIA GB300 NVL72 can be seen as the cornerstone of next-generation AI infrastructure. It empowers engineers to push AI models to trillions of parameters, to use longer contexts and more sophisticated reasoning, and to deploy those models with responsiveness that users expect in modern applications. As this platform becomes broadly available (in CoreWeave’s cloud and beyond), we can anticipate a wave of innovation: new AI applications that leverage the 50× boost in performance to deliver capabilities we haven’t seen before. From real-time language translators and ultra-smart assistants to advanced robotics and scientific simulations, the GB300 NVL72 will be a catalyst enabling those ambitious projects to run at viable speeds.
Ultimately, the story of the GB300 NVL72 is one of excellence in engineering – the result of NVIDIA’s relentless drive and the support of an ecosystem that includes manufacturers, data center specialists, and cloud providers. For anyone interested in the cutting edge of AI, the GB300 NVL72 isn’t just a list of specs; it’s a herald of what’s to come. And if history is any guide, the innovations here will trickle down to more accessible levels in time, benefiting the wider tech landscape. For now, though, the GB300 NVL72 firmly holds the crown, and it’s exciting to witness the AI breakthroughs it will enable in the immediate future.
Sources
- CoreWeave, “CoreWeave Leads the Way with First NVIDIA GB300 NVL72 Deployment,” Peter Salanki, July 3, 2025.
- DataCenterDynamics, “Dell ships first Nvidia GB300 NVL72 to CoreWeave,” Charlotte Trueman, July 3, 2025.
- CoreWeave Blog, “Why You Need Liquid Cooling for AI Performance at Scale,” Tess Sohngen, April 25, 2025.
- CoreWeave Press Release, “CoreWeave and Weights & Biases Announce New Products and Capabilities, Helping AI Developers Iterate Faster on Models and Agents,” June 18, 2025.
- CoreWeave Blog, “CoreWeave Leads the Way with First NVIDIA GB300 NVL72 Deployment,” (observability section).