
W&B Models performance improvements: Fast logging, immediate results

We've been hard at work improving the performance of our Models product, and we're excited to share how things stand today.
Building foundation models is incredibly resource-intensive, requiring vast amounts of data, substantial processing power, and millions of iterations. And Weights & Biases, the AI developer platform to build AI agents, applications, and models with confidence, delivers the performance and scalability the world’s leading AI foundation model builders demand. After all, given the cost and the time dedicated to building these models, it is critical that tracking AI and machine learning experiments adds no additional latency or negatively impacts training performance in any way.
Delivering a perfect user experience means lightning fast logging during experiments and immediate access to results on the dashboard. Machine learning engineers and data scientists often analyze upwards of hundreds of thousands of metrics over tens to hundreds of thousands of runs. A responsive, snappy interface that renders tables, charts, and graphs quickly ensures that users discover the right answers faster. Time spent waiting for dashboard content to materialize is no more acceptable than latency while logging data during model building experiments.
Performance and user experience are priorities at Weights & Biases. We know we are not the only AI developer platform on the market, and we take customer feedback seriously. When it comes to providing the best possible tools and features, our work is never done. We're proud of our performance track record and excited about recent product improvements that let us confidently assure customers that logging consumes minimal resources, performance scales with their workloads, and their experiments are recorded reliably.
We'll begin with a pair of links where you can check out a large runs demo and the benchmarking scripts we used to test our performance. Then we'll dig into the details of our performance improvements.



Large runs demo

Click here to test the enhanced interface with the large runs demo and see the latest performance upgrades in action. Experience smooth responsiveness examining results from over 72,000 runs, each between 20,000 and 100,000 steps, and tracking more than 183,000 metrics. Click around and explore W&B Models’ simple, quick, and user-friendly interface.


Benchmarking transparency

Weights & Biases values transparency. The benchmarking scripts used for these tests are open source; you can download them and run them yourself.
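While the published scripts are the definitive reference, a stripped-down benchmark in the same spirit might look like the sketch below. The project name, metric count, and step count are placeholders to vary for your own tests:

```python
import time
import wandb

METRICS_PER_STEP = 100  # placeholder; raise to stress-test wide logging
STEPS = 1_000           # placeholder; raise to stress-test long runs

# Offline mode removes network variance from the measurement.
run = wandb.init(project="logging-benchmark", mode="offline")

start = time.perf_counter()
for step in range(STEPS):
    # One wide dictionary of scalar metrics per call to run.log()
    run.log({f"metric_{i}": float(step + i) for i in range(METRICS_PER_STEP)})
run.finish()  # flush all buffered data before stopping the clock
elapsed = time.perf_counter() - start

total = STEPS * METRICS_PER_STEP
print(f"Logged {total:,} metrics in {elapsed:.1f}s ({total / elapsed:,.0f} metrics/sec)")
```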



Experiment tracking performance

Let’s begin by examining recent W&B Models performance updates on the logging side and then discuss the interface enhancements that deliver a quicker, more responsive user experience.
Enterprise model training and fine-tuning require high volumes of data and heavy-duty hardware that can fulfill the demands of iteration and optimization. Weights & Biases' role in model training is tracking and logging. Given the effort it takes to train and fine-tune models, it's imperative that tracking and logging do not slow things down.
Recent product enhancements to W&B Models and the W&B SDK ensure that customers are now able to log more data, at higher speeds, with minimal resource consumption. As evidence of these improvements, let’s take a look at a customer’s past experience building an LLM using the previous version of the W&B SDK, and then their more recent experience using a newer version.

Customer story: Faster, more efficient experiment tracking

The latest version of W&B SDK is delivering better overall performance for Weights & Biases customers. One of these customers, a leading AI research team, was building an LLM that required tracking 100,000 metrics per run. Their training infrastructure was similar to GCP’s A3 Mega with 8x NVIDIA H100 80GB GPUs.
Building the model using an older version of the W&B SDK required an experiment consisting of 50 runs and took approximately 50 hours to complete. Let's take a closer look at the time required for the entire experiment, broken down into training time and logging time.
  • Total runtime: 50 runs x (40 mins training + 20 mins logging) = 3,000 minutes = 50 hours
  • Training time only: 50 runs x 40 minutes training = 2,000 minutes = 33.33 hours
  • Logging time only: 50 runs x 20 minutes logging = 1,000 minutes = 16.67 hours
The bottom line here: One third of the total runtime consisted of logging.
After upgrading to the new version of the W&B SDK, the overall runtime was reduced significantly. Logging time decreased by a staggering 75%, reducing not only time but cost as well. Let's take a look at the time required for the same experiment using the new W&B SDK.
  • Total runtime: 50 runs x (40 mins training + 5 mins logging) = 2,250 minutes = 37.5 hours
  • Training time only: 50 runs x 40 minutes training = 2,000 minutes = 33.33 hours
  • Logging time only: 50 runs x 5 minutes logging = 250 minutes = 4.17 hours
The bottom line: The customer completed their entire model training experiment 25% faster using the new W&B SDK versus the old SDK.
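The arithmetic above is easy to verify; a few lines of Python reproduce the totals, the 75% logging reduction, and the 25% end-to-end speedup:

```python
runs = 50
train_min, old_log_min, new_log_min = 40, 20, 5

old_total = runs * (train_min + old_log_min)  # 3,000 minutes
new_total = runs * (train_min + new_log_min)  # 2,250 minutes

print(f"old SDK: {old_total / 60:.1f} h, new SDK: {new_total / 60:.1f} h")
print(f"logging time cut by {1 - new_log_min / old_log_min:.0%}")      # 75%
print(f"experiment completes {1 - new_total / old_total:.0%} faster")  # 25%
```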

Now that we've seen firsthand the performance improvement provided by the new W&B SDK, let's dive deeper into the enhancements that made this possible.

Scalable performance: near-linear scaling

As models grow larger, the systems required to build them must grow to keep pace. Experiment tracking software must log and organize tens of thousands of data points in parallel and allow ML engineers and data scientists to analyze thousands of runs simultaneously. Weights & Biases is no stranger to the demands of model builders and understands the commitment required to stay ahead of the curve and give practitioners what they need to succeed. There are several experiment tracking tools on the market, and if W&B Models could not support modern model building, our users would leave. They don't.
Unlike many experiment tracking tools that are susceptible to bottlenecks as data volume grows, W&B’s SDK delivers consistent, near-linear scaling. This performance is enabled by thread-safe, asynchronous logging, ensuring smooth and efficient data handling at any scale.
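The SDK's internals aren't reproduced here, but the general pattern described above, thread-safe producers enqueueing metrics while a background worker drains the queue, can be sketched in a few lines. This is an illustrative toy, not W&B's actual implementation:

```python
import queue
import threading

class AsyncLogger:
    """Toy illustration of thread-safe, non-blocking logging: callers
    enqueue metrics instantly; a background worker handles the I/O."""

    _SENTINEL = object()

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, metrics: dict) -> None:
        # Returns immediately; the training loop never blocks on I/O.
        self._queue.put(metrics)

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                break
            # A real SDK would batch items here and ship them upstream.

    def close(self) -> None:
        self._queue.put(self._SENTINEL)  # signal shutdown
        self._worker.join()              # flush everything before exit
```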
We've benchmarked performance across a wide range of compute configurations and run sizes. Measuring metrics logged per second against the number of vCPUs, our tests confirm that logging with W&B Models scales consistently and near-linearly, ensuring reliable performance no matter the workload.

Benefits of near-linear scaling include:
  • Predictable performance: Near-linear scaling allows for predictable performance as the workload increases. This makes it easier to plan and provision resources for machine learning experiments and production deployments.
  • Efficient resource utilization: The ability to accurately plan for and efficiently use additional compute resources increases predictability, minimizes waste, and reduces costs.
  • Support for large-scale workloads: The scalability of the W&B SDK handles the demands of large-scale machine learning workloads, including those involving distributed training and hyperparameter optimization.

Low logging overhead

Tracking experiments should not adversely impact the running of the experiment itself. After confirming near-linear scaling, we benchmarked the W&B SDK with an eye on core consumption to determine resource utilization.
Scaling the benchmark from roughly one thousand metrics logged per second to one million, we tracked both millicore consumption and cost. Executing a large run producing one million metrics per second using the W&B SDK consumed only 16,000 millicores.
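For readers who want to measure overhead themselves: 1,000 millicores equals 100% of one vCPU, and per-process CPU usage can be approximated with psutil. The helper below is a hypothetical sketch, not part of the W&B SDK:

```python
import psutil

def sample_millicores(pid: int, interval: float = 1.0) -> float:
    """Approximate a process's CPU usage in millicores.
    100% of one core == 1,000 millicores."""
    return psutil.Process(pid).cpu_percent(interval=interval) * 10.0

# Example: sample this process while it logs metrics elsewhere
print(f"{sample_millicores(psutil.Process().pid):.0f} millicores")
```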



Reliability: Built to handle failures gracefully

Large training runs conducted while building foundation models are expensive not only financially, but also in terms of resources and time allocated. Top AI companies choose Weights & Biases because it delivers proven performance coupled with trusted reliability.
Unexpected failures should never cause data loss or prevent users from quickly resuming or restarting runs. W&B Models offers seamless recovery from failed runs, keeping your projects on track. Weights & Biases’ fault-tolerance features provide the reliability that customers depend on:
  • Resume functionality: Recovers from crashes without restarting runs
  • Offline mode: Logs data locally when network connectivity is lost
  • Asynchronous logging: Ensures training performance is unaffected by logging delays
  • Local cache: Prevents data loss during transient network failures
  • Graceful shutdown: Ensures all logs are flushed before exiting
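Resume and offline mode are both exposed through wandb.init. A minimal usage sketch, with placeholder project and run ID values:

```python
import wandb

# Resume a crashed run: pass the original run ID with resume="must"
# (the ID "abc123xy" is a placeholder for your real run ID).
run = wandb.init(project="my-project", id="abc123xy", resume="must")
run.log({"loss": 0.42})
run.finish()

# Offline mode: metrics are written to local disk, no network required.
# Upload later from the CLI with:  wandb sync <run-directory>
offline_run = wandb.init(project="my-project", mode="offline")
offline_run.log({"loss": 0.41})
offline_run.finish()
```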

Meeting the needs of our customers

Longer runs, greater data volumes, higher metric counts. The trend is clear. As foundation model builders track hundreds of thousands of metrics and beyond, W&B Models scales to meet this growing demand. The Weights & Biases platform supports logging data with precision, enabling you to monitor, compare, and optimize your models at scale without sacrificing speed or clarity.
To demonstrate the ability of W&B Models to handle large runs, we benchmarked the W&B SDK, increasing the number of run steps from 100,000 to 1 million while tracking 100 metrics at every step.

The steady, linear growth in resource usage as the number of steps reaches 1 million demonstrates stable, predictable performance. This consistent behavior highlights the system's reliability and scalability, even under heavy workloads.
Next, we tested experiments tracking up to 1 million metrics per step, this time measuring how logging time changed as the metric count per run increased.

As the number of metrics increased, logging time remained stable and scaled linearly. Customers know and trust that Weights & Biases scales to meet the demands of their largest and most resource-intensive model building experiments.

Latest performance results are great but our job never ends

As part of our customer story, we mentioned the performance benefits experienced when switching from an older version of the W&B SDK to the current version. We hope these results encourage you to upgrade to the latest version of the W&B SDK and take advantage of the performance improvements. To drive the point home, the chart below illustrates the impact of installing the latest version: a dramatic reduction in logging time when tracking a run with 20,000 steps, highlighting Weights & Biases' commitment to speed and scalability.
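If you're on an older release, upgrading takes a single command: pip install --upgrade wandb.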


W&B Workspace: A user interface built for speed

AI model builders rely on scalable, efficient logging. To unlock the value of logged data, they require a fast, responsive interface that organizes and displays experiment results clearly for easy analysis and collaboration. Data retrieval and loading performance for the W&B Workspace interface is measured along four axes:

The number of runs per project

As foundation model projects grow to hundreds of thousands of runs, W&B Models scales to meet the demand. Whether training classification models or large language models, W&B Workspace delivers fast, clear metrics, charts, and tables that track progress without delay.

The number of metrics per project

This measure tracks the number of unique keys logged via wandb.log across all project runs. As foundation model training generates more metrics, optimizing performance for large metric counts is a top priority. W&B Workspace processes hundreds of thousands of metrics with loading speeds matching or exceeding any other enterprise experiment tracking tool.
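Concretely, a unique key is a distinct metric name passed to wandb.log, counted once per project no matter how many steps or runs it appears in. A toy example with a placeholder project name:

```python
import wandb

run = wandb.init(project="metrics-demo")  # placeholder project name
for step in range(100):
    # Two unique keys ("train/loss" and "train/accuracy"), logged 100 times
    run.log({"train/loss": 1.0 / (step + 1), "train/accuracy": step / 100})
run.finish()
```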

The number of steps per run

Benchmarking W&B Workspace reveals that this dimension has minimal effect on loading time. Accessing and visualizing runs with hundreds of thousands of steps is as efficient as displaying runs with just a few hundred steps.

Concurrent writes and reads per project

Large teams execute experiments simultaneously and require a datastore that can quickly and seamlessly handle heavy write concurrency. W&B Workspace delivers training and fine-tuning results to the dashboard immediately after logging, enabling real-time tracking.


Lightning-fast rendering of metrics, charts, and tables in W&B Workspace is a two-step process. First, W&B Models queries and retrieves data from storage. Then, results load and display in the browser. Recent improvements to W&B Models speed up both steps, giving users quicker access to run results: less waiting and more time to make informed decisions.
Before exploring some optimization strategies for speedier W&B Workspace performance, here are some recent benchmark results that highlight key improvements:

Number of runs per project

Building more accurate and efficient models today requires a greater number of experimentation runs than in the past. New feature updates have sped up Workspace loading time by over 2.5x on projects with over 300K runs.

Live run ingest and data availability

Back-end query processing for live runs is now 320% faster. This means experiment tracking results are available more than three times faster in W&B Workspace.

Number of metrics per project

Since ML engineers and AI model builders can't always predict which metrics will prove most valuable, tracking them all is essential. Recent Workspace customization updates let users reorganize sections and panels so that only critical metrics load, boosting rendering speed by more than 50x.

The data you need, without the wait

As AI and ML models grow larger, platforms must improve both back-end and front-end performance. ML engineers and data scientists can’t afford delays, even when launching dashboards showing results for tens of thousands of runs logging tens of thousands of metrics. Effective analysis demands focus, and focus requires instant access to every key project metric, chart, and table. When comparing thousands of runs across thousands of steps, loading delays are disruptive. W&B Workspace delivers all relevant data immediately, no matter the run count or size, using lists, tables, and interactive charts in customizable panels.


Database storage and querying optimizations

Improving datastore performance means enhancing write operations for faster logging and read operations for quicker access. Recent efforts to speed up page rendering have focused on optimizing queries that fetch results from storage. When tables and charts load slowly, query logs are often the first place to check. Beyond tweaking query syntax and caching results, changing how data is organized and stored on disk in the underlying datastore can help generate more efficient query plans, reducing both read times and the amount of data retrieved. Weights & Biases continually analyzes and improves back-end data access performance.
In addition to our own updates, Weights & Biases benefits whenever our data storage vendors release their own software upgrades with new features. New querying options and column types allow W&B Models to store and access data in ways not previously possible. For example, storing data as variable character fields or blobs may or may not be more efficient than using a new JSON column type. We test every option at scale to ensure the fastest experience for users.
As an example of ongoing efforts to improve Workspace performance, Weights & Biases recently restructured the run data table. Run data is, unsurprisingly, among the most actively read and written datasets. To improve read speeds, the run data table schema was reorganized using new, more efficient column types that allow reading only the data necessary to fulfill a W&B Workspace request, and reading that data faster. Updates like this result from scrutinizing the cost of reading even a single extra byte that is not absolutely necessary for rendering a chart visible to the end user. Faster read speeds + less data = a quicker-loading dashboard.
Fetching less data from the back-end results in a positive chain reaction where the browser, on the client-side, can more comfortably handle and render the smaller data volume into metrics, charts, and tables in the Workspace interface.

Browser loading and storage optimizations

Built for speed and interactivity, most modern interactive dashboards operate in a similar manner:
  • A user makes a request by clicking on a tab or button or entering a search term or filter
  • The request is translated into a query that is asynchronously executed against a back-end datastore
  • Results are retrieved and returned to the browser
  • The browser stores the returned data and concurrently delivers the appropriate data to the intended visualization(s)
Bottlenecks can arise at several points in this process: when queries run slowly, when too much data is pulled from the datastore, or when the browser tries to store more data than its memory allows. These issues cause lag, latency, or even out-of-memory errors that crash the browser and require a page refresh. Combining the steps listed above in the most efficient way possible is the recipe for a fast-loading, responsive interface.

Working with the world's leading AI teams, we understand that every millisecond matters. In addition to back-end enhancements optimizing storage and retrieval, we've also refined strategies for storing data in browser memory. W&B Workspaces load essential data instantly, defer additional loading until user activity requests it, and selectively preload data that might be needed later.
To provide necessary data instantly and prepare for what's next, W&B Models uses a technique known as “lazy loading”: visible metrics, charts, and tables receive data to render immediately, while additional data loads in the background as the user examines what is on screen. This reduces lag by letting the browser commit resources to rendering first, rather than receiving, organizing, and storing extra, perhaps unnecessary, data in memory. Prioritizing performance and the end-user experience this way relies on understanding how users actually behave while viewing dashboards.
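The Workspace front end isn't written in Python, but the prioritization pattern is language-agnostic. The asyncio sketch below, with a hypothetical fetch_panel_data standing in for a back-end query, shows the idea: visible panels render first, while hidden panels prefetch in the background:

```python
import asyncio

async def fetch_panel_data(panel_id: str) -> dict:
    """Hypothetical stand-in for a back-end query for one panel."""
    await asyncio.sleep(0.1)  # simulated network latency
    return {"panel": panel_id, "series": []}

async def load_workspace(visible: list[str], hidden: list[str]) -> None:
    # 1. Fetch and render visible panels first; the user sees these immediately.
    for data in await asyncio.gather(*(fetch_panel_data(p) for p in visible)):
        print("render", data["panel"])

    # 2. Lazily prefetch hidden panels so the data is already in memory
    #    if the user scrolls down or expands a collapsed section.
    await asyncio.gather(*(fetch_panel_data(p) for p in hidden))

asyncio.run(load_workspace(["loss", "accuracy"], ["grad_norm", "lr"]))
```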
Constant product usage analysis permits Weights & Biases to understand user navigation and interaction patterns when using W&B Models and, specifically, project workspaces. Experiment tracking results are laid out in sections and panels and, despite being used in different ways by different users, this organizational structure organically informs the rendering optimization strategy. When a workspace is opened, data is retrieved from the back-end and only visual elements in expanded and visible panels and sections are rendered. Once loaded, W&B Workspace panels and sections respond smoothly and efficiently. Meanwhile, Workspace loads and prepares extra data to support future activity.

In sum, performance benefits attributable to the panel and section pagination strategy include:
  • Reduced memory usage: Only fetching data for visible panels, minimizing JS heap space
  • Faster load times: Initial load times decreased by avoiding overfetching of panels
  • Enhanced user experience: Smoother and more responsive interactions by incrementally loading panels

Back-end and front-end caching strategies

When discussing loading times and dashboard responsiveness, we'd be remiss not to mention caching. In modern web applications, caching plays a crucial role in accelerating data access and improving user experience. While modern databases have built-in caching mechanisms, architects and developers must still use that functionality properly and deploy additional caching strategies both inside and outside the datastore.
Understanding caching performance starts with cold and warm caches. “Cold cache” refers to the initial state when the cache contains no data. During a cold cache scenario, the system must fetch data directly from the primary data source, such as a database, which typically results in higher latency and longer load times. This “first-time” retrieval is essential to populate the cache but can temporarily negatively impact performance.
“Warm cache,” on the other hand, describes a cache that already contains relevant data from previous requests. When a warm cache is in place, the system retrieves the requested information directly from the cache, significantly reducing response times and easing the load on back-end resources. Although some validation or refresh of cached data may occur in certain situations, warm caching largely delivers near-instant data access.
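The cold-versus-warm distinction is easy to demonstrate with a memoized function. In this toy example, functools.lru_cache plays the role of the cache in front of a slow data source:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def fetch_metrics(run_id: str) -> list:
    time.sleep(0.5)  # simulate an expensive database read
    return [0.0] * 1_000

for label in ("cold", "warm"):
    start = time.perf_counter()
    fetch_metrics("run-42")  # first call misses the cache, second call hits
    print(f"{label} cache: {time.perf_counter() - start:.3f}s")
```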
Competitive claims and benchmarking exercises should always explicitly take caching into account and differentiate between cold and warm cache results. Results depicting anything other than an apples-to-apples comparison deserve scrutiny. Comparing warm-cache results to cold-cache results across tools is an unfortunately common occurrence. Weights & Biases encourages ML engineers, data scientists, and software developers to benchmark experiment tracking and dashboard responsiveness for themselves, training their own models with their own data on competing tools.
Data is persisted in an underlying database and stored impermanently in browser memory. Caching is used whenever possible in the datastore and browser to display metrics, charts, and tables faster. When it comes to optimizing dashboard loading, like so many other performance challenges, ingenuity often wins the day. Still, as a thorough competitive analysis will reveal, most experiment tracking tools rely on virtually the same techniques to improve loading speed. W&B Models immediately loads all data for panels visible on the page and, as you interact, preloads data for upcoming views, providing a smooth and uninterrupted experience.

How it comes together in W&B Workspace

AI and ML model training experiment tracking results live in W&B Workspace. Users can customize their workspace by choosing which panels to show or hide, arranging their order, and grouping them to fit their analysis style. Since different experiments require focus on different metrics, Workspace lets users create specific layouts for each project. Individual users can save personalized views to organize results or highlight key data when sharing with teammates.
W&B Workspace combines speed and simplicity, providing an intuitive and delightful user experience. Unlike platforms that slow users down with complex, confusing interfaces, Weights & Biases flattens the learning curve for ML engineers, data scientists, and developers with an interface that is familiar and easy to understand. Navigating results and managing AI and ML projects is smooth and efficient. Building the perfect model is difficult enough. Weights & Biases streamlines tracking for training and fine-tuning experiments, removing barriers instead of creating them.


Wrapping up

Model builders expect fast logging during training and immediate access to results. Managing growing data volumes calls for innovative architectures and new, efficient strategies for data querying and retrieval. Weights & Biases focuses on delivering performance that meets and surpasses user expectations.
We are on the frontier of AI model and software development, meeting the challenges posed by the pace of innovation head on. Methods change, data volumes change, data types change. Our team is watching, listening, and continually optimizing around new techniques and behaviors, ensuring that the Weights & Biases user experience is the best available.
Ultimately, there’s no better performance test than running your own large-scale experiments. And when you do, please get in touch. We’d love to hear how well W&B Models scales for you.

Iterate on AI agents and models faster. Try Weights & Biases today.