Plotting with Pandas and Weights & Biases: Step-by-Step Guide
This article provides a brief overview of plotting with Pandas using Weights & Biases for interactive visualizations. It includes code samples for you to follow.
Created on June 21|Last edited on October 24
In this piece, you'll learn about plotting with Pandas with Weights & Biases. It follows that we should start with a quick definition of Pandas itself:
Pandas is perhaps the most popular library in Python for data analysis and manipulation. It's a core part of any data scientist's toolkit, along with NumPy and matplotlib.
It is widely used in open source thanks to its intuitive API, and it lies at the core of myriad data transformation pipelines. In fact, many visualization libraries in Python accept Pandas DataFrames directly. Some alternatives to Pandas, such as polars and dask, are also gaining popularity.
Pandas comes pre-baked with data handling capabilities. Some of the basic ways in which you can use Pandas are:
- Load data files in CSV format into a Pandas "DataFrame" format.
- Perform operations across columns using a simple interface, ex: df["info"] = df["feat_1"] + df["feat_2"]
- Perform queries on your dataframe using simple Python expressions, ex: df.query("info > 100")
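The three operations above can be sketched in a few lines. The CSV contents here are made up for illustration and read from an in-memory string; in practice you would pass a file path to pd.read_csv. The column names follow the examples in the list.

```python
import io

import pandas as pd

# Hypothetical CSV contents, used only for illustration.
csv_text = """feat_1,feat_2
10,20
60,70
90,5
"""

# Load CSV data into a DataFrame (a file path works the same way).
df = pd.read_csv(io.StringIO(csv_text))

# Derive a new column from two existing columns.
df["info"] = df["feat_1"] + df["feat_2"]

# Keep only the rows where the derived column exceeds 100.
large = df.query("info > 100")
print(large)
```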
Here's what we'll be covering:
Table of Contents
- Introduction to Data Plotting and Its Significance in Visualizing Data
- Different Types of Plots Available in Pandas for Various Data Scenarios
- Leveraging Weights & Biases for Data Plotting
- Benefits of Using Weights & Biases for Plotting With Pandas
- Integration of Weights & Biases With Pandas for Interactive and Collaborative Data Plotting
- Best Practices for Effective Data Plotting
- Challenges and Considerations
- Conclusion
Introduction to Data Plotting and Its Significance in Visualizing Data
Data plotting is an important part of any typical data science workflow. Once we have the raw data available to us, it's important that we take some time to explore and understand it. Simply plotting the data can give us ideas on how to model the data.
For instance, if we try a scatter plot between the various features and see a linear correspondence between our target variable and feature, we know we could at least create a baseline using a linear regression model using that particular feature. Or if we do a scatter plot and observe clusters, we can try a clustering model.
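As a rough sketch of that workflow, the snippet below draws a scatter plot of one feature against the target, then checks the linear relationship numerically and fits a straight-line baseline. The data is synthetic, the column names "feature" and "target" are placeholders, and np.polyfit stands in for a full linear regression model.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd

# Synthetic data for illustration: target is roughly 3 * feature + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
df = pd.DataFrame({"feature": x, "target": 3.0 * x + 2.0 + rng.normal(0, 1.0, size=200)})

# Scatter plot of the feature against the target.
# Call ax.figure.savefig("scatter.png") to write the plot to disk.
ax = df.plot.scatter(x="feature", y="target", title="feature vs. target")

# A Pearson correlation close to 1 suggests a linear baseline is worth trying.
corr = df["feature"].corr(df["target"])

# Degree-1 polynomial fit as a simple linear baseline.
slope, intercept = np.polyfit(df["feature"], df["target"], deg=1)
print(f"corr={corr:.3f}, slope={slope:.2f}, intercept={intercept:.2f}")
```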
Different Types of Plots Available in Pandas for Various Data Scenarios
Pandas provides us with a number of built-in plots that we can leverage to understand the data.
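A minimal sketch of a few of these built-in plots, drawn side by side through the DataFrame.plot accessor. The data is random and the column names "a" and "b" are placeholders; which plot kinds suit your data will differ.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random-walk data for illustration only.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "a": rng.normal(size=100).cumsum(),
    "b": rng.normal(size=100).cumsum(),
})

# One figure with four axes, each showing a different built-in plot kind.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
df.plot.line(ax=axes[0, 0], title="line")
df.plot.hist(ax=axes[0, 1], bins=20, alpha=0.6, title="hist")
df.plot.box(ax=axes[1, 0], title="box")
df.plot.scatter(x="a", y="b", ax=axes[1, 1], title="scatter")
fig.tight_layout()
```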
For more functions, please have a look at the Chart Visualization and Table Visualization guides on the official Pandas documentation.
Leveraging Weights & Biases for Data Plotting
Weights & Biases provides us with a ton of tools to visualize our data:
- The wandb.plot.line() function allows us to log a custom line plot and the wandb.plot.line_series() function allows us to plot multi-line plots.
- We can also create interactive Custom Charts by providing data as W&B Tables and then plotting using the wandb.plot_table() function.
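The two line-plot helpers mentioned above could be used roughly as follows. This is a sketch under assumptions: make_history and log_line_plots are hypothetical helper names, the "step", "loss", and "val_loss" columns are made-up data, and the logging function needs a W&B account and a project name of your choosing.

```python
import pandas as pd


def make_history() -> pd.DataFrame:
    """Return a small, made-up training history for illustration."""
    return pd.DataFrame({
        "step": [0, 1, 2, 3, 4],
        "loss": [1.00, 0.70, 0.50, 0.40, 0.35],
        "val_loss": [1.10, 0.85, 0.65, 0.60, 0.58],
    })


def log_line_plots(df: pd.DataFrame, project: str) -> None:
    """Log a single-line and a multi-line custom chart to a W&B run."""
    import wandb  # imported here so make_history works without wandb installed

    run = wandb.init(project=project)

    # Single line: wandb.plot.line reads its x and y columns from a W&B Table.
    table = wandb.Table(dataframe=df)
    run.log({"loss_curve": wandb.plot.line(table, x="step", y="loss", title="Training loss")})

    # Multiple lines: wandb.plot.line_series takes plain lists of x and y values.
    run.log({
        "loss_series": wandb.plot.line_series(
            xs=df["step"].tolist(),
            ys=[df["loss"].tolist(), df["val_loss"].tolist()],
            keys=["loss", "val_loss"],
            title="Train vs. validation loss",
            xname="step",
        )
    })
    run.finish()
```

Calling log_line_plots(make_history(), project="my-project") would start a run and log both charts, where "my-project" is a placeholder project name.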
Benefits of Using Weights & Biases for Plotting With Pandas
Luckily, Weights & Biases also provides us with the option to log custom matplotlib or plotly charts to a Weights & Biases workspace.
Just pass a matplotlib plot or figure object to wandb.log(). By default, we'll convert the plot into a Plotly plot. If you'd rather log the plot as an image, you can pass the plot into wandb.Image. We also accept Plotly charts directly.
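A sketch of both options for a plot drawn with Pandas, assuming a W&B account. The helper names make_figure and log_figure, the data, and the logged keys are hypothetical. Passing the figure directly relies on the Plotly conversion described above, which typically needs the plotly package installed.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd


def make_figure(df: pd.DataFrame):
    """Draw the DataFrame as a line plot and return the matplotlib figure."""
    fig, ax = plt.subplots()
    df.plot(ax=ax, title="Monthly totals")
    return fig


def log_figure(fig, project: str) -> None:
    """Log a matplotlib figure to W&B, once as a chart and once as an image."""
    import wandb  # imported here so make_figure works without wandb installed

    run = wandb.init(project=project)
    # Passing the figure directly: converted to a Plotly plot by default.
    run.log({"chart": fig})
    # Wrapping it in wandb.Image logs a static image instead.
    run.log({"chart_image": wandb.Image(fig)})
    run.finish()


# Made-up data for illustration.
df = pd.DataFrame({"sales": [3, 5, 4, 6], "returns": [1, 0, 2, 1]})
fig = make_figure(df)
```

From here, log_figure(fig, project="my-project") would send the figure to a run in a project of your choosing.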
Integration of Weights & Biases With Pandas for Interactive and Collaborative Data Plotting
It is also extremely simple to log any Pandas dataframe to a workspace by converting it into a W&B Table:
    import wandb
    import pandas as pd

    # Read our CSV into a new DataFrame
    pandas_dataframe = pd.read_csv("data.csv")

    # Convert the DataFrame into a W&B Table
    wandb_table = wandb.Table(dataframe=pandas_dataframe)

    # Add the table to an Artifact to increase the row limit
    wandb_table_artifact = wandb.Artifact("wandb_artifact", type="dataset")
    wandb_table_artifact.add(wandb_table, "table")

    # Log the raw CSV file within the artifact to preserve our data
    wandb_table_artifact.add_file("data.csv")

    # Start a W&B run to log data
    run = wandb.init(project="...")

    # Log the table to visualize it with a run...
    run.log({"data": wandb_table})

    # ...and log it as an Artifact
    run.log_artifact(wandb_table_artifact)
Below we can see a Table with a mix of data types (audio, text, numerical, and images). Feel free to explore the data and try out various features. The table below is fully interactive.
Best Practices for Effective Data Plotting
Choosing Appropriate Plot Types for Different Data Scenarios
Choosing the right plot for your data is immensely important and can make or break your ability to model it correctly. A simple line chart will not work for every type of data, and sometimes a single plot is not enough. Experiment with different plot types and with the number of charts, and look up the plot types recommended for your kind of data.
Optimizing Visualization Aesthetics and Readability
This is a tricky one! Your style of aesthetics might be different from your manager's, which can be immensely different from your clients'. I, for instance, love dark themes, but creating plots with a dark background doesn't really help in most cases.
- Use contrasting colors to highlight the key information!
- Make sure that the order in which you plot has an inherent story and leads to a conclusion rather than being a collection of abstract facts about your data.
- Make sure that you try and maintain a consistent order and size of your plots.
For reference, I recommend checking out highly rated exploration-themed notebooks on Kaggle in your data domain for amazing examples.
Iterative Exploration and Refinement Using Weights & Biases
Leverage Weights & Biases! Once you have your data logged in a W&B Table, you can leave your code editor completely and explore your data in the UI. Run queries, sort your data, apply filters, and much more!

Challenges and Considerations
Handling Large Datasets and Performance Optimization
Handling large datasets using Pandas can sometimes land you in trouble. There are some general guidelines you can follow to get the most out of Pandas:
- Try to pre-process your dataset in smaller batches and then merge it into a single dataframe
- Consider writing Cython for Pandas
- Use the "numba" engine that some Pandas operations support to take advantage of JIT (Just-in-Time) compilation
There are other libraries that you can also consider, such as dask, cudf, and polars, which can offer better performance than Pandas on large datasets.
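The first guideline, processing the data in smaller batches and then merging, can be sketched with the chunksize parameter of pd.read_csv. The CSV contents and the even-number filter are made up for illustration; with a real file you would pass its path and pick a much larger chunk size.

```python
import io

import pandas as pd

# Hypothetical CSV with a single "value" column holding 0..9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# Read the file four rows at a time instead of loading it all at once.
processed = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Pre-process each batch on its own; here we keep only even values.
    processed.append(chunk[chunk["value"] % 2 == 0])

# Merge the pre-processed batches into a single DataFrame.
df = pd.concat(processed, ignore_index=True)
print(df)
```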
Addressing Missing or Inconsistent Data for Accurate Plotting
Handling missing and inconsistent data is an important skill to have. Pandas provides us with a number of utility functions to deal with them.
- You can use the .dropna() method to remove missing values from your dataframe. You can apply this function across rows or columns or a subset of the data.
- Your dataframe can contain several data types, each requiring its own cleaning technique. If the features are numeric, then ideally, try to normalize them. If there are dates, consider ensuring they are in the same format. If they are categorical, then consider one-hot encoding them.
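A minimal sketch of these cleaning steps on a made-up DataFrame. The column names are placeholders, and min-max scaling is just one of several possible ways to normalize a numeric feature.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing price.
df = pd.DataFrame({
    "price": [10.0, np.nan, 30.0, 50.0],
    "date": ["2023-01-05", "2023-02-10", "2023-03-15", "2023-04-20"],
    "color": ["red", "blue", "red", "green"],
})

# Drop rows where the "price" column is missing.
clean = df.dropna(subset=["price"]).copy()

# Numeric feature: min-max normalize to the [0, 1] range.
price_min, price_max = clean["price"].min(), clean["price"].max()
clean["price"] = (clean["price"] - price_min) / (price_max - price_min)

# Dates: parse strings into a single datetime dtype.
clean["date"] = pd.to_datetime(clean["date"])

# Categorical feature: one-hot encode into indicator columns.
clean = pd.get_dummies(clean, columns=["color"])
print(clean)
```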
Conclusion
In this article, you read through a brief overview of plotting with Pandas and how using Weights & Biases to explore your data can lead to valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and using Dropout.
Enhancing Performance with SciPy Optimize and W&B: A Deep Dive
In this article, we provide a brief overview of the scipy.optimize submodule interactive visualizations, using Weights & Biases to help us out along the way.
How To Create an Image Classification Model in JAX/Flax
In this article, we learn how to create a simple image classification model in Flax with a short tutorial complete with code and interactive visualizations.
How To Write Efficient Training Loops in PyTorch
In this tutorial, we cover how to write extremely memory- and compute-efficient training loops in PyTorch, complete with share code and interactive visualizations.
Preventing The CUDA Out Of Memory Error In PyTorch
A short tutorial on how you can avoid the "RuntimeError: CUDA out of memory" error while using the PyTorch framework.
How to Initialize Weights in PyTorch
A short tutorial on how you can initialize weights in PyTorch with code and interactive visualizations.
PyTorch Dropout for regularization - tutorial
Learn how to regularize your PyTorch model with Dropout, complete with a code tutorial and interactive visualizations