
Plotting with Pandas and Weights & Biases: Step-by-Step Guide

This article provides a brief overview of plotting with Pandas using Weights & Biases for interactive visualizations. It includes code samples for you to follow.
Created on June 21|Last edited on October 24
In this piece, you'll learn how to plot Pandas data with Weights & Biases. Let's start with a quick definition of Pandas itself:
Pandas is perhaps the most popular library in Python for data analysis and manipulation. It's a core part of any data scientist's toolkit, along with NumPy and matplotlib.
It is widely used in open source thanks to its intuitive API, and it lies at the core of myriad data transformation pipelines. In fact, many visualization libraries in Python accept Pandas DataFrames directly. Some alternatives to Pandas, such as polars and dask, are also gaining popularity.
Pandas comes with data handling capabilities built in. Some of the basic ways in which you can use Pandas are:
  • Load data files in CSV format into a Pandas "DataFrame".
  • Perform operations across columns using a simple interface, e.g. df["info"] = df["feat_1"] + df["feat_2"]
  • Perform queries on your dataframe using simple Python expressions, e.g. df.query("info > 100")
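The three operations above can be sketched in a few lines. This is a minimal illustration, not a recipe: the CSV content is made up and is read from an in-memory string so the snippet runs on its own, while the column names feat_1, feat_2, and info follow the examples in the list.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """feat_1,feat_2
10,5
60,70
200,1
"""

# Load CSV data into a DataFrame (pd.read_csv also accepts a file path)
df = pd.read_csv(io.StringIO(csv_text))

# Column-wise arithmetic: add two columns element by element
df["info"] = df["feat_1"] + df["feat_2"]

# Keep only the rows that satisfy a query expression
large = df.query("info > 100")
print(large)
```

Here the first row is filtered out because its info value (15) is not greater than 100.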

Introduction to Data Plotting and Its Significance in Visualizing Data

Data plotting is an important part of any typical data science workflow. Once we have the raw data available to us, it's important that we take some time to explore and understand it. Simply plotting the data can give us ideas on how to model it.
For instance, if we try a scatter plot between the various features and see a linear correspondence between our target variable and feature, we know we could at least create a baseline using a linear regression model using that particular feature. Or if we do a scatter plot and observe clusters, we can try a clustering model.
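As a rough sketch of that first exploratory step, the snippet below draws a scatter plot of one feature against the target and computes their Pearson correlation as a numeric sanity check. The data is synthetic (a noisy line generated with a fixed seed), and the column names feature and target are placeholders rather than anything from a real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import numpy as np
import pandas as pd

# Synthetic data: a linear relationship plus Gaussian noise (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
df = pd.DataFrame(
    {"feature": x, "target": 3.0 * x + 2.0 + rng.normal(0, 1.0, size=x.size)}
)

# Scatter plot of the target against the feature
ax = df.plot.scatter(x="feature", y="target", title="target vs. feature")

# A correlation close to 1 or -1 suggests a linear baseline is worth trying
corr = df["feature"].corr(df["target"])
print(f"Pearson correlation: {corr:.3f}")
```

In a notebook you can drop the matplotlib.use("Agg") line and the plot will render inline.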

Different Types of Plots Available in Pandas for Various Data Scenarios

Pandas provides us with a number of inbuilt plots that we can leverage to understand the data.
  • We can call the .plot() method with no arguments to create line charts
  • We can pass kind="bar" to .plot() for bar charts
  • We can pass kind="hist" to .plot() for histograms
  • We can pass kind="box" to .plot() for box plots
For more functions, please have a look at the Chart Visualization and Table Visualization guides on the official Pandas documentation.
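To make the plot types listed above concrete, here is a small sketch that draws all four on one figure. The sales and returns numbers are invented for illustration, and the 2x2 layout is just one convenient way to compare the chart types side by side.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly figures, indexed by month
df = pd.DataFrame(
    {
        "month": ["Jan", "Feb", "Mar", "Apr"],
        "sales": [120, 135, 90, 160],
        "returns": [8, 12, 5, 9],
    }
).set_index("month")

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

df.plot(ax=axes[0, 0], title="line (default)")         # one line per column
df.plot(kind="bar", ax=axes[0, 1], title="bar")         # grouped bars per month
df["sales"].plot(kind="hist", bins=4, ax=axes[1, 0], title="hist")
df.plot(kind="box", ax=axes[1, 1], title="box")         # one box per column

fig.tight_layout()
```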

Leveraging Weights & Biases for Data Plotting

Weights & Biases provides us with a ton of tools to visualize our data.

Benefits of Using Weights & Biases for Plotting With Pandas

Luckily, Weights & Biases also provides us with the option to log custom matplotlib or Plotly charts to a Weights & Biases workspace.
Just pass a matplotlib plot or figure object to wandb.log(). By default, we'll convert the plot into a Plotly plot. If you'd rather log the plot as an image, you can pass the plot into wandb.Image. We also accept Plotly charts directly.
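A minimal sketch of that workflow follows. It assumes you are logged in to W&B; the function names build_figure and log_figure, the metric values, and the log keys are all hypothetical. The wandb import sits inside log_figure so that the plotting half can be run on its own.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure


def build_figure(df: pd.DataFrame) -> Figure:
    """Draw a line chart with one line per column and return the Figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    df.plot(ax=ax, title="Training metrics")
    ax.set_xlabel("step")
    return fig


def log_figure(fig: Figure, project: str) -> None:
    """Log the figure to a W&B run as a static image (hypothetical helper)."""
    import wandb  # imported lazily so build_figure works without wandb installed

    run = wandb.init(project=project)
    # wandb.Image keeps the plot as an image; passing `fig` directly instead
    # asks W&B to convert it to a Plotly chart, as described in the text above.
    run.log({"metrics_image": wandb.Image(fig)})
    run.finish()


# Invented metric values for illustration
df = pd.DataFrame({"loss": [0.9, 0.6, 0.4, 0.3], "accuracy": [0.5, 0.7, 0.8, 0.85]})
fig = build_figure(df)
```

Calling log_figure(fig, project="...") with your own project name would then send the chart to your workspace.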

Integration of Weights & Biases With Pandas for Interactive and Collaborative Data Plotting

It is also extremely simple to log any Pandas dataframe to a workspace by converting it into a W&B Table:
import wandb
import pandas as pd

# Read our CSV into a new DataFrame
pandas_dataframe = pd.read_csv("data.csv")

# Convert the DataFrame into a W&B Table
wandb_table = wandb.Table(dataframe=pandas_dataframe)

# Add the table to an Artifact to increase the row limit
wandb_table_artifact = wandb.Artifact(
    "wandb_artifact",
    type="dataset")
wandb_table_artifact.add(wandb_table, "table")

# Log the raw csv file within an artifact to preserve our data
wandb_table_artifact.add_file("data.csv")

# Start a W&B run to log data
run = wandb.init(project="...")

# Log the table to visualize with a run...
run.log({"data": wandb_table})

# ...and log it as an Artifact
run.log_artifact(wandb_table_artifact)
Below we can see a Table with a mix of data types (audio, text, numerical, and images). Feel free to explore the data and try out various features. The table below is fully interactive.



Best Practices for Effective Data Plotting

Choosing Appropriate Plot Types for Different Data Scenarios

Choosing the right way to plot your data is immensely important and can make or break your ability to model it correctly. A simple line chart will not work for every type of data, and sometimes a single plot is not enough. Experiment with different plot types and different numbers of charts, and look up the plot types commonly recommended for your kind of data.

Optimizing Visualization Aesthetics and Readability

This is a tricky one! Your taste in aesthetics might differ from your manager's, which in turn might differ immensely from your clients'. I, for instance, love dark themes, but creating plots with a dark background doesn't really help in most cases.
  • Use contrasting colors to highlight the key information!
  • Make sure that the order in which you plot has an inherent story and leads to a conclusion rather than being a collection of abstract facts about your data.
  • Make sure that you try and maintain a consistent order and size of your plots.
For reference, I recommend checking out highly rated exploration-themed notebooks on Kaggle in your data domain for amazing examples.
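Two of the tips above, contrasting colors and a consistent plot size, can be sketched as follows. The model names and accuracy values are invented, and the specific colors and figure size are just one reasonable choice, not a recommendation from the Pandas or matplotlib docs.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd

FIGSIZE = (6, 4)  # reuse one size for every chart in the report
MUTED, HIGHLIGHT = "lightgray", "crimson"

# Invented scores for illustration
scores = pd.Series({"model_a": 0.71, "model_b": 0.88, "model_c": 0.75}, name="accuracy")

# Give the best-scoring bar a contrasting color and mute the rest
best = scores.idxmax()
colors = [HIGHLIGHT if name == best else MUTED for name in scores.index]

fig, ax = plt.subplots(figsize=FIGSIZE)
scores.plot(kind="bar", ax=ax, color=colors, title="Accuracy by model")
ax.set_ylabel("accuracy")
fig.tight_layout()
```

Keeping FIGSIZE and the two colors as named constants makes it easy to apply the same look to every chart that follows.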

Iterative Exploration and Refinement Using Weights & Biases

Leverage Weights & Biases! Once you have your data logged in a W&B Table, you can leave your code editor completely and explore your data in the UI. Run queries, sort your data, apply filters, and much more!


Challenges and Considerations

Handling Large Datasets and Performance Optimization

Handling large datasets using Pandas can sometimes land you in trouble. There are some general guidelines you can follow to get the most out of Pandas:
  • Try to pre-process your dataset in smaller batches and then merge the results into a single dataframe
  • Consider writing performance-critical functions in Cython
  • Use the "numba" engine (engine="numba"), which some Pandas methods accept, to take advantage of JIT (Just-in-Time) compilation
There are other libraries that you can also consider, such as dask, cuDF, and polars, which can outperform Pandas on large workloads.
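The first guideline, processing in batches and merging afterwards, might look like the sketch below. It uses the chunksize parameter of pd.read_csv on a tiny made-up CSV held in memory; the column names, the chunk size of 4, and the even-amount filter are placeholders for whatever pre-processing your data needs.

```python
import io

import pandas as pd

# Hypothetical CSV standing in for a file too large to load at once
csv_text = "user_id,amount\n" + "\n".join(f"{i % 3},{i}" for i in range(10))

processed = []
# With chunksize set, read_csv yields DataFrames of at most 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Per-chunk pre-processing: keep only rows with an even amount
    processed.append(chunk[chunk["amount"] % 2 == 0])

# Merge the pre-processed chunks into a single DataFrame
df = pd.concat(processed, ignore_index=True)
print(df)
```

Because each chunk is filtered before it is stored, peak memory is bounded by the chunk size plus the rows that survive the filter.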

Addressing Missing or Inconsistent Data for Accurate Plotting

Handling missing and inconsistent data is an important skill to have. Pandas provides us with a number of utility functions to deal with it.
  • You can use the .dropna() method to remove missing values from your dataframe. You can apply it across rows or columns, or restrict it to a subset of the data.
  • Your dataframe can contain several data types, each requiring its own cleaning technique. If the features are numeric, try to normalize them. If they are dates, make sure they all share the same format. If they are categorical, consider one-hot encoding them.
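The cleaning steps above can be sketched on a toy dataframe. Everything here is illustrative: the column names and values are made up, and min-max scaling is only one of several ways to normalize a numeric column.

```python
import numpy as np
import pandas as pd

# Toy data with one missing numeric value (illustrative only)
df = pd.DataFrame(
    {
        "height_cm": [150.0, 160.0, np.nan, 170.0],
        "joined": ["2023-01-05", "2023-02-10", "2023-03-15", "2023-04-20"],
        "plan": ["free", "pro", "pro", "free"],
    }
)

# Drop rows where the numeric column is missing
clean = df.dropna(subset=["height_cm"]).copy()

# Numeric feature: min-max normalize to the range [0, 1]
h = clean["height_cm"]
clean["height_norm"] = (h - h.min()) / (h.max() - h.min())

# Dates: parse the strings into a single datetime dtype
clean["joined"] = pd.to_datetime(clean["joined"], format="%Y-%m-%d")

# Categorical feature: one-hot encode into plan_free / plan_pro columns
clean = pd.get_dummies(clean, columns=["plan"])
print(clean)
```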

Conclusion

In this article, you read through a brief overview of plotting with Pandas and how using Weights & Biases to explore your data can lead to valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and using Dropout.

