Using W&B Tables and Reports for Enhanced EDA in MLOps

Learn to use W&B Tables and Reports for Enhanced EDA. A sample from the free MLOps certification course from Weights & Biases.
Dave Davies
Created on December 22 | Last edited on December 28
Exploratory data analysis (EDA) is a crucial step in the machine learning process, allowing us to understand and verify the quality of our data before diving into model building. However, EDA can be a time-consuming, manual process that involves multiple tools and often requires collaboration with several team members.
With Weights & Biases, you can easily explore your data, inspect images and labels, analyze data distribution, and document it all within a collaborative report you can share with your team. This not only saves time but also helps to document your processes and make them more reproducible.
In this video from our MLOps course, we introduce a better way to do EDA using W&B Tables and Reports.
Transcript (from Whisper)
So in the past, I used Jupyter Notebooks for EDA, but this was not very convenient. When I communicated with subject matter experts, I would often save a notebook as an HTML file and share it over email. This was not very efficient. We communicated over email and over Slack. Often we needed meetings to discuss the findings.
Fortunately, now we can do this more efficiently with Weights & Biases. We want to document our data exploration so that we can share it with our team.
So let's add the Table to a Report. 
First, let's create a new report. We'll give it a new title. And let's take a look at the distribution of attributes in this dataset.
All of the images are in the same dataset, so maybe we can group it by the dataset column.
We can now see the histograms and the distribution of our attributes and labels here:
[Embedded W&B Table, grouped by Dataset; showing 1 of 1 rows (group: bdd1k)]
Weave expression: project("av-team", "mlops-course-001").artifact("bdd_simple_1k").membershipForAlias("808d8032fa441f1bb467").artifactVersion.file("eda_table.table.json")
Columns: File_Name, P1, P2, Images, background, road, traffic light, traffic sign, person, vehicle, bicycle
You can see that P1 and P2 are imbalanced, so we should take a closer look at them.
Most of the classes are represented in each image. One exception is bicycle. There are very few bicycle images, so we need to pay special attention to this class.
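Ed. note: the class-presence check described here can also be approximated offline. Below is a minimal sketch in plain Python, assuming each table row is a dict whose class columns hold 1 when the class appears in the image and 0 otherwise. The sample rows, the helper names, and the 25% threshold are made up for illustration and are not taken from the course dataset.

```python
from collections import Counter

# Class columns as they appear in the EDA table.
CLASS_COLUMNS = ["background", "road", "traffic light", "traffic sign",
                 "person", "vehicle", "bicycle"]


def class_presence_counts(rows, class_columns=CLASS_COLUMNS):
    """Count how many rows (images) contain each class.

    Assumes each row maps a class column to 1 (present) or 0 (absent).
    """
    counts = Counter({name: 0 for name in class_columns})
    for row in rows:
        for name in class_columns:
            if row.get(name, 0):
                counts[name] += 1
    return counts


def rare_classes(counts, total, threshold):
    """Return classes present in fewer than `threshold` (a fraction) of images."""
    return [name for name, n in counts.items() if total and n / total < threshold]


# Hypothetical rows: every class is present everywhere, except bicycle,
# which appears in only one of five images.
sample_rows = []
for i in range(5):
    row = {name: 1 for name in CLASS_COLUMNS}
    row["File_Name"] = f"img{i}"  # made-up file name
    row["bicycle"] = 1 if i == 0 else 0
    sample_rows.append(row)

counts = class_presence_counts(sample_rows)
rare = rare_classes(counts, total=len(sample_rows), threshold=0.25)
print(dict(counts))
print("rare classes:", rare)
```

A check like this only flags candidates. Looking at the flagged images in the table is still what tells you whether the class is usable.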
Let's note down our findings in the report.
A table is represented as a weave expression in the report, so we can copy this expression. Now we will create a new weave panel and paste that expression here. This will essentially duplicate this table.
We are interested in the P1 attribute, so let's now group by this attribute.
[Embedded W&B Table, grouped by P1; showing 4 of 929 rows (groups: 0027eed2, 00aad4a0, 00d79c0a, 00e69ee0)]
Weave expression: project("av-team", "mlops-course-001").artifact("bdd_simple_1k").membershipForAlias("808d8032fa441f1bb467").artifactVersion.file("eda_table.table.json")
Columns: File_Name, P2, Images, Dataset, background, road, traffic light, traffic sign, person, vehicle, bicycle
We can maybe sort this table now and make it a bit bigger. Let's look at the images (ed. note: you can click any of the images in the "Images" column to do this).
Maybe we can change the settings of this column. Let's remove the mask for now.
Now what we can see is when we grouped by this P1 attribute, similar images appear in each group. It looks like they were taken from the same car, maybe on the same day.
This can mean that these images come from the same video. They are potentially different frames from the same video. This will be important in the future when we split our data across training, validation, and test sets. So let's note down this finding.
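Ed. note: if frames from one video end up in both the training and the validation set, validation scores can look better than they should. One common way to avoid this is to split by group rather than by row. The sketch below is an illustration under stated assumptions, not the method used in the course: it treats P1 as the group key, hashes that key to pick a split, and uses hypothetical file names and split percentages.

```python
import hashlib


def assign_split(group_key, valid_pct=10, test_pct=10):
    """Map a group key to 'train', 'valid', or 'test' deterministically.

    Hashing the group key (not the individual row) keeps every row that
    shares the key in the same split.
    """
    digest = hashlib.md5(str(group_key).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + valid_pct:
        return "valid"
    return "train"


def split_rows(rows, group_column="P1"):
    """Distribute rows into splits so that no group is divided."""
    splits = {"train": [], "valid": [], "test": []}
    for row in rows:
        splits[assign_split(row[group_column])].append(row)
    return splits


# Hypothetical rows: three frames for each of the four P1 groups
# visible in the grouped table.
sample_rows = [
    {"File_Name": f"{group}-{frame}", "P1": group}
    for group in ["0027eed2", "00aad4a0", "00d79c0a", "00e69ee0"]
    for frame in range(3)
]

splits = split_rows(sample_rows)
for name, rows in splits.items():
    print(name, sorted({row["P1"] for row in rows}))
```

Hash-based assignment only approximates the requested proportions, especially with few groups. Library splitters such as scikit-learn's GroupKFold serve the same purpose when exact fold sizes or stratification matter.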
Let's also look now at one of the other classes.
We'll duplicate the table again. Let's group it by bicycle now.
[Embedded W&B Table, grouped by bicycle; showing 2 of 2 rows (groups: 0, 1)]
Weave expression: project("av-team", "mlops-course-001").artifact("bdd_simple_1k").membershipForAlias("808d8032fa441f1bb467").artifactVersion.file("eda_table.table.json")
Columns: File_Name, P1, P2, Images, Dataset, background, road, traffic light, traffic sign, person, vehicle
We know there are very few bicycle images in this table. In fact, we can see there are 59 in our dataset.
Let's then change the settings. And now we're specifically interested in the bicycle class, so let's just look at these annotations. It looks like the bicycle annotations are very small and potentially noisy, so it may be hard for our models to learn this class.
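Ed. note: "very small" can be made measurable by computing what fraction of a segmentation mask belongs to a class. This is a minimal sketch, assuming the mask is a 2-D grid of integer class ids. The id used for bicycle and the toy mask are hypothetical, not values from the course dataset.

```python
def class_pixel_fraction(mask, class_id):
    """Return the fraction of pixels in a 2-D mask equal to class_id.

    `mask` is a list of rows, each a list of integer class ids.
    An empty mask yields 0.0.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    hits = sum(1 for row in mask for value in row if value == class_id)
    return hits / total


BICYCLE_ID = 6  # hypothetical class id, for illustration only

# Toy 4x4 mask: 0 = background, 1 = road, one pixel labeled bicycle.
toy_mask = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, BICYCLE_ID, 1],
    [1, 1, 1, 1],
]

fraction = class_pixel_fraction(toy_mask, BICYCLE_ID)
print(f"bicycle covers {fraction:.2%} of the mask")
```

Averaging this fraction over the images that contain the class gives a rough sense of how little signal the model has to learn from.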
Let's note down our findings again.
After completing our analysis, we can save our report and share it with our team members. See you in the next video!