Error Analysis with Hamel Husain: Using W&B Tables for Model Evaluation
Learn how to use W&B Tables for ML error analysis with Hamel Husain. This video is a sample from the free MLOps certification course from Weights & Biases!
Created on December 28|Last edited on December 28
Evaluating the performance of your machine learning models is an important step in the development process, and error analysis can provide valuable insights into where your model is struggling and how to improve it. In this video from our MLOps course, Hamel Husain shows you how to use Weights & Biases Tables to perform error analysis and evaluate your model's performance on the validation dataset.
By looking at edge cases and understanding where your model is making mistakes, you can get a better understanding of its limitations and areas for improvement. Hamel provides practical tips and guidance on how to use Weights & Biases to get rich insights into your model's performance. If you're looking to improve your model evaluation skills, be sure to watch this video.
Transcript (from Whisper)
The next thing I want to discuss is error analysis. Error analysis is one of the most critical parts of model evaluation. It's really the opportunity to gain deep insights into where your model could be wrong or where it's struggling, and where you can improve it.
And honestly, this is where I personally have the most fun in the entire machine learning process because this is where I get to build more intuition about the domain and how my model is interacting with that domain.
So what is error analysis and how does weights and biases fit into that?
So the way error analysis works is we're going to log a table to Weights & Biases.
You may have already seen these tables in previous lessons, but essentially it's a table with predictions and ground truth, comparing those ground truths to predictions with all the various metrics.
And what we're going to do is we're going to look at these images and we're going to see where the model is most wrong, for example, low IOU. And the goal is to try to gain an intuition about where the model is struggling, about what issues might be happening in terms of the data, the labels, or so on and so forth. And it always surprises me how much I learn when I look at the data like this.
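For context, here is a minimal sketch of what logging a table like this could look like with the wandb Python library. The project name, the synthetic rows, the class labels, and the precomputed IoU values are all stand-ins for illustration, not the course's actual code (a per-class IoU helper is sketched a bit further down).

```python
import numpy as np
import wandb

# Illustrative class labels; the course uses its own label set.
CLASS_LABELS = {0: "background", 1: "road", 2: "vehicle"}

# Stand-in validation rows: in practice these would be the real validation
# images, the labeled masks, the model's predicted masks, and per-class IoUs.
rng = np.random.default_rng(0)
val_rows = [
    {
        "image": rng.integers(0, 255, size=(128, 128, 3), dtype=np.uint8),
        "gt_mask": rng.integers(0, 3, size=(128, 128), dtype=np.uint8),
        "pred_mask": rng.integers(0, 3, size=(128, 128), dtype=np.uint8),
        "road_iou": 0.42,      # placeholder values
        "vehicle_iou": 0.57,
    }
]

run = wandb.init(project="mlops-course-error-analysis")
table = wandb.Table(columns=["image", "road_iou", "vehicle_iou"])

for row in val_rows:
    # Attach the ground-truth and predicted masks as toggleable overlays.
    masked_img = wandb.Image(
        row["image"],
        masks={
            "ground_truth": {"mask_data": row["gt_mask"], "class_labels": CLASS_LABELS},
            "prediction": {"mask_data": row["pred_mask"], "class_labels": CLASS_LABELS},
        },
    )
    table.add_data(masked_img, row["road_iou"], row["vehicle_iou"])

run.log({"eval_table": table})
run.finish()
```

Once a table like this is logged, the W&B UI lets you toggle each mask layer on and off and sort by any metric column, which is exactly what the rest of the walkthrough does.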
And it is super important: no matter how much EDA you do, this kind of guided approach to looking at data is extremely helpful. If you go through this error analysis, you might notice a lot of things that might be relevant, such as poor lighting, obstruction of certain objects, and so on. This is especially true for perception tasks, where humans are quite good at intuiting where issues may be occurring.
And I'll show you some examples. And a key output of error analysis is these categories.
So when you're going through error analysis, you want to keep tabs on the issues that you see and try to categorize them. You want to look at around a hundred or so examples at least and categorize any issues that you find. What you'll often see is things like incorrect labels, even in your training set. I always find issues like these, and so I think error analysis is a really key part of model evaluation.
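One lightweight way to turn that review into categories is to keep a running tally as you go. This is just a sketch of the bookkeeping, not part of the course material; the issue tags are made up for illustration.

```python
from collections import Counter

# Hypothetical notes from reviewing ~100 low-IoU validation examples:
# one or more issue tags per reviewed example.
notes = [
    ["missing_road_label"],
    ["poor_lighting"],
    ["missing_road_label", "occlusion"],
    # ... roughly 100 reviewed examples in practice
]

# Count how often each issue category shows up.
issue_counts = Counter(tag for tags in notes for tag in tags)
for issue, count in issue_counts.most_common():
    print(f"{issue}: {count}")
```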
What you have here is a table of images and all of the IOU scores that I might care about. And this is for the validation set.
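For reference, the per-image, per-class IoU values that populate columns like road IOU can be computed along these lines. This is a generic NumPy sketch under the assumption that masks are 2D integer arrays of class IDs, not the exact metric code from the course.

```python
import numpy as np

def per_class_iou(gt_mask: np.ndarray, pred_mask: np.ndarray, class_id: int) -> float:
    """Intersection over union for one class between two integer label masks."""
    gt = gt_mask == class_id
    pred = pred_mask == class_id
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return float("nan")  # class absent from both masks; often skipped or treated as 1.0
    return float(np.logical_and(gt, pred).sum() / union)

# Example: IoU for the "road" class (assumed here to have class ID 1).
# road_iou = per_class_iou(gt_mask, pred_mask, class_id=1)
```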
So when I'm evaluating this model, my goal is to look at the performance of the model on the validation set, but at the level of specific images, to get an idea of where the model might be struggling.
And the way you do that is to go through each metric that you care about and sort the data points so that the poorly performing ones come first. So let me do an example of that.
So I'm going to take this road IOU and I'm going to sort it in ascending order so that the lowest IOUs are at the top. And then we can go and look at some of these images. So let's look at this image.
Okay.
So this image is interesting. I don't even see a road or anything like that. So what you see here is the image followed by the ground truth mask and then the predictions mask.
To keep track of which one is which, you can always hide a layer here like this, so you know this is the prediction mask and this is the ground truth.
If I hide it, you see that's the middle one. And you can arrange these in many ways; for example, you can stack them on top of each other. I don't tend to like that as much, but sometimes it's helpful.
You can see the image followed by the ground truth and the prediction stacked on top of each other. I like to start with this view to see the predictions and the ground truth segmented out. And it's interesting: I can see something in the predictions, so let me just look at the predictions.
It is predicting road here. I'm not sure there's anything particularly interesting about this picture, to be honest.
So this one is not that interesting, but I just want to give you an idea of how to look at it. I think these next ones might be quite interesting.
So let's take a look at this one here. Okay.
And let's unmask that. So this is really interesting. We have a road, it's clearly a road in front of us. And I can see that there's no road being classified. So there is no road in the ground truth. So let's look at that.
So what I can do is click road here, and you can see this orange area is vehicles. You can expand this to confirm that the orange area is vehicles and that road is not even in the ground truth.
Here is clearly road, and this model is trying to predict road. Let's stack them on top of each other, and let's even take this off.
The model is doing a pretty reasonable job at the road part of it. See, these are the predictions.
I can click none to unselect everything and select just road, and that's pretty reasonable. There's this blip up here. So what is the takeaway here?
The takeaway from this specific example is that the labels in our ground truth are clearly wrong, because if I do this and select road, there is no road labeled.
So we need to go back to our labeling process and potentially figure out what's going on, why roads are not being labeled appropriately. Okay, let's look at another one.
For example, let's see if there's anything interesting in this one. Oh, this is interesting. This is a parking lot. Okay. That's really interesting.
So our model is essentially predicting the entire parking lot as being a road. And the ground truth seems to be correct; the ground truth is just showing vehicles and the background. However, you will notice the ground truth missed something here.
This camper van is not labeled as a vehicle, and that is wrong. Let's see what our model is predicting for vehicles while we're here, just for fun. And we'll see our model does identify that camper van.
So again, we potentially have some labeling issues where we're not labeling things correctly, and that is causing error in this case. So we might want to look at parking lots. And if I find several examples of parking lots, then I might say, okay, I want to acquire a dataset that has more parking lots, or make sure that I try to find more examples of parking lots for my dataset and differentiate those.