
ImageNetX: Meta's Modified ImageNet Dataset To Target Vision Model Weaknesses

Meta AI has released a new dataset for pinpointing the deficiencies of computer vision models. It's open source and available for download.
Computer vision, despite being a classic machine learning problem, still faces many hard-to-diagnose issues. ImageNet is one of the datasets most commonly used to train object recognition models, but even with its large image count and many annotated classes, models trained on it still misclassify images under certain conditions, and it's often hard to pin down why.
To precisely target some of the potential issues an object recognition model might have, Meta AI researchers created ImageNetX, a new annotation set layered on top of ImageNet. It adds human-written annotations describing how individual images differ from standard, easy-to-recognize examples of their object class.
The ImageNetX annotations cover the full validation set of ImageNet-1k (the most commonly used variant of ImageNet, which keeps only 1k of the ~22k classes in the original), 50k images in all, plus 12k additional randomly selected images to form a training set, for a total of 62k annotated images.

Identifying issues with 16 new annotations

ImageNetX introduces 16 new annotation factors that can be attributed to each image. These factors label how an image differs from "prototypical" examples of its class: partial occlusion, subject pose, lighting and color, and many others.
For example, the prototypical image of a cow would show a center-frame cow, unobstructed, in good lighting. Compare that with a cow far in the distance, wading in the water at a beach: at first glance, especially at low resolution, it might be an animal or even a log, but it's definitely a cow if you look closely enough.
The class prototypes were determined with a ResNet-50, which picked the most confidently classified examples in each class: the most cow-like images among the cow images.
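The exact selection procedure isn't spelled out here, but the core idea, scoring each image by how confidently a pretrained ResNet-50 assigns it to its true class, is easy to sketch in PyTorch. The directory path and the top-3 cutoff below are assumptions for illustration, not details from the paper:

```python
import torch
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Pretrained ResNet-50 acts as the "prototypicality" scorer.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical local path; assumes the standard synset-folder layout so
# ImageFolder's class indices line up with the model's output indices.
dataset = ImageFolder("imagenet/val", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

# Score every image by the softmax probability of its ground-truth class.
scores = {}  # class index -> list of (probability, dataset index)
with torch.no_grad():
    for batch_idx, (images, labels) in enumerate(loader):
        probs = model(images).softmax(dim=1)
        true_probs = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        for i, (label, p) in enumerate(zip(labels.tolist(), true_probs.tolist())):
            scores.setdefault(label, []).append((p, batch_idx * 64 + i))

# The highest-scoring images per class serve as prototypes: the most
# cow-like images among the cow images.
prototypes = {c: sorted(s, reverse=True)[:3] for c, s in scores.items()}
```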

Additionally, two free-form annotations, a justification and a one-word difference, are included and used in different ways to validate and refine ImageNetX:
  • To put the emphasis on the most important factor, thereby avoiding over-representation of minor ones, the justification text is fed through a language model that selects a single top factor (a sketch of this idea follows the list).
  • To determine whether the 16 factors were comprehensive, the annotator-written one-word differences were checked against them; the vast majority mapped onto one of the 16 factors, suggesting the set covers most of what annotators notice.
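The post doesn't say which language model is used or how, so treat the following as a minimal sketch of the general idea: embed the justification and each factor name, then pick the nearest factor. The encoder choice and the factor wordings (paraphrased from the paper) are assumptions:

```python
from sentence_transformers import SentenceTransformer, util

# The 16 ImageNetX factors, paraphrased; see the paper for exact definitions.
FACTORS = [
    "pose", "background", "pattern", "color", "smaller", "larger",
    "object blocking", "person blocking", "partial view", "shape",
    "subcategory", "texture", "darker", "brighter", "style",
    "multiple objects",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not the paper's
factor_emb = encoder.encode(FACTORS, convert_to_tensor=True)

def top_factor(justification: str) -> str:
    """Return the factor whose embedding is closest to the free-form justification."""
    just_emb = encoder.encode(justification, convert_to_tensor=True)
    return FACTORS[int(util.cos_sim(just_emb, factor_emb).argmax())]

print(top_factor("the cow is tiny and far away in the water"))
```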
To collect annotations, human annotators were shown three prototypical images alongside another image of the same class from the dataset. They then selected all applicable factors and filled in the two free-form fields.
The researchers evaluated many ImageNet-trained models against this new dataset and found some key points (a sketch of this kind of per-factor error analysis follows the list):
  • Regardless of architecture, size, or training procedure, all models suffered from roughly the same deficiencies.
  • Differences in texture, subcategory, and occlusion account for the majority of mistakes.
  • Data augmentation during training is the most effective way to improve robustness; however, some common augmentation methods improve robustness to some factors while harming unrelated ones.
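Findings like these come from comparing error rates on images tagged with each factor against a model's overall error rate. A minimal pandas sketch of that per-factor analysis, assuming a DataFrame that merges the 0/1 factor columns with a model's predictions (the column names here are assumptions), might look like this:

```python
import pandas as pd

def error_ratios(df: pd.DataFrame, factors: list[str]) -> pd.Series:
    """Per-factor error ratio: the error rate on images exhibiting a factor,
    divided by the overall error rate. Values above 1 mean the factor hurts."""
    df = df.copy()
    df["error"] = (df["predicted"] != df["label"]).astype(int)
    overall = df["error"].mean()
    return pd.Series({f: df.loc[df[f] == 1, "error"].mean() / overall
                      for f in factors})

# Hypothetical usage: `df` merges ImageNetX factor columns (0/1 per image)
# with one model's predictions on the same validation images.
# ratios = error_ratios(df, ["texture", "subcategory", "pose", "background"])
# print(ratios.sort_values(ascending=False))
```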

Find out more

The dataset is open source, so as long as you have access to ImageNet, you can download the new ImageNetX annotations and lay them on top of it; head over to the GitHub repository for instructions. They've also created a Google Colab for quick setup.
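Once you have the annotation files, a first look takes only a few lines of pandas. The file name and layout below are assumptions (check the repository README for the actual files); we assume a JSON-lines table with a file-name column plus one 0/1 column per factor:

```python
import pandas as pd

# Hypothetical file name; the GitHub README documents the real ones.
ann = pd.read_json("imagenet_x_val.jsonl", lines=True)

print(ann.shape)  # expect roughly 50k rows for the validation split
# Fraction of images flagged with each factor, most common first:
print(ann.select_dtypes("number").mean().sort_values(ascending=False))
```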
You can also read the full paper for all the details, or head to the web page for some more info.