
ImageNetV2 Results

Created on November 6|Last edited on December 7


The tabs show per-class differences and differences over the entire dataset. Click on a run to see the images and all the hypotheses. The results in the paper use AUROC ranking, but we forgot to visualize those runs (whoops), so we also include runs that use a slightly different difference score metric and do contain visualizations :)

The scoring metric for 'Per Class with visualizations' gives each image a 0/1 label per hypothesis by thresholding the CLIP score. For example, if an image has a CLIP cosine similarity with 'toaster oven' over 0.3, it is counted as containing the hypothesis. The final score is (# positive samples in A)/(# samples in A) - (# positive samples in B)/(# samples in B).
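The thresholded difference score described above can be sketched in a few lines of Python. This is only an illustration, not the code used for these runs: the function name `difference_score`, its parameters, and the similarity values are made up for the example, and treating "over 0.3" as a strict greater-than is an assumption.

```python
def difference_score(sims_a, sims_b, threshold=0.3):
    """Fraction of positives in A minus fraction of positives in B.

    sims_a, sims_b: CLIP cosine similarities between one hypothesis
    (e.g. 'toaster oven') and each image in group A / group B.
    threshold: similarity above which an image counts as containing
    the hypothesis (strict '>' is an assumption here).
    """
    if not sims_a or not sims_b:
        raise ValueError("both groups need at least one image")
    # 0/1 label per image: 1 if the similarity exceeds the threshold.
    pos_a = sum(1 for s in sims_a if s > threshold)
    pos_b = sum(1 for s in sims_b if s > threshold)
    return pos_a / len(sims_a) - pos_b / len(sims_b)


# Hypothetical similarities, not taken from any real run.
sims_a = [0.35, 0.31, 0.28, 0.40]  # 3 of 4 images exceed 0.3
sims_b = [0.25, 0.33, 0.10, 0.29]  # 1 of 4 images exceeds 0.3
score = difference_score(sims_a, sims_b)
print(score)
```

The score lies between -1 and 1. A value near 1 means the hypothesis matches most images in A and few in B, and a value near 0 means it does not separate the two groups.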

ImNtV2 VS ImNt: 1
Per Class AUROC Ranking: 1000
Per Class with visualizations: 997