
Embeddings - Retrieval

Collection of learnings about embedding models, emphasizing retrieval tasks.


Metrics

Plots of the primary metrics for each model, averaged across the retrieval tasks. The black bars are supposed to show the standard deviation, but they appear to show the max and min across all tasks instead.


[Bar charts: map_at_10 and mrr_at_10 per model, averaged across retrieval tasks. Run set: 80 runs.]
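To make the std-dev vs. min/max distinction concrete, here is a minimal sketch of both aggregations. The DataFrame and its scores are purely illustrative, not actual benchmark numbers:

```python
import pandas as pd

# Hypothetical per-task results: one row per (model, task) pair.
scores = pd.DataFrame({
    "model": ["model-a"] * 3 + ["model-b"] * 3,
    "task": ["SciFact", "NFCorpus", "FiQA2018"] * 2,
    "map_at_10": [0.64, 0.36, 0.38, 0.58, 0.31, 0.32],
})

# Aggregate each metric across tasks, per model.
agg = scores.groupby("model")["map_at_10"].agg(["mean", "std", "min", "max"])
print(agg)
```

Error bars of mean ± std are usually much tighter than the min–max range, which is one way to tell which of the two a plot is actually showing.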

In these benchmarks, we run the following MTEB tasks:
- ClimateFEVER
- CQADupstackAndroidRetrieval
- CQADupstackEnglishRetrieval
- CQADupstackGamingRetrieval
- CQADupstackGisRetrieval
- CQADupstackMathematicaRetrieval
- CQADupstackPhysicsRetrieval
- CQADupstackProgrammersRetrieval
- CQADupstackStatsRetrieval
- CQADupstackTexRetrieval
- CQADupstackUnixRetrieval
- CQADupstackWebmastersRetrieval
- CQADupstackWordpressRetrieval
- DBPedia
- FEVER
- FiQA2018
- HotpotQA
- MSMARCO
- NFCorpus
- NQ
- QuoraRetrieval
- SCIDOCS
- SciFact
- Touche2020
- TRECCOVID
These are listed as the English retrieval tasks in MTEB. Note, however, that the MTEB leaderboard only uses a subset of these tasks for benchmarking retrieval.
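As a rough sketch of how these benchmarks can be run with the `mteb` package (the model name and two-task subset here are illustrative, not the exact configuration used for these runs):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Illustrative model; any SentenceTransformer-compatible encoder works here.
model = SentenceTransformer("intfloat/e5-base-v2")

# Evaluate on a couple of the retrieval tasks listed above;
# passing the full list runs the whole suite.
evaluation = MTEB(tasks=["SciFact", "NFCorpus"])
results = evaluation.run(model, output_folder="results/e5-base-v2")
```

Each task writes a JSON result file to the output folder, from which per-model metrics like map_at_10 and mrr_at_10 can be aggregated as in the plots above.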

The instructor-xl and e5-large-v2 models took at least three days to run MTEB retrieval on a single GPU, so those runs were cancelled.