wandbot v1.3 vs v1.2: Debugging Eval Accuracy
Created on January 6 | Last edited on January 20
Baseline v1.2
Correctness: ~72-73%
Commit: 8ec557ee727162da52375eff1120bab3975d984e
Baseline v1.2 again
- Ensuring that chroma_index:v34 was being used
v1.3
Debugging outputs, step by step
Jan 6th, 2024
Jan 12th, 2024 - Is the e2b retriever repro 3.10 consistent? - Yes
Jan 14th, 2024 - Does the e2b retriever repro 3.10 match the 3.10 baseline eval? - No
Jan 19th, 2024 - Is it because a different chroma index was being used? - No, we have a context duplication problem (i.e. retrieved contexts are not de-duplicated)
Jan 20th, 2024 - Find the golden-answer contexts that are missing: were they retrieved at all?
- For a single query, looking at the baseline's retrieved indexes to try to reproduce it:
- query: concurrent writes
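The two checks from Jan 19th and Jan 20th can be sketched in a few lines of Python. This is a minimal sketch, not wandbot's actual code: the function names, the whitespace/case normalization, and the sample strings are all hypothetical, and it assumes contexts are plain strings that count as duplicates when they match exactly after normalization. A real pipeline might de-duplicate on a document or chunk id instead.

```python
from typing import Iterable


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so near-identical chunks compare equal."""
    return " ".join(text.split()).lower()


def dedupe_contexts(contexts: Iterable[str]) -> list[str]:
    """Drop repeated contexts, keeping the first occurrence and the original order."""
    seen: set[str] = set()
    unique: list[str] = []
    for ctx in contexts:
        key = normalize(ctx)
        if key not in seen:
            seen.add(key)
            unique.append(ctx)
    return unique


def missing_golden_contexts(golden: Iterable[str], retrieved: Iterable[str]) -> list[str]:
    """Return the golden contexts that do not appear among the retrieved ones."""
    retrieved_keys = {normalize(ctx) for ctx in retrieved}
    return [g for g in golden if normalize(g) not in retrieved_keys]


# Hypothetical data for illustration only, not taken from the actual eval run.
retrieved = [
    "Runs can log concurrently.",
    "runs can   log concurrently.",  # duplicate up to case and whitespace
    "Artifacts are versioned.",
]
golden = ["Artifacts are versioned.", "Use a distributed lock for writes."]

unique = dedupe_contexts(retrieved)
missing = missing_golden_contexts(golden, unique)
print(len(retrieved), "->", len(unique), "contexts after de-dup")
print("missing golden contexts:", missing)
```

Running the same comparison on the baseline and on the repro for one query (for example the concurrent-writes query) would show whether duplicates are crowding out contexts that the golden answer relies on.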