Evaluations · wandb-smle / weave-rag-lc-demo
Each row below is one Evaluation.evaluate call. The "answer correct" metric comes from the CorrectnessLLMJudge scorer and the "first retrieval correct" metric from the eval_retrieval scorer; Weave summarizes both boolean scores as true_count and true_fraction, and the more recent runs also report a stderr for "answer correct".

| Evaluation | Trace ID | Feedback | inputs.model | inputs.self | answer correct stderr | answer correct true_count | answer correct true_fraction | first retrieval correct true_count | first retrieval correct true_fraction |
|---|---|---|---|---|---|---|---|---|---|
| Phi-1.5 Chat Model Eval | 181d | 👎 1 | RagModel:v8 | gen_eval_dataset-evaluation:v4 | 0.1006 | 14 | 0.5833 | 15 | 0.625 |
| Gemini 1.5 Pro Chat Model Eval | 2f74 | 👍 1 | RagModel:v7 | gen_eval_dataset-evaluation:v4 | 0.0884 | 18 | 0.75 | 15 | 0.625 |
| Gemini 1.5 Flash Chat Model Eval | 444b | 🫀 1, 👍 2 | RagModel:v6 | gen_eval_dataset-evaluation:v4 | 0.0761 | 20 | 0.8333 | 15 | 0.625 |
| Evaluation.evaluate | 81bf | | RagModel:v5 | gen_eval_dataset-evaluation:v3 | N/A | 15 | 0.625 | 15 | 0.625 |
| Evaluation.evaluate | 9213 | 👎 1 | RagModel:v4 | gen_eval_dataset-evaluation:v2 | N/A | N/A | N/A | N/A | N/A |
| Phi-1.5 Chat Model Eval | 96e5 | | RagModel:v3 | gen_eval_dataset-evaluation:v1 | N/A | 14 | 0.5833 | 15 | 0.625 |
| Phi-1.5 Chat Model Eval | 6b8f | 👎 1 | RagModel:v2 | gen_eval_dataset-evaluation:v0 | N/A | 24 | 1 | 15 | 0.625 |
| GPT-3.5-Turbo Chat Model Eval | 0553 | | RagModel:v1 | gen_eval_dataset-evaluation:v1 | N/A | 17 | 0.7083 | 15 | 0.625 |
| GPT-4 Chat Model Eval | f8c4 | | RagModel:v0 | gen_eval_dataset-evaluation:v1 | N/A | 17 | 0.7083 | 15 | 0.625 |
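
For context, the sketch below shows roughly how rows like these get logged with W&B Weave: a weave.Model subclass (a placeholder RagModel here) is scored by a CorrectnessLLMJudge scorer and an eval_retrieval scorer inside a weave.Evaluation, and each Evaluation.evaluate call produces one row in the Evaluations tab, with the model version under inputs.model and the Evaluation object under inputs.self. This is a minimal sketch, not the project's actual code: the model fields, scorer logic, and dataset contents are assumptions, and it assumes a recent Weave version where scorers receive the model output via an `output` parameter.

```python
# Minimal sketch (assumptions throughout) of logging evaluation rows with W&B Weave.
import asyncio
import weave


class RagModel(weave.Model):
    # Assumed field; the real RagModel:v0..v8 versions presumably differ in
    # chat model, prompt, retriever settings, etc.
    chat_model: str = "gpt-4"

    @weave.op()
    def predict(self, question: str) -> dict:
        # Placeholder: a real RAG model would retrieve context and call the chat model.
        return {"answer": "...", "retrieved_docs": ["..."]}


@weave.op()
def CorrectnessLLMJudge(question: str, answer: str, output: dict) -> dict:
    # Stand-in for the project's LLM-judge scorer. Returning a boolean under
    # "answer correct" is what Weave aggregates into true_count / true_fraction.
    return {"answer correct": output["answer"].strip() != ""}


@weave.op()
def eval_retrieval(question: str, output: dict) -> dict:
    # Stand-in retrieval scorer; the boolean key mirrors the "first retrieval correct" column.
    return {"first retrieval correct": len(output["retrieved_docs"]) > 0}


async def main():
    weave.init("wandb-smle/weave-rag-lc-demo")

    # Placeholder rows; the real gen_eval_dataset-evaluation dataset is larger.
    dataset = [
        {"question": "What is Weave?", "answer": "A toolkit for tracing and evaluating LLM apps."},
    ]

    evaluation = weave.Evaluation(
        name="gen_eval_dataset-evaluation",
        dataset=dataset,
        scorers=[CorrectnessLLMJudge, eval_retrieval],
    )

    # Each evaluate() call appears as one row in the Evaluations table.
    summary = await evaluation.evaluate(RagModel(chat_model="gpt-4"))
    print(summary)


if __name__ == "__main__":
    asyncio.run(main())
```

As a sanity check on the table itself, the fractions are consistent with a 24-example evaluation dataset (e.g. 15/24 = 0.625, 14/24 = 0.5833, 24/24 = 1), and the stderr values match the binomial standard error sqrt(p(1-p)/n) of the "answer correct" fraction (e.g. sqrt(0.5833 × 0.4167 / 24) ≈ 0.1006).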