wandbot / wandbot-eval2
Evaluations
                              inputs                             output
                                                                 get_answer_correctness     model_latency  model_output
                                                                 answer_correctness                        answer_in_context
Trace                         model               self           true_count  true_fraction  mean           true_count  true_fraction
Evaluation.evaluate 8942      EvaluatorModel:v12  Evaluation:v5  2           1              117.6286       N/A         N/A
Evaluation.evaluate 22a8      EvaluatorModel:v12  Evaluation:v5  1           1              121.0161       N/A         N/A
Evaluation.evaluate 2de5      EvaluatorModel:v12  Evaluation:v5  N/A         N/A            N/A            N/A         N/A
Evaluation.evaluate bbeb      EvaluatorModel:v12  Evaluation:v4  76          0.8            37.6104        N/A         N/A
Evaluation.evaluate 7961      EvaluatorModel:v12  Evaluation:v3  2           1              42.7512        N/A         N/A
Evaluation.evaluate 3d21      EvaluatorModel:v12  Evaluation:v4  N/A         N/A            N/A            N/A         N/A
Evaluation.evaluate 03d8      EvaluatorModel:v12  Evaluation:v4  N/A         N/A            N/A            N/A         N/A
Evaluation.evaluate de51      EvaluatorModel:v12  Evaluation:v4  N/A         N/A            N/A            N/A         N/A
top_k@10|threshold@0.5 9c41   EvaluatorModel:v11  Evaluation:v4  18          0.2195         7.504          82          1
top_k@10|threshold@0.3 49d3   EvaluatorModel:v10  Evaluation:v4  13          0.1566         7.9207         83          1
top_k@15|threshold@0.3 0bcb   EvaluatorModel:v9   Evaluation:v4  14          0.1772         8.6152         79          1
default-top_k@15 0de9         EvaluatorModel:v8   Evaluation:v4  11          0.131          8.3827         84          1
default-top_k@5 aa5f          EvaluatorModel:v7   Evaluation:v4  14          0.1818         8.54           77          1
with_few-shot-examples 666d   EvaluatorModel:v6   Evaluation:v4  11          0.1196         15.6598        92          1
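The true_count, true_fraction, and mean columns read as per-run aggregates over individual example scores. The sketch below shows one way such aggregates could be computed from raw per-example values. It is an illustration only, not Weave's actual implementation: the function names `summarize_bool_scores` and `mean_latency` are hypothetical, and the example assumes the 9c41 run scored 82 examples, a number inferred from its answer_in_context true_count of 82 at a true_fraction of 1.

```python
from typing import Optional, Sequence


def summarize_bool_scores(scores: Sequence[Optional[bool]]) -> Optional[dict]:
    """Aggregate boolean per-example scores into true_count / true_fraction.

    Hypothetical helper: None entries (examples with no score) are skipped,
    and an empty result is reported as None, analogous to an N/A cell.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        return None
    true_count = sum(1 for s in valid if s)
    return {"true_count": true_count, "true_fraction": true_count / len(valid)}


def mean_latency(latencies: Sequence[float]) -> Optional[float]:
    """Arithmetic mean of per-example latencies, or None when there are none."""
    if not latencies:
        return None
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    # Assumed shape of the 9c41 run: 18 correct answers out of 82 examples.
    scores = [True] * 18 + [False] * 64
    summary = summarize_bool_scores(scores)
    print(summary["true_count"])               # 18
    print(round(summary["true_fraction"], 4))  # 0.2195
```

Under that assumption the result matches the 0.2195 shown for answer_correctness in the 9c41 row. The same division gives 76 / 95 = 0.8 for the bbeb run, if that run scored 95 examples.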