wandbot
wandbot-eval-jp
Evaluations
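Column groups: inputs (model, self); output: get_answer_correctness / answer_correctness (true_count, true_fraction), model_latency (mean), model_output (completion_tokens mean, prompt_tokens mean).

| Trace | Feedback | Status | inputs.model | inputs.self | answer_correctness true_count | answer_correctness true_fraction | model_latency mean | completion_tokens mean | prompt_tokens mean |
|---|---|---|---|---|---|---|---|---|---|
| Evaluation.evaluate 0a90 | | | EvaluatorModel:v24 | Evaluation:v9 | 64 | 0.6598 | 76.933 | 789.0309 | 8251.866 |
| Evaluation.evaluate da34 | | | EvaluatorModel:v24 | Evaluation:v9 | 66 | 0.6804 | 76.5604 | 814.1134 | 8368.9691 |
| Evaluation.evaluate 3694 | | | | | 77 | 0.7857 | 62.6091 | 829.7041 | 8306.7143 |
| Evaluation.evaluate 6aa1 | | | | | 76 | 0.7755 | 64.0234 | 811.602 | 8406.6531 |
| Evaluation.evaluate c980 | | | | | 74 | 0.7551 | 64.493 | 812.7959 | 8402.9388 |
| Evaluation.evaluate f378 | | | | | 83 | 0.8469 | 64.5334 | 802.4082 | 8323.6327 |
| Evaluation.evaluate 3285 | | | | | 69 | 0.7041 | 74.6639 | 827.2449 | 8425.4796 |
| Evaluation.evaluate 6ed4 | | | | Evaluation:v9 | 57 | 0.5876 | 84.1702 | 835.8041 | 8441.7113 |
| Evaluation.evaluate 177d | | | | | 69 | 0.7188 | 131.8345 | 813.6146 | 8404.9063 |
| Evaluation.evaluate b351 | | | | | 54 | 0.5625 | 99.1182 | 808.8333 | 8366.5104 |
| Evaluation.evaluate 7307 | | | | | 73 | 0.7449 | 90.5226 | 808.1633 | 8400.2449 |
| Evaluation.evaluate 9893 | | | | | 80 | 0.8163 | 98.0373 | 814.4796 | 8391.7857 |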
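The page does not show how many examples each run scored. A minimal sketch of how that could be recovered from the two answer_correctness columns, assuming true_fraction is simply true_count divided by the number of scored examples (an assumption about how the summary is computed, not something stated on the page). The names `rows` and `implied_examples` are hypothetical and used only for illustration; the numbers are copied from the listing above.

```python
# (trace id, true_count, true_fraction) copied from the evaluation listing.
rows = [
    ("0a90", 64, 0.6598),
    ("da34", 66, 0.6804),
    ("3694", 77, 0.7857),
    ("6aa1", 76, 0.7755),
    ("c980", 74, 0.7551),
    ("f378", 83, 0.8469),
    ("3285", 69, 0.7041),
    ("6ed4", 57, 0.5876),
    ("177d", 69, 0.7188),
    ("b351", 54, 0.5625),
    ("7307", 73, 0.7449),
    ("9893", 80, 0.8163),
]


def implied_examples(true_count: int, true_fraction: float) -> int:
    """Estimate the number of scored examples.

    ASSUMPTION: true_fraction == true_count / n, rounded to 4 decimals,
    so n is recovered by rounding the quotient to the nearest integer.
    """
    return round(true_count / true_fraction)


# Implied number of scored examples per run, keyed by trace id.
sizes = {trace: implied_examples(count, frac) for trace, count, frac in rows}

# Run with the highest fraction of answers judged correct.
best = max(rows, key=lambda r: r[2])

for trace, count, frac in rows:
    print(f"{trace}: {count}/{sizes[trace]} correct ({frac:.2%})")
print("best run:", best[0])
```

Under that assumption the implied sizes differ slightly between runs, so true_fraction is the safer column for comparing runs than true_count alone.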