Project: wandb-japan / ichikara-test
Evaluations

Columns are grouped as inputs (model, self), output (model_latency), and scores (domain_score per domain). All numeric columns are means.

| Trace | Feedback | Status | model | self | model_latency | ビジネス (Business) | 医療 (Medical) | 教育 (Education) | 法律 (Law) |
|---|---|---|---|---|---|---|---|---|---|
| Evaluation.evaluate 2835 | 📝 1 | 1 | LLMinvoke:v96 | ichikara_human_eval:v1 | 0.2124 | 3.7333 | 3.55 | 3.9615 | 3.7368 |
| Evaluation.evaluate 7131 | 📝 1 | | LLMinvoke:v95 | ichikara_human_eval:v1 | 0.3216 | 3.7 | 3.4 | 3.7857 | 3.25 |
| Evaluation.evaluate 4309 | 📝 1 | | LLMinvoke:v94 | ichikara_human_eval:v1 | 0.1772 | 4.4 | 4.3 | 4.5357 | 4.05 |
| Evaluation.evaluate cc00 | | 1 | LLMinvoke:v87 | test_20240905:v53 | 0.1959 | 3.6897 | 3.4 | 3.7857 | 3.25 |
| Evaluation.evaluate ed0c | | 1 | LLMinvoke:v85 | test_20240905:v53 | 0.2246 | 4.4138 | 4.3 | 4.5357 | 4.05 |
| Evaluation.evaluate 8fc3 | | 1 | LLMinvoke:v84 | test_20240905:v53 | 0.2062 | 3.7241 | 3.55 | 3.9615 | 3.7368 |
| Evaluation.evaluate 87c0 | 🤖 1 | 2 | LLMinvoke:v74 | test_20240905:v24 | 19.6149 | 4.6 | 4.2 | 4.6786 | 4.2 |
| Evaluation.evaluate 7ec9 | 🤖 1 | 1 | LLMinvoke:v72 | test_20240905:v24 | 6.6422 | 4.7333 | 4.6 | 4.9643 | 4.8 |
| Evaluation.evaluate d8aa | 🤖 1 | 1 | LLMinvoke:v71 | test_20240905:v24 | 20.7779 | 4.1667 | 4.45 | 4.1429 | 3.95 |