Project: wandb-japan / fc-agent-dev
Evaluations
Column groups shown on the page: inputs, output, success, tool_use.

| Trace | ID | model | self | overall_score | success_score | true_count | true_fraction | tool_use_score |
|---|---|---|---|---|---|---|---|---|
| Test2_Tool_Use_v4 | 3c94 | wandb_fc_agent:v0 | Evaluation:v2 | 0.7833 | N/A | 47 | 0.7833 | 0.9 |
| Test2_Tool_Use_v3 | 9a6b | wandb_fc_agent:v0 | Evaluation:v2 | 0.5 | N/A | 30 | 0.5 | 0.8667 |
| Test2_Tool_Use_v0 | b318 | wandb_fc_agent:v0 | Evaluation:v1 | 0.2833 | N/A | 17 | 0.2833 | 0.3333 |
| Test1_ReportValidation | 9f8e | WandBReportTranslator:v0 | evaluation-report-list-v0-evaluation:v0 | N/A | 0.24 | 12 | 0.24 | N/A |
| Test2_Tool_Use_v1 | 23f9 | wandb_fc_agent:v0 | Evaluation:v1 | 0.6333 | N/A | 38 | 0.6333 | 0.8667 |
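
The true_count and true_fraction values listed above look as if they are related by true_fraction = true_count / N, where N is the number of scored examples in each evaluation. N is not shown on this page, so the following Python sketch only infers it under that assumption; the function name implied_total is hypothetical and not part of any W&B API.

```python
# (evaluation name, true_count, true_fraction), copied from the values listed above.
rows = [
    ("Test2_Tool_Use_v4", 47, 0.7833),
    ("Test2_Tool_Use_v3", 30, 0.5),
    ("Test2_Tool_Use_v0", 17, 0.2833),
    ("Test1_ReportValidation", 12, 0.24),
    ("Test2_Tool_Use_v1", 38, 0.6333),
]


def implied_total(true_count: int, true_fraction: float) -> int:
    """Infer the number of scored examples.

    Assumption (not stated on the page): true_fraction = true_count / total,
    so total = true_count / true_fraction, rounded because the displayed
    fraction is truncated to four decimal places.
    """
    return round(true_count / true_fraction)


# Map each evaluation to its inferred example count.
totals = {name: implied_total(count, fraction) for name, count, fraction in rows}

for name, total in totals.items():
    print(f"{name}: about {total} scored examples")
```

If the assumption holds, runs whose inferred totals match were scored on datasets of the same size, which is worth checking before comparing their true_fraction values directly.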