Evaluations
| Trace | inputs.model | inputs.self | model_latency (mean) | ran true_count | ran true_fraction | test_sample true_count | test_sample true_fraction |
|---|---|---|---|---|---|---|---|
| Together-FP8 (74e4) | solve_one:v4 | Evaluation:v10 | 83.8589 | 3 | 0.75 | 0 | 0 |
| Cerebras- 70b (3560) | solve_one:v3 | Evaluation:v10 | 31.1777 | 4 | 1 | 1 | 0.25 |
| Octo (a630) | solve_one:v3 | Evaluation:v9 | 76.8515 | 4 | 1 | 1 | 0.25 |
| Fireworks-70b (bf51) | solve_one:v3 | Evaluation:v10 | 83.0282 | 3 | 0.75 | 0 | 0 |
| Groq-70b (a992) | solve_one:v3 | Evaluation:v10 | 132.7512 | 2 | 1 | 1 | 0.5 |
| GPT4o-mini (c11a) | solve_one:v3 | Evaluation:v10 | 138.0163 | 3 | 0.75 | 1 | 0.25 |
| GPT4o (efe0) | solve_one:v3 | Evaluation:v10 | 136.2726 | 3 | 0.75 | 1 | 0.25 |