byyoung3 / aime_evaluation / Evaluation-definitions
Evaluations

Evaluation | Category | User | Last updated | Versions
AIME_2024_HF-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
Dataset-v10-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
AIME_2024_v10-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
AIME_2024-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
qwen3-14b-openrouter-Evaluation:v1 | Evaluation | Brett Young | 6 months ago | 2 versions
r1-free-Evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
r1-distill-qwen-Evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 version
gemini-2.0-flash-Evaluation:v1 | Evaluation | Brett Young | 7 months ago | 2 versions
gemini-2.5-pro-exp-Evaluation:v2 | Evaluation | Brett Young | 7 months ago | 3 versions
Claude-3.7-AIME-Evaluation:v6 | Evaluation | Brett Young | 8 months ago | 7 versions
standard-Evaluation:v0 | Evaluation | Brett Young | 8 months ago | 1 version
thinking_8k-Evaluation:v0 | Evaluation | Brett Young | 8 months ago | 1 version
thinking_4k-Evaluation:v0 | Evaluation | Brett Young | 8 months ago | 1 version
thinking_16k-Evaluation:v0 | Evaluation | Brett Young | 8 months ago | 1 version
r1o3verifier-Evaluation:v3 | Evaluation | Brett Young | 8 months ago | 4 versions
base-gpt4o-Evaluation:v0 | Evaluation | Brett Young | 9 months ago | 1 version
budget-forcing-gpt4o-Evaluation:v0 | Evaluation | Brett Young | 9 months ago | 1 version
finetuned-gpt4o-Evaluation:v1 | Evaluation | Brett Young | 9 months ago | 2 versions