byyoung3 / aime_evaluation / Assets
All assets
| Asset | Category | User | Last updated | Versions |
| --- | --- | --- | --- | --- |
| correctness:v0 | Scorer | Brett Young | 6 months ago | 1 |
| Qwen3_14B_OpenRouter_Model:v9 | Model | Brett Young | 6 months ago | 10 |
| AIME_2024_HF-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |
| qwen3_14b_openrouter:v0 | Model | Brett Young | 6 months ago | 1 |
| AIME_2024_HF:v0 | Dataset | Brett Young | 6 months ago | 1 |
| gpt4o_correctness:v0 | Scorer | Brett Young | 6 months ago | 1 |
| gpt4o_scorer_correctness:v0 | Scorer | Brett Young | 6 months ago | 1 |
| Dataset-v10-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |
| Dataset-v10:v0 | Dataset | Brett Young | 6 months ago | 1 |
| AIME_2024_v10-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |
| AIME_2024_v10:v0 | Dataset | Brett Young | 6 months ago | 1 |
| AIME_2024-evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |
| AIME_2024:v0 | Dataset | Brett Young | 6 months ago | 1 |
| qwen3-14b-openrouter-Evaluation:v1 | Evaluation | Brett Young | 6 months ago | 2 |
| Dataset:v10 | Dataset | Brett Young | 6 months ago | 11 |
| r1-free-Evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |
| R1FreeModel:v0 | Model | Brett Young | 6 months ago | 1 |
| r1-distill-qwen-Evaluation:v0 | Evaluation | Brett Young | 6 months ago | 1 |