cornell-npg / random-rewards-reasoning
Mingyuc's workspace
Runs (19, all visualized)
qwen2.5-7b-grpo-random (4 runs)
qwen2.5-7b-rloo-n-16-random (3 runs)
qwen2.5-7b-ppo-random (3 runs)
qwen2.5-7b-rebel-random (3 runs)
qwen2.5-7b-rebel-normal (3 runs)
qwen2.5-7b-rloo-n-16-normal (3 runs)
actor (8 panels, 1-6 shown)
Panel: actor/ppo_kl (x-axis: Step, ticks 500 to 1.5k; y-axis ticks -0.04 to 0.04). Showing first 10 runs: 4 × qwen2.5-7b-grpo-random, 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-ppo-random.
Panel: actor/pg_loss (x-axis: Step, ticks 500 to 1.5k; y-axis ticks -0.5 to 1). Showing first 10 runs: 4 × qwen2.5-7b-grpo-random, 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-ppo-random.
Panel: actor/pg_clipfrac (x-axis: Step, ticks 500 to 1.5k; y-axis ticks 0 to 0.06). Showing first 10 runs: 4 × qwen2.5-7b-grpo-random, 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-ppo-random.
Panel: actor/lr (x-axis: Step, ticks 500 to 1.5k; y-axis ticks 0 to 2e-7). Showing first 10 runs: 4 × qwen2.5-7b-grpo-random, 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-ppo-random.
Panel: actor/kl_loss (x-axis: Step, ticks 200 to 1.4k; y-axis ticks 1 to 5). Runs shown: 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-rloo-n-16-normal.
Panel: actor/kl_coef (x-axis: Step, ticks 200 to 1.4k; y-axis ticks -2 to 2). Runs shown: 3 × qwen2.5-7b-rloo-n-16-random, 3 × qwen2.5-7b-rloo-n-16-normal.
critic (33 panels, 1-6 shown)
global_seqlen (6 panels)
mfu (2 panels)
prompt_length (4 panels)
response_length (4 panels)
timing_per_token_ms (6 panels)
timing_s (8 panels, 1-6 shown)
val (168 panels, 1-6 shown)
System (21 panels, 1-6 shown)