Cleanba impala threads ASAP grad norm
Costa
Created on April 9 | Last edited on April 9
[Plot: charts/SPS_update vs. global_step (x ticks 5M to 25M; y ticks 5000 to 20000)]
[Plot: losses/entropy vs. global_step (x ticks 5M to 25M; y ticks -3000 to 0)]
[Plot: stats/param_norm vs. global_step (x ticks 5M to 25M; y ticks 32 to 38)]
[Plot: stats/param_updates_norm vs. global_step (x ticks 5M to 25M; y ticks 0 to 0.14)]
[Plot: losses/value_loss vs. global_step (x ticks 5M to 25M; y ticks 2e+18 to 1.4e+19)]
[Plot: losses/policy_loss vs. global_step (x ticks 5M to 25M; y ticks 0 to 30000)]
[Plot: charts/avg_episodic_return vs. global_step (x ticks 5M to 25M; y ticks 0 to 300)]
[Plot: charts/avg_episodic_return vs. Time (minutes) (x ticks 5 to 20; y ticks 0 to 300)]
Runs:
baseline (RTX 3060 TI)
actor threads (A100)
actor threads asap (A100) prime candidate
A100
cleanba impala 8 A100 baseline (a0_l1_d4)
actor threads asap no update A100
actor threads asap (A100) prime candidate iteration
actor threads asap (A100) update policy mid rollout
cleanba baseline