sample temperature tldr
Costa
Created on September 26 | Last edited on September 27
Run comparison (diff only)

Run                                                 meta: runtime    config: task / temperature
train_policy_accelerate_summarize__1__1695843386    1h 19m 31s       1
train_policy_accelerate_summarize__1__1695739148    23h 57m 19s      1
train_policy_accelerate_summarize__1__1695737228    23h 58m 19s      0.7
[Plot: ppo/episode vs. global_step]
[Plot: objective/kl vs. global_step]
[Plot: objective/score_total vs. global_step]
[Plot: objective/scores vs. global_step]
[Plot: objective/kl_coef vs. global_step]