summarize

Created on September 23 | Last edited on September 24

meta
3h 26m 59s
23h 47m 16s
config
ppo
0.00001
0.000015
rewards
0.15
0.05
summary
objective
64.5
0.16016
5.1875
22.875
0.079849
0.05
-0.44141
-1.14062
-0.012695
-2.14062
0.42773
-1
ppo
loss
-0.10303
0.18457
-0.0079427
-0.000057176
0.13086
0.18555
2.34375
0.011658
1.90524
0.010925
policy
0.0045166
6.8394e-9
0.0023585
0.0000044663
0.033854
0
0.013234
0.00008138
1.32813
0.0052185
1.34928
0.0051395
returns
1.94531
-0.89063
[Plot: y-axis from -1 to 0.4; x-axis Step, 500 to 1.5k]
[Plot: y-axis from 0 to 60; x-axis Step, 500 to 1.5k]
[Plot: y-axis from 0.06 to 0.14; x-axis Step, 500 to 1.5k]
Run sets: Run set (1), Run set 2 (1), reward model (1)