pipeline correct reward index

Created on December 7 | Last edited on December 31

[Six line charts, x-axis: Step (ticks at 500, 1k, 1.5k, 2k). Panel titles and y-axis labels were not preserved.]
gpt2: 1
gpt2-xl: 1
original: 5
gpt2 fix value: 2
fix value with masked mean: 1
pipeline with white spaces: 1