
Reward Ablation

Created on October 6 | Last edited on October 7
Insights:
  1. Freezing 70 percent of the layers seems to improve performance.
  2. Sigmoid loss and cross-entropy loss make no difference (probably because of something in the formula itself).
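
A possible reading of both points, as a minimal sketch only. The report does not say which layers were frozen or how the two losses were defined, so everything here is an assumption: `layers_to_freeze`, `sigmoid_loss` and `cross_entropy_loss` are hypothetical names, the frozen layers are assumed to be the lowest ones, and the reward loss is assumed to be pairwise (one scalar reward for a chosen response, one for a rejected response). Under that pairwise assumption the two losses are algebraically the same quantity, which would be one explanation for seeing no difference.

```python
import math


def layers_to_freeze(num_layers: int, fraction: float = 0.7) -> range:
    """Indices of layers to freeze (assumption: the lowest layers are frozen)."""
    return range(int(num_layers * fraction))


def sigmoid_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    diff = r_chosen - r_rejected
    # -log(sigmoid(d)) = softplus(-d) = max(-d, 0) + log(1 + exp(-|d|))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def cross_entropy_loss(r_chosen: float, r_rejected: float) -> float:
    """Two-class softmax cross entropy over [r_chosen, r_rejected], target = chosen."""
    m = max(r_chosen, r_rejected)  # subtract the max for numerical stability
    log_z = m + math.log(math.exp(r_chosen - m) + math.exp(r_rejected - m))
    return log_z - r_chosen


# Hypothetical reward pairs, for illustration only (not data from the report).
pairs = [(1.2, -0.3), (0.0, 0.0), (-2.0, 1.5)]
max_gap = max(abs(sigmoid_loss(c, r) - cross_entropy_loss(c, r)) for c, r in pairs)
frozen = list(layers_to_freeze(10))
```

In this sketch `max_gap` is at floating-point noise level, because exp(a) / (exp(a) + exp(b)) equals sigmoid(a - b). If the runs in this report defined the losses differently (for example, cross entropy against absolute labels rather than pairs), this identity would not apply.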


[Figure: three line-chart panels, each plotted against Step (0 to 2k); panel titles were not preserved. Run groups, 3 runs each: no-deepspeed, freeze70percent, deepspeed, sigmoid freeze70percent.]