
Crysty Novel instructionFT Stage report

Trained on 1 A100 40GB
Created on August 14|Last edited on August 14

Training conclusion



A larger dataset (30MB to 60MB) makes the model more robust at writing the novel part: in EPOCH 10 we got a far better result. Story consistency is still not ideal. This might be caused by the RWKV_World model having a ctx-len of only 4096. Next, we will try to train a new LoRA on a CTX_LEN=128K model to figure out the influence of CTX_LEN.

[Three charts plotted against Step (0 to 2.5k). Panel titles were not preserved. Y-axis ranges: 1.4 to 2.6; 0.000058 to 0.000068; 0 to 0.02.]