Training 6B GPT-J 8bit for Dialogue
Training GPT-J, 8-bit quantized, with LoRA for chatbot-style dialogue applications from a "simple text script," in the style of ai-msgbot.
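For context, here is a minimal sketch of one way to set this up with current Hugging Face tooling (transformers + peft + bitsandbytes). The original ai-msgbot-style runs used a custom 8-bit GPT-J port, so the model name and LoRA hyperparameters below are assumptions, not the exact configuration behind these runs:

```python
# Minimal sketch (not the report's exact training code): load GPT-J in 8-bit
# and attach LoRA adapters so only the small adapter weights are trained.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/gpt-j-6b"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # bitsandbytes 8-bit quantization
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                 # assumed rank; tune for your hardware
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```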
Links & References
- Each checkpoint was trained on a different dataset; the runs are compared below on loss alone to illustrate general convergence during tuning.
- Each checkpoint was initialized fresh from the vanilla GPT-J model.
- The datasets are:
Training Loss & LR
[Chart panel: training loss and learning-rate curves for the 4 runs in the run set]
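For reference, a hypothetical sketch of how a panel like this can be populated: transformers' Trainer reports loss and the current learning rate to W&B when `report_to="wandb"` is set. The hyperparameters and `train_dataset` are assumptions, and `model` is the LoRA-wrapped model from the sketch above:

```python
# Hypothetical sketch: Trainer logs loss and LR to W&B automatically.
# `model` comes from the previous snippet; `train_dataset` is an assumed
# pre-tokenized dataset with input_ids/labels columns.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="gptj-8bit-dialogue",    # assumed run name
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=10,                   # loss/LR logged to W&B every 10 steps
    report_to="wandb",
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```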