Training 6B GPT-J 8bit for Dialogue
Training GPT-J, 8-bit quantized, with LoRA for chatbot-style dialogue applications from a "simple text script," in the style of ai-msgbot.
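For context, here is a minimal sketch of one way to set this up with current Hugging Face tooling (transformers + peft + bitsandbytes). The original ai-msgbot-style runs used a custom 8-bit GPT-J port, so the model name and LoRA hyperparameters below are assumptions, not the exact configuration behind these runs:

```python
# Minimal sketch (not the report's exact training code): load GPT-J in 8-bit
# and attach LoRA adapters so only the small adapter weights are trained.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/gpt-j-6b"  # assumed base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # bitsandbytes 8-bit quantization
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                 # assumed rank; tune for your hardware
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```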
Links & References
- Each checkpoint was trained on a different dataset; the runs are compared below on loss alone to illustrate general convergence during tuning.
- Each checkpoint was initialized fresh from the vanilla GPT-J model.
- The datasets are:
Training Loss & LR
[Chart panel: training loss and learning-rate curves for the 4 runs in the run set]
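For reference, a hypothetical sketch of how a panel like this can be populated: transformers' Trainer reports loss and the current learning rate to W&B when `report_to="wandb"` is set. The hyperparameters and `train_dataset` are assumptions, and `model` is the LoRA-wrapped model from the sketch above:

```python
# Hypothetical sketch: Trainer logs loss and LR to W&B automatically.
# `model` comes from the previous snippet; `train_dataset` is an assumed
# pre-tokenized dataset with input_ids/labels columns.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="gptj-8bit-dialogue",    # assumed run name
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=1,
    logging_steps=10,                   # loss/LR logged to W&B every 10 steps
    report_to="wandb",
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```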