GPT-J Training
Created on July 26 | Last edited on September 5
Notes:
First full training run:
- warm-up: 1.8k steps
- total steps: 37.3k
- lr: 2.5e-5
Resumed from 28000 steps
- lr: 1e-5
- warm-up: 100 steps
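Both runs above pair a short linear warm-up with a fixed peak lr. A minimal optax sketch of that schedule shape (holding the lr constant after warm-up and the choice of Adam are assumptions; the notes only give warm-up, total steps, and lr):

```python
import optax

# Sketch only, not the actual training code: linear warm-up to the peak lr,
# then constant. Constant-after-warm-up and Adam are assumptions.
def warmup_then_constant(peak_lr, warmup_steps):
    return optax.join_schedules(
        schedules=[
            optax.linear_schedule(0.0, peak_lr, transition_steps=warmup_steps),
            optax.constant_schedule(peak_lr),
        ],
        boundaries=[warmup_steps],
    )

first_run_lr = warmup_then_constant(peak_lr=2.5e-5, warmup_steps=1_800)
resumed_lr = warmup_then_constant(peak_lr=1e-5, warmup_steps=100)

optimizer = optax.adam(learning_rate=first_run_lr)
```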
prosecraft_ft
- Full train dataset, using create_finetune_recoreds.py script: includes shuffling (see the sketch after this run's settings)
- lr: 1e-5
- warm-up steps: 300
- Total steps: 43195 (crashed at 3k steps)
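The internals of create_finetune_recoreds.py aren't reproduced here; the sketch below only illustrates the pattern the notes refer to (tokenize, shuffle at the document level, pack into fixed-length TFRecord examples). The 2048-token sequence length, the "text" feature key, and the packing strategy are assumptions.

```python
import random
import tensorflow as tf
from transformers import GPT2TokenizerFast

SEQ_LEN = 2048  # assumed GPT-J context length

def write_records(documents, out_path, seed=42):
    # GPT-J uses the GPT-2 BPE vocabulary
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    random.Random(seed).shuffle(documents)  # shuffle at the document level

    with tf.io.TFRecordWriter(out_path) as writer:
        for doc in documents:
            ids = tokenizer(doc)["input_ids"] + [tokenizer.eos_token_id]
            # pack into fixed-length sequences; in this sketch, documents
            # shorter than SEQ_LEN and trailing partial chunks are dropped
            for i in range(0, len(ids) - SEQ_LEN + 1, SEQ_LEN):
                chunk = ids[i : i + SEQ_LEN]
                feature = tf.train.Feature(
                    int64_list=tf.train.Int64List(value=chunk))
                example = tf.train.Example(
                    features=tf.train.Features(feature={"text": feature}))
                writer.write(example.SerializeToString())
```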
prosecraft_ft_resumed
- Full train dataset, using create_finetune_recoreds.py script: includes shuffling
- lr: 1e-5
- warm-up steps: 300
- Total steps: 20k
prosecraft_resumed_ft2
- Full train dataset, using create_finetune_recoreds.py script: includes shuffling
- Previous fine-tuning steps: 3k + 20k = 23k
- lr: 1e-5, end 1e-8
- warm-up steps: 75
- Total steps: 43195
prosecraft_linear
- Full train dataset, using create_finetune_recoreds.py script: includes shuffling
- Fine-tuning from initial GPT-J checkpoint
- linear warm-up, linear anneal (schedule sketched after this list)
- lr: 2e-5, end 2e-8
- warm-up steps: 200 (0.46%)
- Total steps: 43195
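The linear warm-up / linear anneal shape used for prosecraft_linear can be written directly with optax; the only assumption here is that the anneal spans all remaining steps after warm-up ends.

```python
import optax

# Sketch of the prosecraft_linear schedule: 200 warm-up steps to 2e-5,
# then a linear anneal down to 2e-8 over the rest of the 43195 steps.
warmup_steps = 200
total_steps = 43_195
peak_lr, end_lr = 2e-5, 2e-8

prosecraft_linear_lr = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(peak_lr, end_lr,
                              transition_steps=total_steps - warmup_steps),
    ],
    boundaries=[warmup_steps],
)
```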
Training Loss
Run set: 7 runs
Samples Dataset
Samples training run1:
- warm-up: 100 steps
- total steps: 390
- lr: 5e-5
Samples training run2:
- warm-up: 390 steps
- total steps: 3900
- lr: 1e-5
Samples batch size 16
- warm-up: 390 steps
- total steps: 7800
- lr: 1e-5
Run set: 5 runs
Prompt Generations
Post-training Prompt Generations: prosecraft_ft_resumed generation testing (fine-tuned for 23k steps in total)
Run: prosecraft_ft_resumed_slim_20001
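A minimal sketch of how these prompt generations can be reproduced, assuming the slim fine-tuned weights have been converted to a Hugging Face-compatible checkpoint; the path, prompt, and sampling settings below are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical local path; assumes the slim checkpoint was converted to a
# Hugging Face checkpoint directory beforehand.
CHECKPOINT = "checkpoints/prosecraft_ft_resumed_slim_20001_hf"

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT, torch_dtype=torch.float16).to("cuda").eval()

prompt = "The rain had not let up for three days when"  # example prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```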
In-training Prompt Generations: prosecraft_resumed_ft2 (starting from the checkpoint already fine-tuned for 23k steps)
Run set: 1 run
In-training Prompt Generations: prosecraft_linear*
*starting from the initial GPT-J checkpoint, linear warm-up, linear anneal
Run set: 1 run