
GPT-J Training

Created on July 26 | Last edited on September 5

Notes:

First full training run:
  • warm-up: 1.8k steps
  • total steps: 37.3k
  • lr: 2.5e-5
Resumed from step 28,000 (see the warm-up sketch after this list):
  • lr: 1e-5
  • warm-up: 100 steps
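
A minimal sketch of the warm-up used here, assuming a linear ramp to the peak lr that then holds constant (no anneal is listed for this run); on resume the schedule was simply restarted with the lower peak and the much shorter 100-step warm-up. The function name and exact ramp shape are assumptions, not the actual training code.

```python
def warmup_then_constant(step, warmup_steps, peak_lr):
    """Linear warm-up to peak_lr, then hold it constant."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

# initial run: ramp over 1.8k steps up to 2.5e-5
lr_initial = warmup_then_constant(step=900, warmup_steps=1800, peak_lr=2.5e-5)  # 1.25e-05
# resumed run: fresh, shorter ramp of 100 steps to the lower peak of 1e-5
lr_resumed = warmup_then_constant(step=50, warmup_steps=100, peak_lr=1e-5)      # 5e-06
```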

prosecraft_ft
  • Full training dataset, built with the create_finetune_recoreds.py script (includes shuffling; see the sketch below)
  • lr: 1e-5
  • warm-up steps: 300
  • Total steps: 43195 (crashed at 3k steps)
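
As a rough sketch of what the record-creation step is doing, assuming it shuffles at the document level and then packs tokens into fixed-length sequences; this is not the actual create_finetune_recoreds.py script, and the function name, sequence length, and tokenizer interface are assumptions.

```python
import random

def build_finetune_records(documents, tokenizer, seq_len=2048, seed=0):
    """Shuffle documents, tokenize, and pack into fixed-length training records."""
    random.Random(seed).shuffle(documents)            # document-level shuffle
    token_stream = []
    for doc in documents:
        token_stream.extend(tokenizer.encode(doc))
        token_stream.append(tokenizer.eos_token_id)   # separate documents
    # slice the stream into seq_len-sized records, dropping the tail remainder
    return [token_stream[i:i + seq_len]
            for i in range(0, len(token_stream) - seq_len + 1, seq_len)]
```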

prosecraft_ft_resumed
  • Full training dataset, built with the create_finetune_recoreds.py script (includes shuffling)
  • lr: 1e-5
  • warm-up steps: 300
  • Total steps: 20k

prosecraft_resumed_ft2
  • Full training dataset, built with the create_finetune_recoreds.py script (includes shuffling)
  • Previous fine-tuning steps: 3k + 20k = 23k
  • lr: 1e-5 (end lr: 1e-8)
  • warm-up steps: 75
  • Total steps: 43195

prosecraft_linear
  • Full training dataset, built with the create_finetune_recoreds.py script (includes shuffling)
  • Fine-tuning from the initial GPT-J checkpoint
  • linear warm-up, linear anneal (sketched below)
  • lr: 2e-5 (end lr: 2e-8)
  • warm-up steps: 200 (0.46% of total)
  • Total steps: 43195
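
A minimal sketch of the schedule described for this run: a linear ramp to the peak lr over the warm-up window, then a linear anneal down to the end lr over the remaining steps. The function name and exact step accounting are illustrative, not the run's actual schedule code.

```python
def linear_warmup_linear_anneal(step, total_steps=43195, warmup_steps=200,
                                peak_lr=2e-5, end_lr=2e-8):
    """Learning rate at `step`: linear warm-up, then linear anneal to end_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr + progress * (end_lr - peak_lr)

print(linear_warmup_linear_anneal(200))    # 2e-05 at the end of warm-up
print(linear_warmup_linear_anneal(43195))  # 2e-08 at the final step
```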

Training Loss


[Chart: training loss, run set of 7 runs]



Samples Dataset

Samples training run1:
  • warm-up: 100 steps
  • total steps: 390
  • lr: 5e-5
Samples training run2:
  • warm-up: 390 steps
  • total steps: 3900
  • lr: 1e-5
Samples batch size 16
  • warm-up: 390 steps
  • total steps: 7800
  • lr: 1e-5


[Chart: run set of 5 runs]



Prompt Generations

Post-Training Prompt Generations: prosecraft_ft_resumed generation testing (fine-tuned for 23k steps in total)
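
For reference, a hedged sketch of how generations like these can be produced, assuming the fine-tuned weights have been converted to a Hugging Face-compatible checkpoint; the local path, prompt, and sampling settings are illustrative assumptions, not the report's actual generation setup.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "checkpoints/prosecraft_ft_resumed_slim_20001"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16).to("cuda")

prompt = "The rain had not let up for three days when"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    max_new_tokens=200,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```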


[Generations table: run prosecraft_ft_resumed_slim_20001, 5 entries]


In-training Prompt Generations: prosecraft_resumed_ft2 (starting from the 23k-step fine-tuned checkpoint)


[Generations: run set of 1 run]



In-training Prompt Generations: prosecraft_linear*

*starting from the initial GPT-J checkpoint, linear warm-up, linear anneal

[Generations: run set of 1 run]