Kastan's group workspace
Aug-05__12:37
Group tags
q_allBF16_gpt_8B_PP2_TP8_3d
Run tags
Aug-05__12:37
BATCH_SIZE=16
MICRO_BATCH_SIZE=4
NUM_EPOCHS=3
NUM_MICRO_BATCHES=16
PP=2
SLURM=513717
TP=8
WORLD_SIZE=64
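A quick sanity check on the parallel layout these tags describe (my arithmetic from the tags, not a value logged by the run): 8-way tensor parallelism times 2 pipeline stages leaves a data-parallel degree of 64 / (8 * 2) = 4.

```python
# Sanity check on the parallel layout implied by the tags above; this is
# inferred arithmetic, not data logged by the run itself.
WORLD_SIZE = 64   # total ranks (WORLD_SIZE tag; --world_size 64 in Command)
TP = 8            # tensor-parallel size; ColossalAI's "3d" mode needs a cube, 8 = 2**3
PP = 2            # pipeline-parallel stages

DP = WORLD_SIZE // (TP * PP)        # the remaining dimension is data parallelism
assert TP * PP * DP == WORLD_SIZE   # 8 * 2 * 4 == 64
print(f"implied data-parallel degree: {DP}")  # -> 4
```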
Author
kastan
State
Crashed
Start time
August 5th, 2022 5:38:42 PM
Runtime
12s
Tracked hours
10s
Run path
kastan/LLM-Distributed-Quantization/uc90j7ew
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/q_allBF16_gpt_8B_PP2_TP8_3d.py --host gpub036 --port 29500 --world_size 64 --rank 36
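The logged command is the per-rank invocation for rank 36 of 64, pointing at rank 0's host gpub036 and rendezvous port 29500. A minimal sketch of how such commands could be templated per rank (hypothetical; the actual SLURM launch script for job 513717 is not part of this export):

```python
# Hypothetical per-rank launcher matching the argument pattern of the logged
# Command; the real launch mechanism for SLURM job 513717 is not shown here.
import shlex

SCRIPT = "/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py"
CONFIG = ("/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/"
          "q_allBF16_gpt_8B_PP2_TP8_3d.py")
HOST, PORT, WORLD_SIZE = "gpub036", 29500, 64  # rank-0 host, port, total ranks

def launch_cmd(rank: int) -> str:
    """Build the training command for one of the 64 ranks."""
    return shlex.join([
        SCRIPT,
        "--config", CONFIG,
        "--host", HOST,
        "--port", str(PORT),
        "--world_size", str(WORLD_SIZE),
        "--rank", str(rank),
    ])

print(launch_cmd(36))  # reproduces the Command field above
```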
System Hardware
| CPU count | 64 |
| GPU count | 4 |
| GPU type | NVIDIA A40 |
W&B CLI Version
0.13.0
Group
Aug-05__12:37

Config
29 top-level keys; the exported page collapsed the key names, so only values survive. Recoverable values, grouped by inference and cross-referenced against the run tags above:

- Batch settings: batch size 16, micro-batch size 4, 16 micro-batches, 3 epochs (matching the BATCH_SIZE, MICRO_BATCH_SIZE, NUM_MICRO_BATCHES, and NUM_EPOCHS tags)
- Optimizer: learning rate 0.00015, weight decay 0.01
- Mixed precision: "AMP_TYPE.NAIVE", plus a 4-key block with all fields set to "torch.float16"
- Parallelism: pipeline size 2; tensor parallelism mode "3d", size 8; world size 64 (matching the PP, TP, and WORLD_SIZE tags)
- Model: "titans.model.quant_gpt.quant_gpt.quant_gpt2_8B" (a "titans.model.quant_gpt.quant_gpt.quant_gpt2_xl" entry also appears)
- Loss: "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
- Pipeline schedule: "colossalai.engine.schedule._pipeline_schedule.PipelineSchedule"
- Dataset: "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
- Output directory: "./quant_gpt2_3d_tp8_bs16_lr0.00015/"
- Model dimensions (likely): sequence length 1,024; hidden size 3,072; vocabulary size 50,304
- A few remaining values (1, "4", "64", 0.01, a boolean true, and a 3-item list) cannot be attributed to keys from the export.