Kastan's group workspace
Aug-05__13:58
Tags
gpt2_PP2_TP4_2d
Notes
Tags
Aug-05__13:58
BATCH_SIZE=32
MICRO_BATCH_SIZE=4
NUM_EPOCHS=20
NUM_MICRO_BATCHES=8
PP=2
SLURM=513923
TP=4
WORLD_SIZE=16
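The tags above are related by simple arithmetic, which the following sketch checks. It assumes that the ranks not used by pipeline (PP) or tensor (TP) parallelism form data-parallel replicas, and that the "2d" tensor mode needs a square number of devices. The helper names are hypothetical and are not taken from v2_train.py.

```python
import math

# Values copied from the run's tags.
BATCH_SIZE = 32
MICRO_BATCH_SIZE = 4
NUM_MICRO_BATCHES = 8
PP = 2
TP = 4
WORLD_SIZE = 16


def data_parallel_size(world_size: int, pp: int, tp: int) -> int:
    """Ranks left over once pipeline and tensor parallelism are accounted for."""
    model_parallel = pp * tp
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be a multiple of pp * tp")
    return world_size // model_parallel


def is_square(n: int) -> bool:
    """True if n is a perfect square (assumed requirement for a 2D device grid)."""
    root = math.isqrt(n)
    return root * root == n


if __name__ == "__main__":
    print("data-parallel size:", data_parallel_size(WORLD_SIZE, PP, TP))
    print("batch is micro-batch x count:",
          MICRO_BATCH_SIZE * NUM_MICRO_BATCHES == BATCH_SIZE)
    print("TP fits a square grid:", is_square(TP))
```

Under these assumptions, the 16 ranks split into 2 data-parallel replicas of 8 ranks each, and each batch of 32 is fed through the pipeline as 8 micro-batches of 4.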
Author
State
Crashed
Start time
August 5th, 2022 6:59:45 PM
Runtime
28m 3s
Tracked hours
-
Run path
kastan/LLM-Distributed-Quantization/3nsgkzdv
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/gpt2_PP2_TP4_2d.py --host gpub007 --port 29500 --world_size 16 --rank 6
System Hardware
| CPU count | 64 |
| GPU count | 4 |
| GPU type | NVIDIA A40 |
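The command above was launched with --world_size 16 and --rank 6, on a node reporting 4 GPUs. A minimal sketch of how a global rank could map to a node and a local GPU, assuming every node has 4 GPUs and ranks are assigned contiguously per node (neither is confirmed by this page; the function name is hypothetical):

```python
def locate_rank(global_rank: int, gpus_per_node: int) -> tuple[int, int]:
    """Return (node_index, local_rank), assuming contiguous per-node rank assignment."""
    if global_rank < 0 or gpus_per_node <= 0:
        raise ValueError("global_rank must be >= 0 and gpus_per_node > 0")
    return divmod(global_rank, gpus_per_node)


WORLD_SIZE = 16     # from --world_size
GPUS_PER_NODE = 4   # from the System Hardware table
RANK = 6            # from --rank

if __name__ == "__main__":
    node_index, local_rank = locate_rank(RANK, GPUS_PER_NODE)
    print("nodes needed:", WORLD_SIZE // GPUS_PER_NODE)
    print("rank", RANK, "-> node", node_index, "local GPU", local_rank)
```

With these assumptions the job spans 4 nodes, and rank 6 would be the third GPU (local rank 2) on the second node (index 1).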
W&B CLI Version
0.13.0
Group
Aug-05__13:58
Config
Config parameters are your model's inputs.
- {} 26 keys▶
- 32
- 1
- "col_ai_quant"
- "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
- {} 1 key▶
- "AMP_TYPE.NAIVE"
- "titans.model.gpt.gpt.gpt2_medium"
- 1
- 0.00015
- {} 1 key▶
- "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
- 4
- {} 4 keys▶
- true
- "torch.float16"
- 1,024
- 50,304
- 20
- "4"
- 8
- {} 2 keys▶
- 0.00015
- 0.01
- {} 2 keys▶
- 2
- {} 2 keys▶
- "2d"
- 4
- 2
- {} 4 keys▶
- 8
- true
- [] 3 items▶
- 4
- 1,024
- 3,072
- "colossalai.engine.schedule._pipeline_schedule.PipelineSchedule"
- 1,024
- "2d"
- 4
- 32
- "16"
- 50,304
- 1
- 0.01
Summary
Summary metrics are your model's outputs.
No summary metrics saved for this run.
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.