Kastan's group workspace
Aug-05__19:41
Tags
Aug-05__19:41
BATCH_SIZE=1280
NUM_EPOCHS=60
NUM_MICRO_BATCHES=8
SLURM=513928
TP=16
WORLD_SIZE=32
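The tags above fix WORLD_SIZE=32 and TP=16, which leaves the remaining factor of the world size to be split between pipeline and data parallelism. The following is a minimal sketch of that arithmetic, not code from the training script. The function name `data_parallel_size` is hypothetical, and the pipeline-parallel degree is an assumption passed in as a parameter because the tags do not state it.

```python
def data_parallel_size(world_size: int, tensor_parallel: int,
                       pipeline_parallel: int = 1) -> int:
    """Return the data-parallel degree left over after model parallelism.

    Hypothetical helper: assumes world_size = TP * PP * DP, the usual
    factorization for 3D-parallel training. The pipeline degree is an
    assumption; the run tags do not record it.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"TP*PP={model_parallel}"
        )
    return world_size // model_parallel


# Values taken from the run tags.
WORLD_SIZE = 32
TP = 16

# Data-parallel degree if no pipeline parallelism is used (assumed PP=1).
dp_without_pipeline = data_parallel_size(WORLD_SIZE, TP)

# Data-parallel degree if two pipeline stages are used (assumed PP=2).
dp_with_two_stages = data_parallel_size(WORLD_SIZE, TP, pipeline_parallel=2)
```

Under either assumption the product of the three degrees equals the world size; a pipeline degree that does not divide 32 / 16 raises an error instead of silently truncating.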
State
Crashed
Start time
August 6th, 2022 12:43:08 AM
Runtime
27m 52s
Tracked hours
-
Run path
kastan/LLM-Distributed-Quantization/xahx7ef3
OS
Linux-4.18.0-305.49.1.el8_4.x86_64-x86_64-with-glibc2.28
Python version
3.9.12
Command
/u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/v2_train.py --config /u/kastanday/LLM-Distributed-Quantization/benchmarks/gpt/configs/gpt2_8b_2p5d_256.py --host gpua010 --port 29500 --world_size 32 --rank 25
System Hardware
| CPU count | 64 |
| GPU count | 4 |
| GPU type | NVIDIA A100-SXM4-40GB |
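The command was launched with --world_size 32 and --rank 25 on a machine reporting 4 GPUs. As a rough sketch, the node and local GPU index for a rank can be derived as below. This assumes every node has the same 4 GPUs and that ranks are assigned contiguously node by node, neither of which is confirmed by this page; `locate_rank` is a hypothetical helper name.

```python
def locate_rank(rank: int, world_size: int, gpus_per_node: int) -> tuple[int, int]:
    """Map a global rank to (node_index, local_rank).

    Assumption: ranks are laid out contiguously, so ranks 0..3 sit on
    node 0, ranks 4..7 on node 1, and so on. A different launcher
    layout would give a different mapping.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside [0, {world_size})")
    return divmod(rank, gpus_per_node)


# Values from the command line and the hardware table.
WORLD_SIZE = 32
GPUS_PER_NODE = 4
RANK = 25

num_nodes = WORLD_SIZE // GPUS_PER_NODE
node_index, local_rank = locate_rank(RANK, WORLD_SIZE, GPUS_PER_NODE)
print(f"{num_nodes} nodes; rank {RANK} is local rank {local_rank} on node {node_index}")
```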
W&B CLI Version
0.13.0
Group
Aug-05__19:41
Config
Config parameters are your model's inputs.
- {} 25 keys▶
- 1,280
- 1
- "col_ai_quant"
- "/u/kastanday/LLM-Distributed-Quantization/datasets/small-gpt-dataset.json"
- {} 1 key▶
- "AMP_TYPE.NAIVE"
- "titans.model.gpt.gpt.gpt2_8B"
- "titans.model.gpt.gpt.gpt2_xl"
- 1
- 0.00015
- "./gpt2_2.5d_tp16_bs1280_lr0.00015_accum1_clip_grad1.0/"
- {} 1 key▶
- "titans.loss.lm_loss.gpt_lmloss.GPTLMLoss"
- {} 5 keys▶
- 60
- "4"
- 8
- {} 2 keys▶
- 0.00015
- 0.01
- {} 2 keys▶
- 2
- {} 3 keys▶
- 1
- "2.5d"
- 16
- 1,024
- "2.5d"
- 16
- 1,280
- "32"
- 50,304
- 21
- 0.01
Summary
Summary metrics are your model's outputs.
No summary metrics saved for this run.
Check the summary metrics documentation for more information.
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.