Preetham-gali's group workspace
hSY45YakxtAmpHucKxDCZz_19ewbxaz
Tags
shiv-0-0
Notes
Author
State
Finished
Start time
August 17th, 2021 7:56:55 AM
Runtime
3h 52m 58s
Tracked hours
3h 52m 40s
Run path
eleutherai/distilling/1tikf8xm
OS
Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Python version
3.8.5
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "shiv-0-0" ecc3c2327d2fd053cc67c2cc656e1d847db2f797
Command
pretrain_gpt2.py --local_rank=0 --num_gpus 4 --deepspeed_config "{\"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true}" --megatron_config "{\"num_gpus\": 4, \"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true, \"precision\": \"fp16\", \"num_layers\": 24, \"hidden_size\": 1024, \"num_attention_heads\": 8, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"lr_decay_style\": \"cosine\", 
\"lr_decay_iters\": 320000, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 500000000, \"zero_allgather_bucket_size\": 500000000, \"lr\": 0.0003, \"data_path\": \"/mnt/ssd-1/data/pile/pile_text_document\", \"data_impl\": \"mmap\", \"save\": \"checkpoints/medium-training/1-1-0\", \"save_interval\": 10000, \"finetune\": true, \"batch_size\": 5, \"train_iters\": 320000, \"eval_iters\": 10, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"partition_activations\": true, \"gas\": 27, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 1, \"is_pipe_parallel\": true, \"use_wandb\": true, \"wandb_group\": \"hSY45YakxtAmpHucKxDCZz_19ewbxaz\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"distilling\", \"log_dir\": \"logs\", \"log_interval\": 100, \"user_script\": \"pretrain_gpt2.py\"}"
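The batch-size fields in the command above are tied together by simple arithmetic. The sketch below copies the values by hand and checks that they agree. It assumes all 4 GPUs act as data-parallel replicas: the command shows pipe_parallel_size 1 and no model-parallel setting, so this is an assumption, not something the run page states. The variable names are illustrative only.

```python
# Values copied by hand from the command above.
num_gpus = 4
micro_batch_per_gpu = 5      # train_micro_batch_size_per_gpu
grad_accum_steps = 27        # gradient_accumulation_steps ("gas")
train_batch_size = 540       # train_batch_size
seq_length = 2048
hidden_size = 1024
num_attention_heads = 8

# Assumption: every GPU is a data-parallel replica (no model parallelism).
data_parallel_size = num_gpus

# Sequences consumed per optimizer step.
effective_batch = micro_batch_per_gpu * grad_accum_steps * data_parallel_size

# Tokens consumed per optimizer step at the configured sequence length.
tokens_per_step = train_batch_size * seq_length

# Width of each attention head.
head_dim = hidden_size // num_attention_heads

print(effective_batch)   # 540
print(tokens_per_step)   # 1105920
print(head_dim)          # 128
```

Under that assumption the product 5 x 27 x 4 reproduces the configured train_batch_size of 540, so the three fields are mutually consistent.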
System Hardware
CPU count | 128 |
GPU count | 4 |
GPU type | NVIDIA A100-SXM4-40GB |
W&B CLI Version
0.10.28
Config
Config parameters are your model's inputs.
- {} 186 keys
- "gelu"
- false
- 1,000
- 0
- 0
- 0
- null
- false
- false
- [] 24 items
- 0
- false
- 5
- false
- false
- true
- false
- 1
- false
- 1
- false
- "mmap"
- "/mnt/ssd-1/data/pile/pile_text_document"
- false
- null
- true
- true
- false
- false
- "nccl"
- false
- null
- null
- null
- false
- true
- false
- 1,000
- 10
- ""
- null
- null
- null
- true
- null
- {} 6 keys
- {} 8 keys
- 500,000,000
- true
- 1
Summary
Summary metrics are your model's outputs.
- {} 29 keys
- 109,629,937,622,472.44
- 13.749365980625152
- 39.27453824132242
- 0.5195140838623047
- 0.1118183135986328
- 404.874324798584
- 0.8051395416259766
- 102,997.76983261108
- 4.068851470947266
- 11.81316375732422
- 102,967.50450134276
- 102,975.14224052428
- 103,004.50539588928
- 280.1547050476074
- 359.0736389160156
- 29,366.528511047363
- 74.8878157428386
- 0.2610759490516577
- 21.351871790196995
- 0.2958599806630684
- 307.31940269470215
- 0.4107952117919922
- 406.91423416137695
- 0
- 0.0000928125
- 3.6650736331939697
- 32,768
- 3.665481567382813
- 39.07494885379973
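The metric names were not captured with the values above, so any pairing between them is an inference. One relationship can still be checked numerically: the last value in the list equals the exponential of the value just before it, which is how perplexity relates to a cross-entropy loss measured in nats. A minimal sketch, with variable names chosen here for illustration:

```python
import math

# Last two summary values, copied from the list above.
# Treating them as a loss and its perplexity is an assumption;
# the page capture does not show the metric names.
loss_nats = 3.665481567382813
reported_value = 39.07494885379973

# Perplexity is the exponential of the mean cross-entropy loss in nats.
perplexity = math.exp(loss_nats)

print(abs(perplexity - reported_value) < 1e-4)
```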
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.