Igoro's group workspace

Tags
-
Notes
-
Author
-
State
Finished
Start time
January 28th, 2022 6:55:52 PM
Runtime
2d 20h 58m 44s
Tracked hours
-
Run path
eleutherai/gpt-thicc/2kpyvxx2
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.10
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "new-9-0" 49e60fe7ad14f6991a7fa678d3a0c330d09b9ff4
Command
train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 1e-06, \"betas\": [0.9, 0.95], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"initial_scale_power\": 12, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 1260000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 1260000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"steps_per_print\": 2}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 1e-06, \"betas\": [0.9, 0.95], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"initial_scale_power\": 12, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 1260000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 1260000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"steps_per_print\": 2, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 10000, \"min_lr\": 1e-06, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1260000000, \"zero_allgather_bucket_size\": 1260000000, \"lr\": 1e-06, \"tokenizer_type\": \"HFTokenizer\", \"train_data_paths\": [\"/mnt/ssd-1/P3_combined/train_text_document\"], \"test_data_paths\": [\"/mnt/ssd-1/P3_combined/test_text_document\"], \"valid_data_paths\": [\"/mnt/ssd-1/P3_combined/validation_text_document\"], \"train_data_weights\": [1.0], \"valid_data_weights\": [1.0], \"test_data_weights\": [1.0], \"data_impl\": \"mmap\", \"save\": \"/mnt/ssd-1/20B_finetune\", \"load\": \"/mnt/ssd-1/20B_checkpoints\", \"save_interval\": 250, \"finetune\": true, \"batch_size\": 4, \"train_iters\": 10000, \"eval_iters\": 10, \"eval_interval\": 125, \"vocab_file\": \"/mnt/ssd-1/data/20B_tokenizer.json\", \"attention_dropout\": 0.0, \"hidden_dropout\": 0.0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, 
\"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": true, \"wandb_group\": \"20B_finetune_3oz74lbx\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
System Hardware
| CPU count | 128 |
| GPU count | 8 |
| GPU type | NVIDIA A100-SXM4-40GB |

These figures likely describe a single node; the megatron_config above sets global_num_gpus to 96, i.e. twelve such 8×A100 nodes driving the run.
W&B CLI Version
0.10.28
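The run path above is enough to pull this run's recorded metadata programmatically. A minimal sketch using the wandb public API, assuming a W&B API key with read access to the eleutherai entity:

```python
# Minimal sketch: fetch this run through the W&B public API.
# Assumes the wandb package is installed and WANDB_API_KEY is set.
import wandb

api = wandb.Api()
run = api.run("eleutherai/gpt-thicc/2kpyvxx2")  # the "Run path" above

print(run.state)        # "finished"
print(len(run.config))  # 181 config parameters
print(run.summary)      # empty: no summary metrics were saved for this run
```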
Group
Re

Config
Config parameters are your model's inputs.

181 config parameters (collapsed in the page export); see the --deepspeed_config and --megatron_config JSON under Command above for the full training configuration.
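For orientation, the --megatron_config pins the model shape: 44 layers, hidden size 6144, 64 attention heads, a 2048-token context, rotary embeddings on 25% of each head's dimensions, and untied input/output embeddings. A back-of-the-envelope parameter count, with the vocabulary size assumed (it is not in this dump), lands near the 20B implied by the checkpoint paths:

```python
# Rough parameter estimate from the shape in --megatron_config.
# Vocabulary size is an assumption (~50k for the 20B tokenizer); biases,
# layer norms and rotary parameters are ignored.
num_layers, hidden, vocab = 44, 6144, 50_000

transformer = 12 * num_layers * hidden**2  # 4*h^2 attention + 8*h^2 MLP per layer
embeddings = 2 * vocab * hidden            # separate in/out embeddings (no_weight_tying: true)
total = transformer + embeddings

print(f"~{total / 1e9:.1f}B parameters")   # ~20.5B
```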
Summary
Summary metrics are your model's outputs.
No summary metrics saved for this run.