Igoro's group workspace
final_image_test_0_3dck60v7
Tags
new-9-0
State
Finished
Start time
November 3rd, 2021 11:15:54 PM
Runtime
19s
Tracked hours
-
Run path
eleutherai/gpt-thicc/lqzq5but
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.10
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "new-9-0" 19b16838b8275bac12b95ae2b84f1087041c6282
Command
train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", 
\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 100000, \"optimizer_type\": \"Adam\", \"use_bnb_optimizer\": true, \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1440000000, \"zero_allgather_bucket_size\": 1440000000, \"lr\": 9.7e-05, \"data_path\": \"/mnt/ssd-1/data/pile_filtered_tokenized/pile_filtered_text_document\", \"data_impl\": \"mmap\", \"save_interval\": 1000, \"batch_size\": 4, \"train_iters\": 10, \"eval_iters\": 0, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": true, \"wandb_group\": \"final_image_test_0_3dck60v7\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
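The batch-size and parallelism values in the command above can be cross-checked against each other. The sketch below copies six values from the `--megatron_config` JSON and assumes that the data-parallel degree equals `global_num_gpus` divided by `pipe_parallel_size * model_parallel_size`; that relationship is an assumption of this sketch, not something recorded in the run. The helper names `data_parallel_size` and `expected_train_batch_size` are hypothetical and are not part of gpt-neox or DeepSpeed.

```python
import json

# Values copied verbatim from the --megatron_config JSON in the Command field.
config = json.loads("""
{
  "train_batch_size": 1536,
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 32,
  "pipe_parallel_size": 4,
  "model_parallel_size": 2,
  "global_num_gpus": 96
}
""")


def data_parallel_size(cfg):
    """Number of model replicas (hypothetical helper).

    Assumes each replica occupies pipe_parallel_size * model_parallel_size GPUs.
    """
    gpus_per_replica = cfg["pipe_parallel_size"] * cfg["model_parallel_size"]
    if cfg["global_num_gpus"] % gpus_per_replica != 0:
        raise ValueError("global_num_gpus is not a multiple of the GPUs per replica")
    return cfg["global_num_gpus"] // gpus_per_replica


def expected_train_batch_size(cfg):
    """Global batch = micro batch per GPU * accumulation steps * replicas."""
    return (
        cfg["train_micro_batch_size_per_gpu"]
        * cfg["gradient_accumulation_steps"]
        * data_parallel_size(cfg)
    )


replicas = data_parallel_size(config)
batch = expected_train_batch_size(config)
print(f"data-parallel replicas: {replicas}")
print(f"expected train_batch_size: {batch} (configured: {config['train_batch_size']})")
```

Under that assumption the configured `train_batch_size` of 1536 matches the product of the micro batch size, the accumulation steps, and the replica count, so the logged values are mutually consistent.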
System Hardware
| CPU count | 128 |
| GPU count | 8 |
| GPU type | NVIDIA A100-SXM4-40GB |
W&B CLI Version
0.12.6
Config
No config parameters were saved for this run.
Summary
No summary metrics saved for this run.