Group: final_image_test_0_3dck60v7

Author

sdtblck

State

Finished

Start time

November 3rd, 2021 11:15:54 PM

Runtime

19s

Run path

eleutherai/gpt-thicc/lqzq5but

OS

Linux-5.11.0-34-generic-x86_64-with-glibc2.29

Python version

3.8.10

Git repository

git clone https://github.com/EleutherAI/gpt-neox.git

Git state

git checkout -b "new-9-0" 19b16838b8275bac12b95ae2b84f1087041c6282

Command

train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", 
\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 100000, \"optimizer_type\": \"Adam\", \"use_bnb_optimizer\": true, \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1440000000, \"zero_allgather_bucket_size\": 1440000000, \"lr\": 9.7e-05, \"data_path\": \"/mnt/ssd-1/data/pile_filtered_tokenized/pile_filtered_text_document\", \"data_impl\": \"mmap\", \"save_interval\": 1000, \"batch_size\": 4, \"train_iters\": 10, \"eval_iters\": 0, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": true, \"wandb_group\": \"final_image_test_0_3dck60v7\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
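The shape fields in `--megatron_config` (`num_layers` 44, `hidden_size` 6144, `num_attention_heads` 64, `rotary_pct` 0.25, `no_weight_tying` true) can be cross-checked with a little arithmetic. The sketch below is a rough estimate and not GPT-NeoX's own parameter count. It assumes a 4h-wide MLP, ignores biases and layer norms, and assumes the unpadded 50,257-token GPT-2 vocabulary. GPT-NeoX may pad the vocabulary, so the real embedding size can differ slightly. `transformer_shape_summary` is a hypothetical helper.

```python
def transformer_shape_summary(num_layers, hidden_size, num_heads, rotary_pct,
                              vocab_size=50257, untied_embeddings=True):
    """Rough shape arithmetic for a GPT-style decoder (illustrative only)."""
    if hidden_size % num_heads:
        raise ValueError("hidden_size must be divisible by num_heads")
    head_dim = hidden_size // num_heads
    # rotary_pct: fraction of each head's dimensions that get rotary embeddings.
    rotary_dims = int(head_dim * rotary_pct)
    # ~4h^2 per layer for attention (QKV + output projection) plus ~8h^2 for a
    # 4h-wide MLP; biases and layer norms are ignored (assumption).
    non_embedding = 12 * num_layers * hidden_size ** 2
    # Untied input and output embeddings => two vocab x hidden matrices.
    embedding = vocab_size * hidden_size * (2 if untied_embeddings else 1)
    return {
        "head_dim": head_dim,
        "rotary_dims": rotary_dims,
        "non_embedding_params": non_embedding,
        "total_params": non_embedding + embedding,
    }

summary = transformer_shape_summary(num_layers=44, hidden_size=6144,
                                    num_heads=64, rotary_pct=0.25)
for key, value in summary.items():
    print(f"{key}: {value:,}")
# head_dim: 96
# rotary_dims: 24
# non_embedding_params: 19,931,332,608
# total_params: 20,548,890,624
```

Under these assumptions the configuration describes a model of roughly 20B parameters, with rotary embeddings applied to 24 of each head's 96 dimensions.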

System Hardware

CPU count	128
GPU count	8
GPU type	NVIDIA A100-SXM4-40GB
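This machine reports 8 GPUs, while the config sets `global_num_gpus` to 96. The batch-size fields are tied together by the parallel layout: DeepSpeed requires `train_batch_size` = micro batch per GPU × gradient accumulation steps × data-parallel size. The sketch below assumes the data-parallel size is `global_num_gpus / (pipe_parallel_size × model_parallel_size)`, and that every machine has 8 GPUs like this one when it derives a node count. `parallel_layout` is a hypothetical helper.

```python
def parallel_layout(global_num_gpus, pipe_parallel_size, model_parallel_size,
                    micro_batch_per_gpu, grad_accum_steps, seq_length,
                    gpus_per_node):
    """Derive data-parallel size and batch sizes from a 3D-parallel config."""
    gpus_per_replica = pipe_parallel_size * model_parallel_size
    if global_num_gpus % gpus_per_replica:
        raise ValueError("global GPU count must be divisible by pipe * model size")
    data_parallel = global_num_gpus // gpus_per_replica
    # DeepSpeed's batch identity: micro batch x accumulation steps x DP size.
    train_batch_size = micro_batch_per_gpu * grad_accum_steps * data_parallel
    return {
        "gpus_per_model_replica": gpus_per_replica,
        "data_parallel_size": data_parallel,
        # Assumes a homogeneous cluster with gpus_per_node GPUs per machine.
        "nodes": global_num_gpus // gpus_per_node,
        "train_batch_size": train_batch_size,
        "tokens_per_step": train_batch_size * seq_length,
    }

layout = parallel_layout(global_num_gpus=96, pipe_parallel_size=4,
                         model_parallel_size=2, micro_batch_per_gpu=4,
                         grad_accum_steps=32, seq_length=2048, gpus_per_node=8)
for key, value in layout.items():
    print(f"{key}: {value:,}")
```

With 4-way pipeline and 2-way tensor parallelism, one model replica fits exactly on one 8-GPU machine, giving 12 data-parallel replicas and 4 × 32 × 12 = 1536 sequences (3,145,728 tokens) per optimizer step, which matches the logged `train_batch_size`.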

W&B CLI Version

0.12.6

Config parameters

No config parameters were saved for this run.

Summary metrics

No summary metrics were saved for this run.