Igoro's group workspace

final_image_test_0_3dck60v7

Tags

new-9-0

Notes
Author
State
Finished
Start time
November 3rd, 2021 11:15:54 PM
Runtime
19s
Tracked hours
-
Run path
eleutherai/gpt-thicc/lqzq5but
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.10
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "new-9-0" 19b16838b8275bac12b95ae2b84f1087041c6282
Command
train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", 
\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 100000, \"optimizer_type\": \"Adam\", \"use_bnb_optimizer\": true, \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1440000000, \"zero_allgather_bucket_size\": 1440000000, \"lr\": 9.7e-05, \"data_path\": \"/mnt/ssd-1/data/pile_filtered_tokenized/pile_filtered_text_document\", \"data_impl\": \"mmap\", \"save_interval\": 1000, \"batch_size\": 4, \"train_iters\": 10, \"eval_iters\": 0, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": true, \"wandb_group\": \"final_image_test_0_3dck60v7\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
System Hardware
CPU count
128
GPU count
8
GPU type
NVIDIA A100-SXM4-40GB
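
The batch-size and parallelism fields in the command above can be cross-checked against each other. DeepSpeed expects train_batch_size to equal train_micro_batch_size_per_gpu times gradient_accumulation_steps times the number of data-parallel replicas. The sketch below is a minimal check using values copied from the command. It assumes that the data-parallel size is global_num_gpus divided by (pipe_parallel_size times model_parallel_size). The function names are hypothetical and are not part of gpt-neox or DeepSpeed.

```python
def data_parallel_size(global_num_gpus, pipe_parallel_size, model_parallel_size):
    """Number of model replicas. Assumes each replica occupies
    pipe_parallel_size * model_parallel_size GPUs (an assumption of this sketch)."""
    gpus_per_replica = pipe_parallel_size * model_parallel_size
    if global_num_gpus % gpus_per_replica != 0:
        raise ValueError("GPU count is not divisible by the GPUs needed per replica")
    return global_num_gpus // gpus_per_replica


def expected_train_batch_size(micro_batch_per_gpu, grad_accum_steps, dp_size):
    """Global batch size implied by the per-GPU micro batch,
    the accumulation steps, and the data-parallel size."""
    return micro_batch_per_gpu * grad_accum_steps * dp_size


# Values copied from the logged command.
dp = data_parallel_size(global_num_gpus=96, pipe_parallel_size=4, model_parallel_size=2)
batch = expected_train_batch_size(micro_batch_per_gpu=4, grad_accum_steps=32, dp_size=dp)

print(dp)     # 12 data-parallel replicas
print(batch)  # 1536, which matches train_batch_size in the command
```

Under these assumptions the logged configuration is self-consistent: 96 GPUs give 12 replicas of 8 GPUs each, and 4 x 32 x 12 = 1536.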
W&B CLI Version
0.12.6
Config

Config parameters are your model's inputs.

No config parameters were saved for this run.

Summary

Summary metrics are your model's outputs.

No summary metrics were saved for this run.