
Igoro's group workspace

final_image_test_0_2konuiqr

What makes this group special?
Tags

new-2-0

Notes
Author
State
Finished
Start time
November 2nd, 2021 11:00:06 PM
Runtime
8m 42s
Tracked hours
-
Run path
eleutherai/gpt-thicc/3421905c
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.10
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "new-2-0" 19b16838b8275bac12b95ae2b84f1087041c6282
Command
train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 100000, \"optimizer_type\": \"Adam\", \"use_bnb_optimizer\": true, \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1440000000, \"zero_allgather_bucket_size\": 1440000000, \"lr\": 9.7e-05, \"data_path\": \"/mnt/ssd-1/data/pile_filtered_tokenized/pile_filtered_text_document\", \"data_impl\": \"mmap\", \"save_interval\": 1000, \"batch_size\": 4, \"train_iters\": 10, \"eval_iters\": 0, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": 
true, \"wandb_group\": \"final_image_test_0_2konuiqr\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
System Hardware
CPU count
128
GPU count
8
GPU type
NVIDIA A100-SXM4-40GB
W&B CLI Version
0.10.28
Config

Config parameters are your model's inputs.

[Config tree: 180 keys. The interactive viewer rendered only the values (e.g. "gelu", the Pile data path, the 44-item attention_config list, the ZeRO bucket sizes) without their key names, so the tree is not reproduced here; the full configuration is duplicated in the Command field above and can be fetched programmatically, as sketched below.]
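
Because the viewer collapsed the key names, the most reliable way to inspect the full config is through the W&B public API, using the Run path from the overview. A minimal sketch, assuming the wandb client is installed and the run is readable with your credentials:

import wandb

# Fetch the run named in the "Run path" field via the public API.
api = wandb.Api()
run = api.run("eleutherai/gpt-thicc/3421905c")

# run.config is a plain dict holding the 180 config keys shown (collapsed) above.
for key in sorted(run.config):
    print(key, "=", run.config[key])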
Summary

Summary metrics are your model's outputs.

No summary metrics saved for this run.

Check the summary metrics documentation for more information.
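
The same Run object exposes the summary; a quick check, again a sketch assuming the public API access above:

import wandb

# Summary metrics (the run's logged outputs) live on run.summary, which
# behaves like a dict; for this run it is expected to be empty, consistent
# with "No summary metrics saved for this run."
run = wandb.Api().run("eleutherai/gpt-thicc/3421905c")
print(list(run.summary.keys()))  # expected: []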