
Igoro's group workspace

final_image_test_0_2konuiqr

What makes this group special?
Tags

new-2-0

Notes
Author
State
Finished
Start time
November 2nd, 2021 11:00:06 PM
Runtime
8m 42s
Tracked hours
-
Run path
eleutherai/gpt-thicc/3421905c
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.10
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "new-2-0" 19b16838b8275bac12b95ae2b84f1087041c6282
Command
train.py --local_rank=0 --deepspeed_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true}" --megatron_config "{\"train_batch_size\": 1536, \"train_micro_batch_size_per_gpu\": 4, \"gradient_accumulation_steps\": 32, \"optimizer\": {\"params\": {\"betas\": [0.9, 0.95], \"eps\": 1e-08, \"lr\": 9.7e-05}, \"type\": \"Adam\"}, \"fp16\": {\"enabled\": true, \"fp16\": true, \"hysteresis\": 2, \"initial_scale_power\": 12, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"allgather_bucket_size\": 1440000000, \"allgather_partitions\": true, \"contiguous_gradients\": true, \"cpu_offload\": false, \"overlap_comm\": true, \"reduce_bucket_size\": 1440000000, \"reduce_scatter\": true, \"stage\": 1}, \"steps_per_print\": 2, \"wall_clock_breakdown\": true, \"zero_allow_untested_optimizer\": true, \"precision\": \"fp16\", \"num_layers\": 44, \"hidden_size\": 6144, \"num_attention_heads\": 64, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"scaled_upper_triang_masked_softmax_fusion\": true, \"bias_gelu_fusion\": true, \"rotary_pct\": 0.25, \"init_method\": \"small_init\", \"output_layer_init_method\": \"wang_init\", \"gpt_j_residual\": true, \"output_layer_parallelism\": \"column\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 100000, \"optimizer_type\": \"Adam\", \"use_bnb_optimizer\": true, \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 1440000000, \"zero_allgather_bucket_size\": 1440000000, \"lr\": 9.7e-05, \"data_path\": \"/mnt/ssd-1/data/pile_filtered_tokenized/pile_filtered_text_document\", \"data_impl\": \"mmap\", \"save_interval\": 1000, \"batch_size\": 4, \"train_iters\": 10, \"eval_iters\": 0, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"gas\": 32, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 4, \"model_parallel_size\": 2, \"is_pipe_parallel\": 
true, \"wandb_group\": \"final_image_test_0_2konuiqr\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"gpt-thicc\", \"log_dir\": \"/mnt/ssd-1/logs\", \"tensorboard_dir\": \"/mnt/ssd-1/tensorboard\", \"log_interval\": 2, \"user_script\": \"train.py\", \"global_num_gpus\": 96}"
System Hardware
CPU count
128
GPU count
8
GPU type
NVIDIA A100-SXM4-40GB
W&B CLI Version
0.10.28
Config

Config parameters are your model's inputs.

[Config tree: 180 keys. The interactive viewer rendered only the values (e.g. "gelu", the Pile data path, the 44-item attention_config list, the ZeRO bucket sizes) without their key names, so the tree is not reproduced here; the full configuration is duplicated in the Command field above and can be fetched programmatically, as sketched below.]
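
Because the viewer collapsed the key names, the most reliable way to inspect the full config is through the W&B public API, using the Run path from the overview. A minimal sketch, assuming the wandb client is installed and the run is readable with your credentials:

import wandb

# Fetch the run named in the "Run path" field via the public API.
api = wandb.Api()
run = api.run("eleutherai/gpt-thicc/3421905c")

# run.config is a plain dict holding the 180 config keys shown (collapsed) above.
for key in sorted(run.config):
    print(key, "=", run.config[key])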
Summary

Summary metrics are your model's outputs.

No summary metrics saved for this run.

Check the summary metrics documentation for more information.
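
The same Run object exposes the summary; a quick check, again a sketch assuming the public API access above:

import wandb

# Summary metrics (the run's logged outputs) live on run.summary, which
# behaves like a dict; for this run it is expected to be empty, consistent
# with "No summary metrics saved for this run."
run = wandb.Api().run("eleutherai/gpt-thicc/3421905c")
print(list(run.summary.keys()))  # expected: []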