Preetham-gali's group workspace

hSY45YakxtAmpHucKxDCZz_19ewbxaz

Tags

shiv-0-0

State
Finished
Start time
August 17th, 2021 7:56:55 AM
Runtime
3h 52m 58s
Tracked hours
3h 52m 40s
Run path
eleutherai/distilling/1tikf8xm
OS
Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Python version
3.8.5
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "shiv-0-0" ecc3c2327d2fd053cc67c2cc656e1d847db2f797
Command
pretrain_gpt2.py --local_rank=0 --num_gpus 4 --deepspeed_config "{\"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true}" --megatron_config "{\"num_gpus\": 4, \"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true, \"precision\": \"fp16\", \"num_layers\": 24, \"hidden_size\": 1024, \"num_attention_heads\": 8, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 320000, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 500000000, \"zero_allgather_bucket_size\": 500000000, \"lr\": 0.0003, \"data_path\": \"/mnt/ssd-1/data/pile/pile_text_document\", \"data_impl\": \"mmap\", \"save\": \"checkpoints/medium-training/1-1-0\", \"save_interval\": 10000, \"finetune\": true, \"batch_size\": 5, \"train_iters\": 320000, \"eval_iters\": 10, \"keep_last_n_checkpoints\": 4, \"split\": \"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"partition_activations\": true, \"gas\": 27, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 1, \"is_pipe_parallel\": true, \"use_wandb\": true, \"wandb_group\": \"hSY45YakxtAmpHucKxDCZz_19ewbxaz\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"distilling\", \"log_dir\": \"logs\", \"log_interval\": 100, \"user_script\": \"pretrain_gpt2.py\"}"
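The batch settings in this command are tied together by one identity. There are 4 GPUs, each processing 5 sequences per micro-step, and gradients accumulate over 27 micro-steps. Each optimizer step therefore sees 4 × 5 × 27 = 540 sequences, which is the train_batch_size passed to DeepSpeed. The Python sketch below checks that identity and derives tokens per step from seq_length 2048. It assumes pure data parallelism across the 4 GPUs, since pipe_parallel_size is 1 and no model-parallel size is set. The helper name global_batch_size is illustrative and is not a GPT-NeoX function.

```python
# Batch settings copied from the --deepspeed_config / --megatron_config above.
NUM_GPUS = 4               # --num_gpus 4
MICRO_BATCH_PER_GPU = 5    # train_micro_batch_size_per_gpu
GRAD_ACCUM_STEPS = 27      # gradient_accumulation_steps
TRAIN_BATCH_SIZE = 540     # train_batch_size
SEQ_LENGTH = 2048          # seq_length
TRAIN_ITERS = 320_000      # train_iters


def global_batch_size(data_parallel_size: int, micro_batch: int, grad_accum: int) -> int:
    """Sequences per optimizer step, assuming every GPU is a data-parallel replica."""
    return data_parallel_size * micro_batch * grad_accum


batch = global_batch_size(NUM_GPUS, MICRO_BATCH_PER_GPU, GRAD_ACCUM_STEPS)
# DeepSpeed expects train_batch_size == micro_batch * grad_accum * data-parallel size.
assert batch == TRAIN_BATCH_SIZE, f"inconsistent batch config: {batch} != {TRAIN_BATCH_SIZE}"

tokens_per_step = batch * SEQ_LENGTH            # tokens consumed per optimizer step
planned_tokens = tokens_per_step * TRAIN_ITERS  # only if every scheduled iteration runs

print(f"global batch: {batch} sequences")
print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"tokens over {TRAIN_ITERS:,} iterations: {planned_tokens:,}")
```

With these values the identity holds and each step covers 1,105,920 tokens. The 320,000-iteration total is what the config schedules. The page does not record how many iterations this run actually completed.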
System Hardware
CPU count
128
GPU count
4
GPU type
NVIDIA A100-SXM4-40GB
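A rough memory budget shows why ZeRO stage 1 fits comfortably on these 40 GB cards. The sketch below first estimates the parameter count from num_layers 24, hidden_size 1024 and untied embeddings (no_weight_tying: true). It assumes the 50,257-token GPT-2 vocabulary implied by gpt2-vocab.json with no vocabulary padding; GPT-NeoX may pad the vocabulary, which would add a little. Rotary position embeddings are assumed to contribute no learned parameters.

The sketch then applies the mixed-precision Adam accounting from the ZeRO paper. Every GPU holds 2 bytes of fp16 weights and 2 bytes of fp16 gradients per parameter. The 12 bytes of fp32 optimizer state per parameter are split across the 4 GPUs by stage 1. Activations, the 500,000,000-element communication buckets and allocator overhead are deliberately left out. The function names are hypothetical.

```python
def gpt_param_count(num_layers: int, hidden: int, vocab: int, tied_embeddings: bool) -> int:
    """Approximate parameter count of a GPT-style decoder with biases and LayerNorms.

    Per layer: QKV (3h^2 + 3h), attention output (h^2 + h), 4h MLP (8h^2 + 5h),
    two LayerNorms (4h). Rotary position embeddings add no parameters.
    """
    per_layer = 12 * hidden**2 + 13 * hidden
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)  # input + separate output
    final_layernorm = 2 * hidden
    return num_layers * per_layer + embeddings + final_layernorm


def zero1_model_state_bytes(num_params: int, num_gpus: int) -> int:
    """Per-GPU model-state bytes for mixed-precision Adam under ZeRO stage 1.

    fp16 weights (2 B) and fp16 gradients (2 B) stay replicated on every GPU.
    fp32 master weights, Adam momentum and variance (4 B each, 12 B total) are
    partitioned across the data-parallel GPUs. Activations and buffers are excluded.
    """
    return 4 * num_params + (12 * num_params) // num_gpus


GIB = 2**30
# Assumption: 50,257-token GPT-2 vocabulary (gpt2-vocab.json) with no padding.
params = gpt_param_count(num_layers=24, hidden=1024, vocab=50_257, tied_embeddings=False)
per_gpu = zero1_model_state_bytes(params, num_gpus=4)
unpartitioned = zero1_model_state_bytes(params, num_gpus=1)  # same accounting without ZeRO

print(f"parameters: {params:,}")
print(f"ZeRO-1 model states per GPU: {per_gpu / GIB:.2f} GiB")
print(f"without partitioning: {unpartitioned / GIB:.2f} GiB (vs 40 GB per A100)")
```

Under these assumptions the model has roughly 405M parameters. Its model states come to about 2.6 GiB per GPU with stage 1, versus about 6.0 GiB unpartitioned. Most of the remaining memory on each A100 goes to activations, which this run also reduces with checkpoint_activations and partition_activations.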
W&B CLI Version
0.10.28
Config

Config parameters are your model's inputs.

  • {} 186 keys. The captured values include "gelu", "mmap", "nccl" and the data path /mnt/ssd-1/data/pile/pile_text_document, consistent with the --megatron_config settings in the Command above.
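The run's learning-rate settings in the Command above describe a cosine decay: lr 0.0003, lr_decay_style "cosine" and lr_decay_iters 320000. The sketch below is loosely modeled on the Megatron-style cosine annealing that GPT-NeoX builds on. The command does not set the warmup length or the floor min_lr, so the sketch exposes both as parameters with an assumed default of 0. The function name cosine_lr is illustrative.

```python
import math


def cosine_lr(step: int, max_lr: float = 3e-4, decay_iters: int = 320_000,
              warmup_iters: int = 0, min_lr: float = 0.0) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay.

    max_lr and decay_iters come from the run's lr / lr_decay_iters;
    warmup_iters and min_lr are assumptions (not set in the command).
    """
    if warmup_iters > 0 and step < warmup_iters:
        return max_lr * step / warmup_iters        # linear warmup from 0
    if step >= decay_iters:
        return min_lr                               # hold the floor after decay ends
    progress = (step - warmup_iters) / (decay_iters - warmup_iters)  # goes 0 -> 1
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 80_000, 160_000, 240_000, 320_000):
    print(f"step {s:>7,}: lr = {cosine_lr(s):.3e}")
```

Halfway through the decay window, the learning rate sits at the midpoint between max_lr and min_lr. With the assumed min_lr of 0, that midpoint is 1.5e-4.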
Summary

Summary metrics are your model's outputs.

  • {} 29 keys
    • 109,629,937,622,472.44
    • 13.749365980625152
    • 39.27453824132242
    • 0.5195140838623047
    • 0.1118183135986328
    • 404.874324798584
    • 0.8051395416259766
    • 102,997.76983261108
    • 4.068851470947266
    • 11.81316375732422
    • 102,967.50450134276
    • 102,975.14224052428
    • 103,004.50539588928
    • 280.1547050476074
    • 359.0736389160156
    • 29,366.528511047363
    • 74.8878157428386
    • 0.2610759490516577
    • 21.351871790196995
    • 0.2958599806630684
    • 307.31940269470215
    • 0.4107952117919922
    • 406.91423416137695
    • 0
    • 0.0000928125
    • 3.6650736331939697
    • 32,768
    • 3.665481567382813
    • 39.07494885379973
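The summary values above appear without their metric names. The named config and summary can be re-read through the W&B public API, which addresses a run by the run path shown earlier (eleutherai/distilling/1tikf8xm). The sketch below assumes the wandb package is installed and that you are authenticated with read access to the project. It uses wandb.Api().run(...), run.config and run.summary._json_dict, the same calls W&B's data-export example uses. The helper names parse_run_path and fetch_config_and_summary are hypothetical.

```python
from typing import Any, Dict, Tuple


def parse_run_path(run_path: str) -> Tuple[str, str, str]:
    """Split a W&B run path of the form entity/project/run_id."""
    parts = run_path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'entity/project/run_id', got {run_path!r}")
    entity, project, run_id = parts
    return entity, project, run_id


def fetch_config_and_summary(run_path: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (config, summary) for a run, keyed by name (needs network + W&B auth)."""
    import wandb  # imported lazily so parse_run_path works without wandb installed

    parse_run_path(run_path)  # fail fast on a malformed path
    run = wandb.Api().run(run_path)
    # Drop W&B-internal keys such as "_wandb", as W&B's export example does.
    config = {k: v for k, v in run.config.items() if not k.startswith("_")}
    summary = dict(run.summary._json_dict)
    return config, summary
```

For example, `config, summary = fetch_config_and_summary("eleutherai/distilling/1tikf8xm")` would return both dictionaries with their key names, provided the run is still accessible.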
Artifact Outputs

This run produced these artifacts as outputs. Total: 1.