
Preetham-gali's group workspace

2qb64HRVvbTxkAUpFAHti3_2965okl5

Tags

distilling-0-0-topk1024

Notes
State
Failed
Start time
September 24th, 2021 4:01:18 AM
Runtime
6d 21h 56m 12s
Tracked hours
6d 21h 56m 10s
Run path
eleutherai/distilling/ylrux17x
OS
Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Python version
3.8.5
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "distilling-0-0-topk1024" 54e0e672fd2c589599d95789a8fb46327ed2097a
Command
pretrain_gpt2.py --local_rank=0 --num_gpus 4 --deepspeed_config "{\"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true}" --megatron_config "{\"num_gpus\": 4, \"train_batch_size\": 540, \"train_micro_batch_size_per_gpu\": 5, \"gradient_accumulation_steps\": 27, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.0003, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true, \"precision\": \"fp16\", \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 250000, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 500000000, \"zero_allgather_bucket_size\": 500000000, \"lr\": 0.0003, \"data_path\": \"/mnt/ssd-1/data/pile/pile_text_document\", \"data_impl\": \"mmap\", \"save\": \"checkpoints/distilling/med-to-small-topk1024\", \"save_interval\": 10000, \"finetune\": true, \"batch_size\": 5, \"train_iters\": 250000, \"eval_iters\": 10, \"keep_last_n_checkpoints\": 4, \"split\": 
\"949,50,1\", \"vocab_file\": \"/mnt/ssd-1/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-1/data/gpt2-merges.txt\", \"attention_dropout\": 0.0, \"hidden_dropout\": 0.0, \"weight_decay\": 0.1, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"partition_activations\": true, \"gas\": 27, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"do_distillation\": true, \"teacher_model_args\": {\"precision\": null, \"num_layers\": 24, \"hidden_size\": 1024, \"num_attention_heads\": 8, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"norm\": \"layernorm\", \"layernorm_epsilon\": 1e-05, \"rms_norm_epsilon\": 1e-08, \"scalenorm_epsilon\": 1e-08, \"pos_emb\": \"rotary\", \"rpe_num_buckets\": 32, \"rpe_max_distance\": 128, \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"num_unique_layers\": null, \"param_sharing_style\": \"grouped\", \"make_vocab_size_divisible_by\": 128, \"apply_residual_connection_post_layernorm\": false, \"activation\": \"gelu\", \"scaled_upper_triang_masked_softmax_fusion\": false, \"scaled_masked_softmax_fusion\": false, \"bias_gelu_fusion\": false, \"bias_dropout_fusion\": false, \"fp16_lm_cross_entropy\": false, \"init_method_std\": 0.02, \"apply_query_key_layer_scaling\": false, \"use_cpu_initialization\": false, \"attention_softmax_in_fp32\": false, \"rotary_pct\": 1.0, \"rotary_emb_base\": 10000, \"init_method\": \"normal\", \"output_layer_init_method\": \"scaled_normal\", \"gmlp_attn_dim\": 64}, \"student_model_args\": {\"precision\": null, \"num_layers\": 12, \"hidden_size\": 1024, \"num_attention_heads\": 8, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"norm\": \"layernorm\", 
\"layernorm_epsilon\": 1e-05, \"rms_norm_epsilon\": 1e-08, \"scalenorm_epsilon\": 1e-08, \"pos_emb\": \"rotary\", \"rpe_num_buckets\": 32, \"rpe_max_distance\": 128, \"no_weight_tying\": true, \"attention_config\": [\"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\", \"global\"], \"sparsity_config\": {}, \"num_unique_layers\": null, \"param_sharing_style\": \"grouped\", \"make_vocab_size_divisible_by\": 128, \"apply_residual_connection_post_layernorm\": false, \"activation\": \"gelu\", \"scaled_upper_triang_masked_softmax_fusion\": false, \"scaled_masked_softmax_fusion\": false, \"bias_gelu_fusion\": false, \"bias_dropout_fusion\": false, \"fp16_lm_cross_entropy\": false, \"init_method_std\": 0.02, \"apply_query_key_layer_scaling\": false, \"use_cpu_initialization\": false, \"attention_softmax_in_fp32\": false, \"rotary_pct\": 1.0, \"rotary_emb_base\": 10000, \"init_method\": \"normal\", \"output_layer_init_method\": \"scaled_normal\", \"gmlp_attn_dim\": 64}, \"load_teacher\": \"/mnt/ssd-1/neox_checkpoints/dense_medium_checkpoints/global_step250000\", \"alpha_lm\": 1.0, \"alpha_kld\": 1.0, \"pipe_parallel_size\": 1, \"is_pipe_parallel\": true, \"use_wandb\": true, \"wandb_group\": \"2qb64HRVvbTxkAUpFAHti3_2965okl5\", \"wandb_team\": \"eleutherai\", \"wandb_project\": \"distilling\", \"log_dir\": \"logs\", \"log_interval\": 100, \"user_script\": \"pretrain_gpt2.py\"}"
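The batch settings in the command above can be cross-checked with a little arithmetic. The sketch below uses only values that appear in the command (micro-batch 5, gradient accumulation 27, 4 GPUs, sequence length 2048). It assumes all four GPUs act as data-parallel ranks, which seems consistent with pipe_parallel_size being 1 and no model-parallel size appearing in the command. The helper name global_batch_size is hypothetical and is not part of gpt-neox or DeepSpeed.

```python
import json

# Values copied from the --deepspeed_config argument of the command.
deepspeed_config = json.loads(
    '{"train_batch_size": 540, '
    '"train_micro_batch_size_per_gpu": 5, '
    '"gradient_accumulation_steps": 27}'
)
num_gpus = 4        # from --num_gpus 4
seq_length = 2048   # from student_model_args / teacher_model_args


def global_batch_size(micro_batch, grad_accum_steps, data_parallel_ranks):
    """Samples consumed per optimizer step (hypothetical helper)."""
    return micro_batch * grad_accum_steps * data_parallel_ranks


# Assumption: every GPU is a data-parallel rank (no model or pipeline splitting).
expected = global_batch_size(
    deepspeed_config["train_micro_batch_size_per_gpu"],
    deepspeed_config["gradient_accumulation_steps"],
    num_gpus,
)

# The configured train_batch_size should match the product.
assert expected == deepspeed_config["train_batch_size"]

# Tokens processed per optimizer step at the configured sequence length.
tokens_per_step = expected * seq_length

print(expected, tokens_per_step)
```

Under these assumptions the product 5 × 27 × 4 matches the configured train_batch_size of 540, so the three batch settings are mutually consistent.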
System Hardware
CPU count
128
GPU count
4
GPU type
NVIDIA A100-SXM4-40GB
W&B CLI Version
0.12.1
Config

Config parameters are your model's inputs.

  • {} 186 keys
    • "gelu"
    • false
    • 1,000
    • 1
    • 1
    • 0
    • null
    • false
    • false
    • null
    • 0
    • false
    • 5
    • false
    • false
    • true
    • false
    • 1
    • false
    • 1
    • false
    • "mmap"
    • "/mnt/ssd-1/data/pile/pile_text_document"
    • false
    • null
    • true
    • true
    • false
    • false
    • "nccl"
    • true
    • null
    • null
    • null
    • false
    • true
    • false
    • 1,000
    • 10
    • ""
    • null
    • null
    • null
    • true
    • null
    • {} 5 keys
      • {} 8 keys
        • 500,000,000
        • true
        • 1
Summary

Summary metrics are your model's outputs.

      • {} 38 keys
        • 36,541,252,702,670.46
        • 41.36487409353256
        • 13.05455442168092
        • 0.6127357482910156
        • 0.20885467529296875
        • 567.2719478607178
        • 1.2650489807128906
        • 155,601.5362739563
        • 6.005048751831055
        • 17.325162887573242
        • 155,555.37009239197
        • 155,567.29793548584
        • 155,612.20979690552
        • 511.37304306030273
        • 764.9610042572021
        • 198,379.78053092957
        • 37.532471786018455
        • 0.18451538459837977
        • 47.850964032800896
        • 0.1376102144318473
        • 569.8182582855225
        • 0.5791187286376953
        • 570.5022811889648
        • 0
        • 6.5546875
        • 0.0002983239951089881
        • 10.530523300170898
        • 17.08087158203125
        • 1
        • 0
        • 6.3203125
        • 555.7466361419574
        • 8.500429153442383
        • 4,916.878482913281
        • 14.820825576782228
        • 2,732,768.0105929472
        • 0
        • 1
Artifact Outputs

This run produced these artifacts as outputs. Total: 1.