
Chilli's group workspace

K3ZsCJD4ynEa5rxJF75NcygEVFMJLKVRSTW4JQPsJa56


neox-visual-grounding-0-0

State
Crashed
Start time
April 27th, 2021 6:09:41 PM
Runtime
8m
Tracked hours
7m 48s
Run path
eleutherai/neox/1mfa8421
OS
Linux-5.4.0-54-generic-x86_64-with-glibc2.29
Python version
3.8.5
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "neox-visual-grounding-0-0" 9992042ab113428022e5e91421c04917577b8e00
Command
pretrain_gpt2.py --local_rank=0 --num_gpus 6 --deepspeed_config "{\"train_batch_size\": 96, \"train_micro_batch_size_per_gpu\": 16, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.00025, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true}" --megatron_config "{\"num_gpus\": 6, \"train_batch_size\": 96, \"train_micro_batch_size_per_gpu\": 16, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.00025, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true, \"precision\": \"fp16\", \"num_layers\": 24, \"hidden_size\": 1536, \"num_attention_heads\": 16, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 320000, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 500000000, \"zero_allgather_bucket_size\": 500000000, \"lr\": 0.00025, \"data_path\": \"data/enwik8/enwik8_text_document\", \"data_impl\": \"mmap\", \"save\": \"checkpoints/\", \"load\": \"checkpoints/\", \"save_interval\": 10000, 
\"batch_size\": 16, \"train_iters\": 320000, \"eval_iters\": 10, \"keep_last_n_checkpoints\": 4, \"split\": \"900,99,1\", \"vocab_file\": \"data/gpt2-vocab.json\", \"merge_file\": \"data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"partition_activations\": true, \"gas\": 1, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 1, \"world_size\": 1, \"wandb_group\": \"K3ZsCJD4ynEa5rxJF75Ncy\", \"log_dir\": \"logs/\", \"tensorboard_dir\": \"/mnt/ssd-cluster/tensorboard\", \"log_interval\": 100, \"local_rank\": 0, \"rank\": 0, \"user_script\": \"pretrain_gpt2.py\"}"
System Hardware
CPU count
112
GPU count
6
GPU type
A100-PCIE-40GB
W&B CLI Version
0.10.25
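
The batch settings in the command can be cross-checked against the GPU count with a little arithmetic. The sketch below is illustrative only: it assumes all 6 GPUs are used for data parallelism (the command sets pipe_parallel_size to 1 and does not show a model-parallel size), and the function and variable names are hypothetical, not part of gpt-neox or DeepSpeed.

```python
# Values copied from the JSON passed on the command line of this run.
micro_batch_per_gpu = 16   # "train_micro_batch_size_per_gpu"
grad_accum_steps = 1       # "gas"
num_gpus = 6               # --num_gpus
seq_length = 2048          # "seq_length"
train_batch_size = 96      # "train_batch_size"


def global_batch_size(micro_batch, accum_steps, data_parallel_gpus):
    """Samples consumed per optimizer step, assuming pure data parallelism."""
    return micro_batch * accum_steps * data_parallel_gpus


# Assumption: every GPU is a data-parallel replica.
expected = global_batch_size(micro_batch_per_gpu, grad_accum_steps, num_gpus)
tokens_per_step = expected * seq_length

print("configured batch size matches:", expected == train_batch_size)
print("tokens per optimizer step:", tokens_per_step)
```

Under that assumption, 16 samples per GPU across 6 GPUs with a single accumulation step gives the configured global batch of 96, or 196,608 tokens per step at a sequence length of 2048.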
Config

Config parameters are your model's inputs.

  • {} 162 keys
    • false
    • 1,000
    • null
    • false
    • false
    • 0
    • false
    • 16
    • false
    • false
    • true
    • false
    • 1
    • 1
    • false
    • "mmap"
    • "data/enwik8/enwik8_text_document"
    • false
    • null
    • true
    • false
    • false
    • false
    • false
    • "nccl"
    • null
    • null
    • null
    • false
    • true
    • false
    • 1,000
    • 10
    • null
    • null
    • false
    • null
    • {} 6 keys
      • false
      • false
      • 1
      • false
      • null
      • "9992042"
      • 1
      • 1
      • {} 8 keys
        • 500,000,000
        • true
        • 1
Summary

Summary metrics are your model's outputs.

  • {} 3 keys
    • 0.000002578125
    • 9.885313987731934
    • 65,536