Chilli's group workspace
BBtbK5KrG6G23TABQGZXM5eUaezasHviLSJrcy5hiUMp
Tags
neox-visual-grounding-0-0
Notes
Author
State
Crashed
Start time
April 27th, 2021 6:18:22 PM
Runtime
11m 2s
Tracked hours
10m 52s
Run path
eleutherai/neox/1kfcju34
OS
Linux-5.4.0-54-generic-x86_64-with-glibc2.29
Python version
3.8.5
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "neox-visual-grounding-0-0" 9992042ab113428022e5e91421c04917577b8e00
Command
pretrain_gpt2.py --local_rank=0 --num_gpus 6 --deepspeed_config "{\"train_batch_size\": 96, \"train_micro_batch_size_per_gpu\": 16, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.00025, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true}" --megatron_config "{\"num_gpus\": 6, \"train_batch_size\": 96, \"train_micro_batch_size_per_gpu\": 16, \"optimizer\": {\"type\": \"Adam\", \"params\": {\"lr\": 0.00025, \"betas\": [0.9, 0.999], \"eps\": 1e-08}}, \"fp16\": {\"fp16\": true, \"enabled\": true, \"loss_scale\": 0, \"loss_scale_window\": 1000, \"hysteresis\": 2, \"min_loss_scale\": 1}, \"gradient_clipping\": 1.0, \"zero_optimization\": {\"stage\": 1, \"allgather_partitions\": true, \"allgather_bucket_size\": 500000000, \"overlap_comm\": true, \"reduce_scatter\": true, \"reduce_bucket_size\": 500000000, \"contiguous_gradients\": true, \"cpu_offload\": false}, \"wall_clock_breakdown\": true, \"precision\": \"fp16\", \"num_layers\": 24, \"hidden_size\": 1536, \"num_attention_heads\": 16, \"seq_length\": 2048, \"max_position_embeddings\": 2048, \"pos_emb\": \"rotary\", \"no_weight_tying\": true, \"lr_decay_style\": \"cosine\", \"lr_decay_iters\": 320000, \"optimizer_type\": \"Adam\", \"zero_stage\": 1, \"zero_reduce_scatter\": true, \"zero_contiguous_gradients\": true, \"zero_reduce_bucket_size\": 500000000, \"zero_allgather_bucket_size\": 500000000, \"lr\": 0.00025, \"data_path\": \"/mnt/ssd-cluster/data/enron/enron_text_document\", \"data_impl\": \"mmap\", \"save\": \"/mnt/ssd-cluster/checkpoints\", \"load\": 
\"/mnt/ssd-cluster/checkpoints\", \"save_interval\": 10000, \"batch_size\": 16, \"train_iters\": 320000, \"eval_iters\": 10, \"keep_last_n_checkpoints\": 4, \"split\": \"900,99,1\", \"vocab_file\": \"/mnt/ssd-cluster/data/gpt2-vocab.json\", \"merge_file\": \"/mnt/ssd-cluster/data/gpt2-merges.txt\", \"attention_dropout\": 0, \"hidden_dropout\": 0, \"weight_decay\": 0, \"checkpoint_activations\": true, \"synchronize_each_layer\": true, \"partition_activations\": true, \"gas\": 1, \"clip_grad\": 1.0, \"dynamic_loss_scale\": true, \"pipe_parallel_size\": 1, \"world_size\": 1, \"wandb_group\": \"BBtbK5KrG6G23TABQGZXM5\", \"log_dir\": \"/mnt/ssd-cluster/logs\", \"tensorboard_dir\": \"/mnt/ssd-cluster/tensorboard\", \"log_interval\": 100, \"local_rank\": 0, \"rank\": 0, \"user_script\": \"pretrain_gpt2.py\"}"
System Hardware
CPU count | 112 |
GPU count | 6 |
GPU type | A100-PCIE-40GB |
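The batch-size settings in the Command field can be cross-checked against the GPU count listed here. This is a minimal sketch, not part of the run itself. It assumes all 6 GPUs act as data-parallel ranks, which the page does not state explicitly (only "pipe_parallel_size": 1 is visible), and it uses the usual DeepSpeed relationship: global batch = micro batch per GPU x gradient accumulation steps x data-parallel ranks. The helper name expected_global_batch is hypothetical.

```python
# Values copied from the --deepspeed_config / --megatron_config arguments above.
config = {
    "train_batch_size": 96,
    "train_micro_batch_size_per_gpu": 16,
    "gas": 1,        # gradient accumulation steps
    "num_gpus": 6,   # assumed here to equal the number of data-parallel ranks
}


def expected_global_batch(micro_batch, grad_accum_steps, data_parallel_ranks):
    """Hypothetical helper: global batch implied by the per-GPU settings."""
    return micro_batch * grad_accum_steps * data_parallel_ranks


expected = expected_global_batch(
    config["train_micro_batch_size_per_gpu"],
    config["gas"],
    config["num_gpus"],
)
is_consistent = expected == config["train_batch_size"]
print(expected, is_consistent)
```

Under that assumption the three values agree with the configured "train_batch_size" of 96, so the crash recorded for this run is unlikely to come from a batch-size mismatch.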
W&B CLI Version
0.10.25
Config
Config parameters are your model's inputs.
- {} 162 keys▶
- false
- 1,000
- null
- false
- false
- 0
- false
- 16
- false
- false
- true
- false
- 1
- 1
- false
- "mmap"
- "/mnt/ssd-cluster/data/enron/enron_text_document"
- false
- null
- true
- false
- false
- false
- false
- "nccl"
- null
- null
- null
- false
- true
- false
- 1,000
- 10
- null
- null
- false
- null
- {} 6 keys▶
- false
- false
- 1
- false
- null
- "9992042"
- 1
- 1
- {} 8 keys▶
- 500,000,000
- true
- 1
Summary
Summary metrics are your model's outputs.
- {} 3 keys▶
- 0.00000390625
- 9.11441421508789
- 16,384
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.