Chilli's group workspace
jmzpl6tw_k7d4ruyz
Tags
stella-ord-0-0
Notes
Author
State
Crashed
Start time
June 24th, 2024 4:29:19 AM
Runtime
2m 31s
Tracked hours
-
Run path
eleutherai/neox/sb6gnw18
OS
Linux-5.19.17-coreweave-x86_64-with-glibc2.17
Python version
3.8.19
Git repository
git clone https://github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "stella-ord-0-0" 4c426da8b6149e2313bc6e00584531f004cfe457
Command
train.py --local_rank=0 --deepspeed_config eyJ0cmFpbl9iYXRjaF9zaXplIjogMTI4LCAidHJhaW5fbWljcm9fYmF0Y2hfc2l6ZV9wZXJfZ3B1IjogNCwgImdyYWRpZW50X2FjY3VtdWxhdGlvbl9zdGVwcyI6IDQsICJvcHRpbWl6ZXIiOiB7InR5cGUiOiAiQWRhbSIsICJwYXJhbXMiOiB7ImxyIjogMC4wMDAyNSwgImJldGFzIjogWzAuOSwgMC45NV0sICJlcHMiOiAxZS0wOH19LCAiZnAzMl9hbGxyZWR1Y2UiOiB0cnVlLCAiZnAxNiI6IHsiZW5hYmxlZCI6IHRydWUsICJ0eXBlIjogImJmbG9hdDE2IiwgImF1dG9fY2FzdCI6IHRydWUsICJsb3NzX3NjYWxlIjogMCwgImxvc3Nfc2NhbGVfd2luZG93IjogMTAwMCwgImluaXRpYWxfc2NhbGVfcG93ZXIiOiAxMiwgImh5c3RlcmVzaXMiOiAyLCAibWluX2xvc3Nfc2NhbGUiOiAxfSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDAsICJhbGxnYXRoZXJfcGFydGl0aW9ucyI6IHRydWUsICJhbGxnYXRoZXJfYnVja2V0X3NpemUiOiA1MDAwMDAwMDAsICJvdmVybGFwX2NvbW0iOiB0cnVlLCAicmVkdWNlX3NjYXR0ZXIiOiB0cnVlLCAicmVkdWNlX2J1Y2tldF9zaXplIjogNTAwMDAwMDAwLCAiY29udGlndW91c19ncmFkaWVudHMiOiB0cnVlLCAiY3B1X29mZmxvYWQiOiBmYWxzZX0sICJ3YWxsX2Nsb2NrX2JyZWFrZG93biI6IHRydWV9 --megatron_config 
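The `--deepspeed_config` value above is a base64-encoded JSON object; its first bytes decode to `{"train_batch_size": 128, ...`. The sketch below shows one way to turn such an argument back into a readable dictionary. The helper name `decode_config_arg` is hypothetical and not part of GPT-NeoX or DeepSpeed, and the sample payload is a small stand-in rather than the full string from this run.

```python
import base64
import json


def decode_config_arg(encoded: str) -> dict:
    """Decode a base64-encoded JSON config argument into a dict.

    Hypothetical helper for inspecting a logged command line; it is not
    part of GPT-NeoX or DeepSpeed.
    """
    # Restore any '=' padding that may have been dropped from the argument.
    padded = encoded + "=" * (-len(encoded) % 4)
    return json.loads(base64.b64decode(padded).decode("utf-8"))


# Stand-in payload so the example is self-contained; to inspect the real run,
# paste the full --deepspeed_config string from the Command field instead.
sample = {"train_batch_size": 128, "train_micro_batch_size_per_gpu": 4}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

decoded = decode_config_arg(encoded)
print(json.dumps(decoded, indent=2))
```

Note that the `--megatron_config` value is empty in the logged command, so there is nothing to decode for that flag here.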
System Hardware
CPU count | 48 |
Logical CPU count | 96 |
GPU count | 8 |
GPU type | NVIDIA A40 |
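As a sanity check on the batch settings, DeepSpeed expects the global batch size to equal micro batch size × gradient accumulation steps × data-parallel ranks. The sketch below applies that rule to the values recorded for this run (micro batch 4, accumulation 4, pipeline and model parallel size 1), assuming the 8 GPUs listed above are the entire job; the function name is hypothetical.

```python
def expected_train_batch_size(micro_batch: int, grad_accum: int, num_gpus: int,
                              pipe_parallel: int = 1, model_parallel: int = 1) -> int:
    """Global batch size implied by the per-GPU settings.

    Assumes every GPU not used for pipeline or model parallelism acts as a
    data-parallel rank.
    """
    data_parallel = num_gpus // (pipe_parallel * model_parallel)
    return micro_batch * grad_accum * data_parallel


# Values taken from this run's config; num_gpus=8 assumes a single 8-GPU node.
batch = expected_train_batch_size(micro_batch=4, grad_accum=4, num_gpus=8)
print(batch)  # 128, matching train_batch_size in the decoded DeepSpeed config
```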
W&B CLI Version
0.17.1
Group
jmzpl6tw_k7d4ruyz
Config
Config parameters are your model's inputs.
- {} 265 keys
- null
- "gelu"
- null
- false
- 1,000
- null
- false
- [] 16 items
- 0
- false
- null
- null
- null
- 4
- null
- false
- true
- false
- null
- true
- 1,000
- false
- 1
- "linear"
- false
- 1
- null
- null
- null
- null
- {} 2 keys
- "{ "pipe_parallel_size": 1, "model_parallel_size": 1, "num_layers": 16, "hidden_size": 2048, "num_attention_heads": 8, "seq_length": 2048, "max_position_embeddings": 2048, "pos_emb": "rotary", "rotary_pct": 0.25, "no_weight_tying": true, "gpt_j_residual": true, "output_layer_parallelism": "column", "scaled_upper_triang_masked_softmax_fusion": true, "bias_gelu_fusion": true, "init_method": "small_init", "output_layer_init_method": "wang_init", "optimizer": { "type": "Adam", "params": { "lr": 0.00025, "betas": [0.9, 0.95], "eps": 1.0e-8 } }, "min_lr": 0.000025, "zero_optimization": { "stage": 0, "allgather_partitions": true, "allgather_bucket_size": 500000000, "overlap_comm": true, "reduce_scatter": true, "reduce_bucket_size": 500000000, "contiguous_gradients": true, "cpu_offload": false }, "fp16": { "enabled": true, "type": "bfloat16", "auto_cast": true, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 12, "hysteresis": 2, "min_loss_scale": 1 }, "fp32_allreduce": true, "train_micro_batch_size_per_gpu": 4, "gradient_accumulation_steps": 4, "data_impl": "mmap", "num_workers": 1, "checkpoint_activations": true, "checkpoint_num_layers": 1, "partition_activations": true, "synchronize_each_layer": true, "gradient_clipping": 1.0, "weight_decay": 0.1, "hidden_dropout": 0, "attention_dropout": 0, "train_iters": 143000, "lr_decay_iters": 143000, "distributed_backend": "nccl", "lr_decay_style": "cosine", "warmup": 0.01, "checkpoint_factor": 1000, "extra_save_iters": [0,1,2,4,8,16,32,64,128,256,512], "eval_interval": 143000, "eval_iters": 10, "log_interval": 10, "steps_per_print": 10, "wall_clock_breakdown": true, "tokenizer_type": "HFTokenizer", "dataset_type": "pause", "dataset_cfg": { "pause_id": 50277, } } "
- "# Suggested data paths when using GPT-NeoX locally { "data_path": "data/enwik8/enwik8_text_document", # or for weighted datasets: # "train-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"], # "test-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"], # "valid-data-paths": ["data/enwik8/enwik8_text_document", "data/enwik8/enwik8_text_document"], # "train-data-weights": [1., 2.], # "test-data-weights": [2., 1.], # "valid-data-weights": [0.5, 0.4], # If weight_by_num_documents is True, Builds dataset weights from a multinomial distribution over groups of data according to the number of documents in each group. # WARNING: setting this to True will override any user provided weights # "weight_by_num_documents": false, # "weighted_sampler_alpha": 0.3, "vocab_file": "../pythia_type2.json", "save": "checkpoints", "load": "checkpoints", "checkpoint_validation_with_forward_pass": False, "tensorboard_dir": "tensorboard", "log_dir": "logs", "use_wandb": True, "wandb_host": "https://api.wandb.ai", "wandb_project": "neox" } "
- false
- false
- true
- null
- null
- 0
- null
- "mmap"
- "data/enwik8/enwik8_text_document"
- null
- {} 1 key
- 50,277
- "pause"
- false
- null
- true
- {} 8 keys
- 500,000,000
- true
- 0
Summary
Summary metrics are your model's outputs.
No summary metrics saved for this run.
Artifact Outputs
This run produced these artifacts as outputs. Total: 1.