Section 1
[Charts not reproduced. Panels: eval/dolma_common-crawl-validation/Perplexity; optim/total_grad_norm; train/CrossEntropyLoss; eval/pile-validation/CrossEntropyLoss; eval/pile-validation/Perplexity]
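The panels plot both CrossEntropyLoss and Perplexity for the same validation set. Under the usual convention, perplexity is the exponential of the mean per-token cross-entropy measured in nats. That this run logs its metrics this way is an assumption, not something the report states. Under that assumption, a rise of Δ nats in validation loss multiplies perplexity by exp(Δ), so a modest loss spike can look much larger on the Perplexity panel. The sketch below uses hypothetical helper names and made-up loss values.

```python
import math


def perplexity_from_ce(mean_ce_nats: float) -> float:
    """Perplexity as exp of the mean per-token cross-entropy in nats.

    Assumption: the logged CrossEntropyLoss is a natural-log, per-token mean.
    """
    return math.exp(mean_ce_nats)


def ppl_ratio_from_ce_jump(delta_ce_nats: float) -> float:
    """Multiplicative change in perplexity when the loss rises by delta_ce_nats."""
    return math.exp(delta_ce_nats)


# Hypothetical values, for illustration only (not taken from the run).
print(perplexity_from_ce(2.0))      # about 7.389
print(ppl_ratio_from_ce_jump(0.1))  # about 1.105
```

Because the mapping is monotonic, a spike in one metric appears at the same step in the other. Only the visual magnitude differs between the two panels.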
Run: olmoe-8x1b-newhp-newds-cx5-fine1-newtok
Created with ❤️ on Weights & Biases.
https://wandb.ai/ai2-llm/olmoe/reports/Why-Nikla-s-MoE-w-new-tokenizer-spikey-val---Vmlldzo4ODkzNzM2