Section 1
Panels (plots not recoverable):
eval/dolma_common-crawl-validation/Perplexity
optim/total_grad_norm
train/CrossEntropyLoss
eval/pile-validation/CrossEntropyLoss
eval/pile-validation/Perplexity
Run: olmoe-8x1b-newhp-newds-cx5-fine1-newtok
Created with ❤️ on Weights & Biases.
https://wandb.ai/ai2-llm/olmoe/reports/Why-Nikla-s-MoE-w-new-tokenizer-spikey-val---Vmlldzo4ODkzNzM2