taishi-nakamura / Drop-Upcycling
Taishi-nakamura's workspace
Runs (167 total, 12 visualized)
Name
upcycle-8×1.56B-torch_rand_002_random_init_0.5_drop_upcycling_0.5_paper-h100-16node-128gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002_random_init_0.5_drop_upcycling_0.5_paper-h100-16node-128gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×152M-torch_rand_002_random_init_0.5_drop_upcycling_0.5_paper-h100-8node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002-0.50_noise-h100-8node-64gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002-0.50_noise_load_balance-h100-4node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002-0.50_noise-h100-8node-64gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002-0.50_noise-h100-8node-64gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-Mixtral-8x152M-shuffle_torch_rand_002_iter_0477000_random_rand_init_0.50_noise-h100-8node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-8×1.56B-torch_rand_002_btx_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
from_scratch_Mixtral-8x1.56B_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-shuffle_torch_rand_002_random_init_1.0_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-shuffle_torch_rand_002_random_init_0.5_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-8×1.56B-torch_rand_002_btx_load_balance-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-Mixtral-8x152M-shuffle_torch_rand_002_iter_0477000_random_rand_init_0.50_noise-h100-16node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-8×152M-torch_rand_002_btx-h100-16node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-8×152M-torch_rand_002_btx-h100-16node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-8×1.56B-shuffle_torch_rand_002_random_init_0.75-h100-8node-32gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
from_scratch_Mixtral-8x3.78B-h100-16node-64gpu-4096s-BS=1024-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
TSUBAME4-Llama-152M-en-h100-8node-32gpu-4096s-DP=32-TP=1-PP=1-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=1000-WD=0.1-GC=1-z-loss-2024-09-25-18-28-34
1-20 of 167
[Plot: stats/tokens_per_sec_per_gpu against Step, showing the first 10 runs. Y-axis ticks run from 50k to 200k; x-axis ticks run from 10000 to 50000 steps.]
Legend:
upcycle-8×152M-torch_rand_002_random_init_0.5_drop_upcycling_0.5_paper-h100-8node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1
upcycle-Mixtral-8x152M-shuffle_torch_rand_002_iter_0477000_random_rand_init_0.50_noise-h100-8node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-Mixtral-8x152M-shuffle_torch_rand_002_iter_0477000_random_rand_init_0.50_noise-h100-16node-64gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1~
upcycle-8×152M-torch_rand_002_random_init_0.5-h100-2node-8gpu-4096s-BS=512-LR=2e-4-MINLR=2e-5-WARMUP=2000-WD=0.1-GC=1 (7 runs with this name)