Rdoublea's workspace
Runs (154)
Name
State
Notes
User
Tags
Created
Runtime
Sweep
batch_size
checkpointer._component_
checkpointer.checkpoint_dir
checkpointer.checkpoint_files
checkpointer.model_type
checkpointer.output_dir
compile
dataset._component_
dataset.max_seq_len
dataset.packed
dataset.train_on_input
device
dtype
enable_activation_checkpointing
epochs
gradient_accumulation_steps
log_every_n_steps
log_peak_memory_stats
loss._component_
lr_scheduler._component_
lr_scheduler.num_warmup_steps
max_steps_per_epoch
memory_efficient_fsdp_wrap
metric_logger._component_
metric_logger.group
metric_logger.log_dir
metric_logger.name
metric_logger.project
model._component_
model.apply_lora_to_mlp
model.apply_lora_to_output
model.lora_alpha
model.lora_attn_modules
model.lora_rank
optimizer._component_
optimizer.foreach
optimizer.lr
optimizer.weight_decay
optimizer_in_bwd
output_dir
profiler._component_
profiler.active_steps
profiler.cpu
profiler.cuda
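State: Crashed
Notes: -
User: rdoublea
Runtime: 21m 46s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: true
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
lr_scheduler.num_warmup_steps: 100
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true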
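State: Killed
Notes: -
User: rdoublea
Runtime: 20m 12s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
lr_scheduler.num_warmup_steps: 100
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true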
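State: Failed
Notes: -
User: rdoublea
Runtime: 1m 29s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true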
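State: Failed
Notes: -
User: rdoublea
Runtime: 7s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true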
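State: Failed
Notes: -
User: rdoublea
Runtime: 1m 48s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true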
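State: Failed
Notes: -
User: rdoublea
Runtime: 59s
Sweep: -
batch_size: 2
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/jessicazhong/pci-wsf/jessicazhong/checkpoints/20m_moe_svt_mp1pp1/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/torchtune/llama4_20Mx8/full
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/torchtune/llama4_20Mx8/full/logs
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_20mx8
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.0002
optimizer.weight_decay: -
optimizer_in_bwd: true
output_dir: /tmp/torchtune/llama4_20Mx8/full
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true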
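State: Failed
Notes: -
User: rdoublea
Runtime: 2s
Sweep: -
batch_size: 2
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/jessicazhong/pci-wsf/jessicazhong/checkpoints/20m_moe_svt_mp1pp1/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_20mx8
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.0002
optimizer.weight_decay: -
optimizer_in_bwd: true
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true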
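State: Killed
Notes: -
User: rdoublea
Runtime: 27s
Sweep: -
batch_size: 2
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/jessicazhong/pci-wsf/jessicazhong/checkpoints/20m_moe_svt_mp1pp1/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_20mx8
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.0002
optimizer.weight_decay: -
optimizer_in_bwd: true
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true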
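State: Failed
Notes: -
User: rdoublea
Runtime: 26s
Sweep: -
batch_size: 2
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/jessicazhong/pci-wsf/jessicazhong/checkpoints/20m_moe_svt_mp1pp1/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_20mx8
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.0002
optimizer.weight_decay: -
optimizer_in_bwd: true
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true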
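State: Failed
Notes: -
User: rdoublea
Runtime: 1m 29s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /tmp/Llama-4-20M-MOE/epoch_0
checkpointer.checkpoint_files: ["ft-model-00001-of-00001.bin"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 10
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true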
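State: Failed
Notes: -
User: rdoublea
Runtime: 1m 34s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /tmp/Llama-4-20M-MOE/epoch_0
checkpointer.checkpoint_files: ["ft-model-00001-of-00001.bin"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 10
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true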
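State: Failed
Notes: -
User: rdoublea
Runtime: 4m 55s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /tmp/Llama-4-20M-MOE/epoch_0
checkpointer.checkpoint_files: ["ft-model-00001-of-00001.bin"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 10
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true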
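State: Crashed
Notes: -
User: rdoublea
Runtime: 7m 31s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /tmp/Llama-4-20M-MOE/epoch_0
checkpointer.checkpoint_files: ["ft-model-00001-of-00001.bin"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 10
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true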
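State: Failed
Notes: -
User: rdoublea
Runtime: 1m 22s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-17B/
compile: false
dataset._component_: torchtune.datasets.alpaca_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/lora-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.lora_llama4_17bx16
model.apply_lora_to_mlp: true
model.apply_lora_to_output: false
model.lora_alpha: 32
model.lora_attn_modules: ["q_proj","v_proj","output_proj"]
model.lora_rank: 16
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/lora-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true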
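State: Crashed
Notes: -
User: rdoublea
Runtime: 24m 13s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true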
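State: Failed
Notes: -
User: rdoublea
Runtime: 9m 46s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true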
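State: Failed
Notes: -
User: rdoublea
Runtime: 1m 26s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true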
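State: Failed
Notes: -
User: rdoublea
Runtime: 1m 23s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true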
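State: Failed
Notes: -
User: rdoublea
Runtime: 1m 24s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true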
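State: Killed
Notes: -
User: rdoublea
Runtime: 19s
Sweep: -
batch_size: 1
checkpointer._component_: torchtune.training.FullModelMetaCheckpointer
checkpointer.checkpoint_dir: /home/rafiayub/checkpoints/17b_moe_svt_mp1pp1_non_te/
checkpointer.checkpoint_files: ["consolidated_with_vision_and_speech_encoder_weights.00.pth"]
checkpointer.model_type: LLAMA4
checkpointer.output_dir: /tmp/Llama-4-20M-MOE/
compile: false
dataset._component_: torchtune.datasets.multimodal.librispeech_asr_dataset
dataset.max_seq_len: -
dataset.packed: false
dataset.train_on_input: -
device: cuda
dtype: bf16
enable_activation_checkpointing: true
epochs: 1
gradient_accumulation_steps: 1
log_every_n_steps: 1
log_peak_memory_stats: true
loss._component_: torchtune.modules.loss.CEWithChunkedOutputLoss
lr_scheduler._component_: -
lr_scheduler.num_warmup_steps: -
max_steps_per_epoch: 1000
memory_efficient_fsdp_wrap: -
metric_logger._component_: torchtune.training.metric_logging.WandBLogger
metric_logger.group: -
metric_logger.log_dir: /tmp/full-llama4-finetune
metric_logger.name: -
metric_logger.project: -
model._component_: torchtune.models.llama4.llama4_17bx16
model.apply_lora_to_mlp: -
model.apply_lora_to_output: -
model.lora_alpha: -
model.lora_attn_modules: -
model.lora_rank: -
optimizer._component_: torch.optim.AdamW
optimizer.foreach: -
optimizer.lr: 0.00002
optimizer.weight_decay: -
optimizer_in_bwd: false
output_dir: /tmp/full-llama4-finetune
profiler._component_: torchtune.training.setup_torch_profiler
profiler.active_steps: 1
profiler.cpu: true
profiler.cuda: true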
1-20 of 154