Llama 3.1 vs Tootsie SFT on Total Mixture

Created on March 6 | Last edited on March 6

[Line chart: train/loss]