
Nonenman's group workspace

Group
mlm_bipar

Tags

30_roberta-large_A100_48_384_2e-5_20-words_20epochs

Notes

MLM pretraining of roberta-large on BiPaR

Author
nonenman
State
Finished
Start time
December 1st, 2023 3:22:21 PM
Runtime
36m 42s
Tracked hours
36m 40s
Run path
nonenman/RC_project/66ca88c5
OS
Linux-4.18.0-372.46.1.el8_6.x86_64-x86_64-with-glibc2.28
Python version
3.10.0
Command
/pfs/data5/home/as/as_as/as_nonenman/python_scripts/train_mlm.py --model_name roberta-large --batch_size 48 --seq_length 384 --num_train_epochs 20 --learning_rate 2e-5 --mask_words True --mlm_probability 0.20
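These flags are consistent with a standard Hugging Face masked-language-modeling run. The sketch below is an assumption, not the actual train_mlm.py: it uses plain token-level masking via DataCollatorForLanguageModeling, whereas --mask_words True presumably switches the real script to whole-word masking, and the corpus file bipar.txt is likewise hypothetical.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")

# Hypothetical corpus file; how the real script loads BiPaR is unknown.
texts = open("bipar.txt").read().splitlines()
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=384),
    batched=True,
    remove_columns=["text"],
)  # --seq_length 384

# --mlm_probability 0.20: mask 20% of tokens (the real script's
# --mask_words True presumably masks whole words instead).
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.20
)

args = TrainingArguments(
    output_dir="mlm_bipar",
    per_device_train_batch_size=48,  # --batch_size 48
    num_train_epochs=20,             # --num_train_epochs 20
    learning_rate=2e-5,              # --learning_rate 2e-5
    report_to="wandb",               # stream metrics to this W&B run
)
Trainer(
    model=model, args=args, train_dataset=dataset, data_collator=collator
).train()
```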
System Hardware
CPU count
64
Logical CPU count
128
GPU count
1
GPU type
NVIDIA A100 80GB PCIe
W&B CLI Version
0.14.0
Config

Config parameters are your model's inputs.

  • 7 keys (key names were collapsed in this export; the first five match the training command's flags, the last two could not be matched)
    • batch_size: 48
    • learning_rate: 0.00002 (2e-5)
    • mlm_probability: 0.2
    • num_train_epochs: 20
    • seq_length: 384
    • 0.06 (key name not recoverable)
    • 0.01 (key name not recoverable)
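Config values like these are recorded once per run. A minimal sketch of how that is typically done, assuming the project name from the run path and the group name shown above (the two unmatched values are omitted since their key names did not survive the export):

```python
import wandb

# Project taken from the run path nonenman/RC_project; group from mlm_bipar.
run = wandb.init(
    project="RC_project",
    group="mlm_bipar",
    config={
        "batch_size": 48,
        "learning_rate": 2e-5,
        "mlm_probability": 0.2,
        "num_train_epochs": 20,
        "seq_length": 384,
    },
)
print(run.config["learning_rate"])  # config values are queryable per run
```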
Summary

Summary metrics are your model's outputs.

  • 13 keys (key names were not captured in this export; values are annotated where they match known roberta-large properties)
    • 0.9327388689631508
    • 2.545281303322198
    • 1,024 (matches roberta-large's hidden size)
    • 514 (matches roberta-large's maximum position embeddings)
    • 16 (matches roberta-large's attention-head count)
    • 24 (matches roberta-large's layer count)
    • 355,412,057 (consistent with roberta-large's ~355M parameters)
    • 50,265 (matches roberta-large's vocabulary size)
    • 0.7320750010241369
    • 0
    • 0.7438557147979736
    • 77.91145557830338
    • 36.21204754027227
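Summary metrics hold the last (or explicitly assigned) value of each logged series. A minimal sketch of how entries like those above accumulate, with hypothetical metric names and dummy values:

```python
import random

import wandb

run = wandb.init(project="RC_project")

# Each log call advances the run history; the summary keeps the latest
# value per key. Metric names and values here are placeholders only.
for epoch in range(20):
    run.log({"eval/loss": random.random(), "eval/accuracy": random.random()})

# Summary entries can also be written directly, e.g. a best-so-far score:
run.summary["best_eval_accuracy"] = 0.93
run.finish()
```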
Artifact Outputs

This run produced these artifacts as outputs. Total: 1.

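The single output artifact is most plausibly the trained checkpoint. A minimal sketch of how such an artifact is logged, with a hypothetical artifact name and local path:

```python
import wandb

run = wandb.init(project="RC_project")

# Artifact name and checkpoint directory are hypothetical examples.
artifact = wandb.Artifact("roberta-large-mlm-bipar", type="model")
artifact.add_dir("mlm_bipar/checkpoint-final")
run.log_artifact(artifact)
run.finish()
```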