
Dchanda's group workspace

2022-01-23 01:26:09
Some weights of the model checkpoint at google/byt5-small were not used when initializing T5EncoderModel: ['decoder.block.3.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.0.SelfAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.q.weight', 'lm_head.weight', 'decoder.block.1.layer.2.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.q.weight', 'decoder.block.2.layer.2.layer_norm.weight', 'decoder.block.1.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.0.SelfAttention.q.weight', 'decoder.block.1.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.EncDecAttention.k.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.1.EncDecAttention.q.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.k.weight', 'decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'decoder.block.0.layer.0.SelfAttention.o.weight', 'decoder.block.3.layer.1.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.o.weight', 'decoder.block.2.layer.1.EncDecAttention.v.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.0.SelfAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.q.weight', 'decoder.embed_tokens.weight', 'decoder.block.2.layer.1.EncDecAttention.q.weight', 'decoder.block.2.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.1.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.1.EncDecAttention.k.weight', 'decoder.block.3.layer.2.layer_norm.weight', 'decoder.block.0.layer.2.DenseReluDense.wo.weight', 'decoder.final_layer_norm.weight', 'decoder.block.0.layer.1.EncDecAttention.k.weight', 'decoder.block.0.layer.1.layer_norm.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.DenseReluDense.wo.weight', 'decoder.block.0.layer.0.layer_norm.weight', 'decoder.block.0.layer.0.SelfAttention.k.weight', 'decoder.block.2.layer.1.layer_norm.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.0.SelfAttention.v.weight', 'decoder.block.3.layer.1.EncDecAttention.v.weight', 'decoder.block.0.layer.1.EncDecAttention.v.weight', 'decoder.block.1.layer.0.layer_norm.weight', 'decoder.block.2.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.2.DenseReluDense.wo.weight', 'decoder.block.2.layer.0.layer_norm.weight', 'decoder.block.3.layer.0.SelfAttention.q.weight', 'decoder.block.3.layer.0.layer_norm.weight', 'decoder.block.3.layer.1.EncDecAttention.o.weight', 'decoder.block.1.layer.1.EncDecAttention.o.weight', 'decoder.block.0.layer.1.EncDecAttention.o.weight', 'decoder.block.3.layer.0.SelfAttention.v.weight', 'decoder.block.1.layer.2.DenseReluDense.wo.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.0.SelfAttention.v.weight', 'decoder.block.2.layer.0.SelfAttention.k.weight']
2022-01-23 01:26:09
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
2022-01-23 01:26:09
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
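For context, this warning is what transformers prints when the encoder-only class is loaded against the full seq2seq checkpoint: every decoder.* tensor and the lm_head have no counterpart in T5EncoderModel, so they are discarded. A minimal sketch of the call that triggers it (assuming the standard transformers API, not the exact script behind this run):

    from transformers import AutoTokenizer, T5EncoderModel

    # google/byt5-small ships encoder + decoder + lm_head weights.
    # T5EncoderModel only instantiates the encoder, so the decoder.* and
    # lm_head tensors in the checkpoint are reported as "not used".
    tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
    model = T5EncoderModel.from_pretrained("google/byt5-small")

As the messages above note, this is expected when only the encoder is being fine-tuned; the discarded decoder weights are harmless.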
2022-01-23 01:49:21
100%|██████████| 1648/1648 [23:11<00:00,  1.18it/s, Epoch=1, LR=2.09e-5, Train_Loss=3.79]
2022-01-23 01:52:46
Validation Loss Improved (inf ---> 3.2825545289858495)
2022-01-23 01:52:48
Model Saved
2022-01-23 01:52:49
  0%|          | 2/1648 [00:01<23:10,  1.18it/s, Epoch=2, LR=2.14e-5, Train_Loss=3.64]
2022-01-23 02:15:59
100%|██████████| 1648/1648 [23:11<00:00,  1.18it/s, Epoch=2, LR=3.64e-5, Train_Loss=2.63]
2022-01-23 02:19:26
Validation Loss Improved (3.2825545289858495 ---> 2.117788508331049)
2022-01-23 02:19:28
Model Saved
2022-01-23 02:19:30
  0%|          | 2/1648 [00:01<22:57,  1.19it/s, Epoch=3, LR=3.58e-5, Train_Loss=2.47]
2022-01-23 02:42:30
100%|██████████| 1648/1648 [23:00<00:00,  1.19it/s, Epoch=3, LR=9.7e-5, Train_Loss=2.08]
2022-01-23 02:45:52
100%|██████████| 413/413 [03:23<00:00,  2.03it/s, Epoch=3, LR=9.7e-5, Valid_Loss=2]
2022-01-23 02:45:54
Validation Loss Improved (2.117788508331049 ---> 1.997600270214318)
2022-01-23 02:45:54
Model Saved
2022-01-23 02:45:54
Training complete in 1h 19m 46s
2022-01-23 02:45:54
Best Loss: 1.9976
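The training script itself is not part of this log. A minimal sketch of the kind of PyTorch loop that would produce these lines (the tqdm postfix keys, the "Validation Loss Improved" / "Model Saved" messages, and the final summary); the function name run_training, the save path, and the batch layout are assumptions, not taken from the run:

    import time
    import torch
    from tqdm import tqdm

    def run_training(model, optimizer, scheduler, train_loader, valid_loader,
                     device, num_epochs, save_path="best_model.bin"):
        # Track the best validation loss and checkpoint whenever it improves;
        # this produces the "Validation Loss Improved" / "Model Saved" lines.
        start = time.time()
        best_loss = float("inf")

        for epoch in range(1, num_epochs + 1):
            model.train()
            running, seen = 0.0, 0
            bar = tqdm(train_loader, total=len(train_loader))
            for batch in bar:
                batch = {k: v.to(device) for k, v in batch.items()}
                loss = model(**batch).loss      # assumes the model returns a loss
                loss.backward()
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                n = batch["input_ids"].size(0)
                running, seen = running + loss.item() * n, seen + n
                bar.set_postfix(Epoch=epoch,
                                LR=optimizer.param_groups[0]["lr"],
                                Train_Loss=running / seen)

            model.eval()
            running, seen = 0.0, 0
            bar = tqdm(valid_loader, total=len(valid_loader))
            with torch.no_grad():
                for batch in bar:
                    batch = {k: v.to(device) for k, v in batch.items()}
                    n = batch["input_ids"].size(0)
                    running, seen = running + model(**batch).loss.item() * n, seen + n
                    bar.set_postfix(Epoch=epoch,
                                    LR=optimizer.param_groups[0]["lr"],
                                    Valid_Loss=running / seen)
            valid_loss = running / seen

            if valid_loss < best_loss:
                print(f"Validation Loss Improved ({best_loss} ---> {valid_loss})")
                best_loss = valid_loss
                torch.save(model.state_dict(), save_path)
                print("Model Saved")

        elapsed = time.time() - start
        print(f"Training complete in {elapsed // 3600:.0f}h "
              f"{(elapsed % 3600) // 60:.0f}m {elapsed % 60:.0f}s")
        print(f"Best Loss: {best_loss:.4f}")

The totals in the log are consistent with this structure: 1648 training steps per epoch at about 1.18 it/s (~23 min) plus 413 validation steps at about 2 it/s (~3.5 min), over three epochs, adds up to roughly the reported 1h 19m 46s.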