
Transformer Bilingual Translation Model on the Opus Books Dataset

Training a bilingual translation model on the Opus Books dataset, translating English to French. The report covers different strategies for reducing both the training time and the loss of the model. Training machine: RTX 3090.
Created on September 8 | Last edited on September 8

Results

  1. Initially, the model took around 20 minutes per epoch with a batch size of 32.
  2. Integrating dynamic padding halved the training time, to about 10 minutes per epoch.
  3. Adding parameter sharing cut the training time further, to roughly a quarter of the original (about 5 minutes per epoch).
  4. The lowest loss was attained by combining AdamW + a three-phase One Cycle policy + dynamic padding (dp) + parameter sharing (ps).
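Dynamic padding pads each batch only to the length of the longest sequence in that batch, instead of padding every batch to a global maximum length, so most batches carry far fewer wasted pad tokens. A minimal PyTorch-style sketch of such a collate function (the `PAD_ID` value is illustrative, not from this report):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed padding token id


def dynamic_collate(batch):
    """Pad a list of variable-length 1-D LongTensors of token ids
    to the longest sequence *in this batch only*, returning a
    (batch_size, max_len_in_batch) tensor."""
    return pad_sequence(batch, batch_first=True, padding_value=PAD_ID)


# Three sequences of lengths 3, 5, and 2 pad to length 5 here,
# not to some fixed global maximum such as the longest sentence
# in the whole corpus.
seqs = [torch.tensor([4, 8, 15]),
        torch.tensor([16, 23, 42, 7, 9]),
        torch.tensor([1, 2])]
padded = dynamic_collate(seqs)
print(padded.shape)  # torch.Size([3, 5])
```

Passing a function like this as `collate_fn` to a `DataLoader` (ideally together with length-sorted batching) is one common way to realize the speedup described above.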
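Parameter sharing in a translation transformer is often done by tying the source embedding, target embedding, and output projection to a single weight matrix when the two languages share one vocabulary. The report does not spell out which weights were shared, so the following is only a sketch of one common tying scheme (all sizes are illustrative):

```python
import torch.nn as nn

d_model, vocab_size = 512, 32000  # illustrative sizes


class TiedEmbeddings(nn.Module):
    """Source embedding, target embedding, and output projection
    all share one (vocab_size x d_model) weight matrix, storing a
    single copy of the parameters instead of three."""

    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        # Tie all three modules to the same Parameter object.
        self.tgt_embed.weight = self.src_embed.weight
        self.proj.weight = self.src_embed.weight


model = TiedEmbeddings()
# parameters() yields each unique Parameter once, so the shared
# matrix is counted a single time.
n_params = sum(p.numel() for p in model.parameters())
```

Besides the memory saving, fewer unique parameters means fewer gradients and optimizer states to update per step, which is consistent with the training-time reduction reported above.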
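The best-loss configuration pairs AdamW with a three-phase One Cycle schedule (warmup to a peak learning rate, annealing back down, then a final decay well below the initial rate). A minimal sketch using PyTorch's built-in `OneCycleLR` with `three_phase=True`; the model, learning rates, and step counts are placeholders, not the report's actual hyperparameters:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(8, 8)           # stand-in for the transformer
opt = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

steps_per_epoch, epochs = 100, 10       # illustrative values
sched = OneCycleLR(opt, max_lr=1e-3,
                   total_steps=steps_per_epoch * epochs,
                   three_phase=True)    # warmup / anneal / final decay

lrs = []
for step in range(steps_per_epoch * epochs):
    # loss.backward() would precede opt.step() in real training
    opt.step()
    sched.step()                        # schedule advances every batch
    lrs.append(opt.param_groups[0]["lr"])
```

Note that `OneCycleLR` is stepped once per batch, not once per epoch, so `total_steps` must cover the whole run.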

Section 1


[Three W&B charts, metric vs. training step (x-axis: Step, 0–600; y-axis: ≈0.0002–0.0008), each over the same run set of 6 runs.]