
Transformer Bilingual Translation Model on the Opus Books Dataset

Training a bilingual translation model on the Opus Books dataset, translating English to French. The report covers different strategies for reducing both the training time and the loss of the model. Training machine: RTX 3090.
Created on September 8 | Last edited on September 8

Results

  1. Initially, the model took around 20 minutes per epoch with a batch size of 32.
  2. Integrating dynamic padding halved the training time, to about 10 minutes per epoch.
  3. Adding parameter sharing cut the training time further, to roughly a quarter of the original (about 5 minutes per epoch).
  4. The lowest loss was attained by combining AdamW + a three-phase One Cycle policy + dynamic padding (dp) + parameter sharing (ps).
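Dynamic padding pads each batch only to the length of the longest sequence in that batch, instead of padding every batch to a global maximum length, so most batches carry far fewer wasted pad tokens. A minimal PyTorch-style sketch of such a collate function (the `PAD_ID` value is illustrative, not from this report):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD_ID = 0  # assumed padding token id


def dynamic_collate(batch):
    """Pad a list of variable-length 1-D LongTensors of token ids
    to the longest sequence *in this batch only*, returning a
    (batch_size, max_len_in_batch) tensor."""
    return pad_sequence(batch, batch_first=True, padding_value=PAD_ID)


# Three sequences of lengths 3, 5, and 2 pad to length 5 here,
# not to some fixed global maximum such as the longest sentence
# in the whole corpus.
seqs = [torch.tensor([4, 8, 15]),
        torch.tensor([16, 23, 42, 7, 9]),
        torch.tensor([1, 2])]
padded = dynamic_collate(seqs)
print(padded.shape)  # torch.Size([3, 5])
```

Passing a function like this as `collate_fn` to a `DataLoader` (ideally together with length-sorted batching) is one common way to realize the speedup described above.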
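Parameter sharing in a translation transformer is often done by tying the source embedding, target embedding, and output projection to a single weight matrix when the two languages share one vocabulary. The report does not spell out which weights were shared, so the following is only a sketch of one common tying scheme (all sizes are illustrative):

```python
import torch.nn as nn

d_model, vocab_size = 512, 32000  # illustrative sizes


class TiedEmbeddings(nn.Module):
    """Source embedding, target embedding, and output projection
    all share one (vocab_size x d_model) weight matrix, storing a
    single copy of the parameters instead of three."""

    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        # Tie all three modules to the same Parameter object.
        self.tgt_embed.weight = self.src_embed.weight
        self.proj.weight = self.src_embed.weight


model = TiedEmbeddings()
# parameters() yields each unique Parameter once, so the shared
# matrix is counted a single time.
n_params = sum(p.numel() for p in model.parameters())
```

Besides the memory saving, fewer unique parameters means fewer gradients and optimizer states to update per step, which is consistent with the training-time reduction reported above.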
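The best-loss configuration pairs AdamW with a three-phase One Cycle schedule (warmup to a peak learning rate, annealing back down, then a final decay well below the initial rate). A minimal sketch using PyTorch's built-in `OneCycleLR` with `three_phase=True`; the model, learning rates, and step counts are placeholders, not the report's actual hyperparameters:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(8, 8)           # stand-in for the transformer
opt = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

steps_per_epoch, epochs = 100, 10       # illustrative values
sched = OneCycleLR(opt, max_lr=1e-3,
                   total_steps=steps_per_epoch * epochs,
                   three_phase=True)    # warmup / anneal / final decay

lrs = []
for step in range(steps_per_epoch * epochs):
    # loss.backward() would precede opt.step() in real training
    opt.step()
    sched.step()                        # schedule advances every batch
    lrs.append(opt.param_groups[0]["lr"])
```

Note that `OneCycleLR` is stepped once per batch, not once per epoch, so `total_steps` must cover the whole run.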

Section 1


[Three W&B charts, metric vs. training step (x-axis: Step, 0–600; y-axis: ≈0.0002–0.0008), each over the same run set of 6 runs.]