
Assignment 2: Document Classification with Attention and Transformers


💡 All figures in this report are interactive: hover over the lines in a plot to highlight individual runs and inspect their values.

Task A: Document Classification with Attention


[Figure: run set of 4 — training loss, validation Accuracy, and test Final Accuracy over training steps, grouped by use_rnn and use_dot_attention (false/true)]


Reporting and discussion

In the table and figure above, "Accuracy" is measured on the validation set and "Final Accuracy" on the test set. All variants outperformed the baseline. The best Final Accuracy (0.6472) was achieved without the RNN and with dot-product attention, although the differences between the variants were small overall.
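For reference, a minimal sketch of dot-product attention pooling for document classification is shown below. This is an illustrative reconstruction, not the assignment's code: the learned query vector, layer sizes, padding index, and class count are all assumptions.

```python
import torch
import torch.nn as nn

class DotAttentionClassifier(nn.Module):
    """Document classifier with dot-product attention pooling.

    Hypothetical sketch: embedding size, padding index, and class count
    are assumptions, not the assignment's exact configuration.
    """
    def __init__(self, vocab_size, embed_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Learned query vector used to score each token embedding.
        self.query = nn.Parameter(torch.randn(embed_dim))
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        h = self.embedding(token_ids)                    # (batch, seq_len, dim)
        # Dot-product attention: score each position against the query.
        scores = h @ self.query                          # (batch, seq_len)
        # Mask out padding positions (assumes every document has at least
        # one real token).
        scores = scores.masked_fill(token_ids == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)          # (batch, seq_len)
        pooled = (weights.unsqueeze(-1) * h).sum(dim=1)  # (batch, dim)
        return self.classifier(pooled)
```

The use_rnn variant would insert a recurrent encoder between the embedding and the attention step; with use_rnn=false, attention is applied directly to the token embeddings as above.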



Task B: Document Classification with Transformer



[Figure: run set of 10 — Transformer variants with different numbers of layers and heads]


Reporting and discussion

Among the Transformer variants, the worst performers were those with fewer layers (e.g., 2, 3, and 4): these models hardly learned anything and were underfitted. Even with 16 heads, the results were no better than the baseline model. The Transformer encoder delivers good results with a Final Accuracy of 0.6378; slightly ahead is the model with 4 heads and 1 layer at 0.6389.
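The best setting above (4 heads, 1 layer) can be expressed directly with PyTorch's built-in encoder. The sketch below is a hypothetical reconstruction: the embedding size, feed-forward width, learned positional embeddings, and mean pooling are assumptions, not the assignment's exact setup.

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Document classifier built on nn.TransformerEncoder.

    Hypothetical sketch: embed_dim, dim_feedforward, max_len, and the
    pooling choice are assumptions, not the assignment's exact setup.
    """
    def __init__(self, vocab_size, embed_dim=128, num_heads=4,
                 num_layers=1, num_classes=2, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.pos_embedding = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.embedding(token_ids) + self.pos_embedding(positions)
        pad_mask = token_ids == 0                        # True where padding
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padding positions before classifying.
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1)
        return self.classifier(pooled)
```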



Task C: Document Classification with BERT


[Figure: run set of 1 — BERT fine-tuning run]


Reporting and discussion

The BERT-style model was the best of all with 0.6711 on the test set, although it takes considerably longer to train than the other models. As the pre-trained model we used "distilroberta-base" (https://huggingface.co/distilroberta-base), which has 6 layers, a hidden dimension of 768, and 12 heads, totaling 82M parameters. Since there was no improvement for 4 epochs, training was stopped early.
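A minimal sketch of how fine-tuning distilroberta-base for classification typically looks with the Hugging Face transformers library is shown below. The learning rate, label count, and the toy texts/labels are assumptions for illustration, not the report's actual training loop.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical sketch: learning rate, num_labels, and the toy data
# are assumptions, not the report's configuration.
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2)

texts = ["an example document", "another example document"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step; in practice this runs over mini-batches
# for several epochs with early stopping on validation accuracy.
model.train()
outputs = model(**batch, labels=labels)  # passing labels computes the loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```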