Assignment 2: Document Classification with Attention and Transformers
Created on June 10|Last edited on July 22
Contents:
Task A: Document Classification with Attention
Task B: Document Classification with Transformer
Task C: Document Classification with BERT
All figures in this report are interactive. To make individual results stand out, hover over the lines in a plot.
Task A: Document Classification with Attention
[Interactive figure: run set with 4 runs]
Reporting and discussion
In the table and figure above, "Accuracy" is measured on the validation set and "Final Accuracy" on the test set. All variants outperformed the baseline, though the differences between them were small. The best final accuracy (0.6472) was achieved with dot attention and no RNN.
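The best variant pools token embeddings directly with dot attention instead of running them through an RNN first. A minimal numpy sketch of that pooling step (the learned query vector `q` and embedding dimension are illustrative assumptions, not the report's actual hyperparameters):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def dot_attention_pool(E, q):
    """Pool token embeddings E of shape (T, d) into one document
    vector using dot-product attention against a query q of shape (d,)."""
    scores = E @ q              # (T,) one unnormalised score per token
    weights = softmax(scores)   # (T,) attention weights, sum to 1
    return weights @ E, weights # (d,) weighted sum of token embeddings

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 8))     # 5 tokens, embedding dim 8 (toy sizes)
q = rng.normal(size=8)          # in training, q would be a learned parameter
doc_vec, w = dot_attention_pool(E, q)
```

The pooled `doc_vec` would then feed a linear classification head; without the RNN, the attention weights are the model's only way to emphasise informative tokens.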
Task B: Document Classification with Transformer
[Interactive figure: run set with 10 runs]
Reporting and discussion
With the Transformer, the worst results came from the variants with fewer layers, e.g. 2, 3, and 4: those models hardly learned anything and remained underfitted. Even with 16 heads, the results were no better than the baseline model. The Transformer encoder delivered good results at 0.6378, narrowly beaten by the model with 4 heads and 1 layer at 0.6389.
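The core operation inside each Transformer encoder layer is scaled dot-product self-attention. A single-head numpy sketch of that computation (the projection matrices and sizes are toy assumptions for illustration; the report's models use multiple heads and learned weights):

```python
import numpy as np

def softmax(x, axis=-1):
    # Row-wise numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head.
    X: (T, d) token representations; Wq/Wk/Wv: (d, dk) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) scaled similarities
    A = softmax(scores, axis=-1)             # each row is a distribution
    return A @ V, A                          # (T, dk) mixed values

rng = np.random.default_rng(1)
T, d, dk = 6, 16, 8                          # toy sequence and dims
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, dk)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

A multi-head layer simply runs several such heads in parallel and concatenates their outputs, which is why adding heads alone (e.g. the 16-head run) need not help if the layers underneath are not learning useful representations.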
Task C: Document Classification with BERT
[Interactive figure: run set with 1 run]
Reporting and discussion
The BERT model was the best of all with 0.6711 on the test set, though it also takes considerably longer to train than the other models. As the pre-trained model we used "distilroberta-base" (https://huggingface.co/distilroberta-base), which has 6 layers, a hidden dimension of 768, and 12 heads, totaling 82M parameters. Validation performance did not improve for 4 epochs, so training was stopped early.
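The early-stopping rule described above can be sketched as a small patience counter; this is a plain-Python illustration of the logic, not the report's actual training loop (the patience of 4 epochs matches the text, the metric values are made up):

```python
class EarlyStopping:
    """Stop training after `patience` epochs without validation improvement."""

    def __init__(self, patience=4):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's validation metric; return True to stop."""
        if val_metric > self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation accuracies: improvement, then a plateau.
stopper = EarlyStopping(patience=4)
history = [0.60, 0.66, 0.65, 0.65, 0.64, 0.65]
stopped_at = next(i for i, acc in enumerate(history) if stopper.step(acc))
```

With fine-tuning as slow as it is for a pre-trained Transformer, a rule like this avoids paying for epochs that no longer help.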