
Assignment 2: Document Classification with Attention and Transformers


💡 All figures in this report are interactive: hover over the lines in a plot to highlight individual runs and inspect their values.

Task A: Document Classification with Attention


[Figure: run set of 4 — training loss, validation Accuracy, and test Final Accuracy over training steps, grouped by use_rnn and use_dot_attention (false/true)]


Reporting and discussion

In the table and figure above, "Accuracy" is measured on the validation set and "Final Accuracy" on the test set. All variants outperformed the baseline. The best Final Accuracy (0.6472) was achieved without the RNN and with dot-product attention, although the differences between the variants were small overall.
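For reference, a minimal sketch of dot-product attention pooling for document classification is shown below. This is an illustrative reconstruction, not the assignment's code: the learned query vector, layer sizes, padding index, and class count are all assumptions.

```python
import torch
import torch.nn as nn

class DotAttentionClassifier(nn.Module):
    """Document classifier with dot-product attention pooling.

    Hypothetical sketch: embedding size, padding index, and class count
    are assumptions, not the assignment's exact configuration.
    """
    def __init__(self, vocab_size, embed_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Learned query vector used to score each token embedding.
        self.query = nn.Parameter(torch.randn(embed_dim))
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        h = self.embedding(token_ids)                    # (batch, seq_len, dim)
        # Dot-product attention: score each position against the query.
        scores = h @ self.query                          # (batch, seq_len)
        # Mask out padding positions (assumes every document has at least
        # one real token).
        scores = scores.masked_fill(token_ids == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)          # (batch, seq_len)
        pooled = (weights.unsqueeze(-1) * h).sum(dim=1)  # (batch, dim)
        return self.classifier(pooled)
```

The use_rnn variant would insert a recurrent encoder between the embedding and the attention step; with use_rnn=false, attention is applied directly to the token embeddings as above.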



Task B: Document Classification with Transformer



[Figure: run set of 10 — Transformer variants with different numbers of layers and heads]


Reporting and discussion

Among the Transformer variants, the worst performers were those with fewer layers (e.g., 2, 3, and 4): these models hardly learned anything and were underfitted. Even with 16 heads, the results were no better than the baseline model. The Transformer encoder delivers good results with a Final Accuracy of 0.6378; slightly ahead is the model with 4 heads and 1 layer at 0.6389.
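The best setting above (4 heads, 1 layer) can be expressed directly with PyTorch's built-in encoder. The sketch below is a hypothetical reconstruction: the embedding size, feed-forward width, learned positional embeddings, and mean pooling are assumptions, not the assignment's exact setup.

```python
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Document classifier built on nn.TransformerEncoder.

    Hypothetical sketch: embed_dim, dim_feedforward, max_len, and the
    pooling choice are assumptions, not the assignment's exact setup.
    """
    def __init__(self, vocab_size, embed_dim=128, num_heads=4,
                 num_layers=1, num_classes=2, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.pos_embedding = nn.Embedding(max_len, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.embedding(token_ids) + self.pos_embedding(positions)
        pad_mask = token_ids == 0                        # True where padding
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        # Mean-pool over non-padding positions before classifying.
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1)
        return self.classifier(pooled)
```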



Task C: Document Classification with BERT


[Figure: run set of 1 — BERT fine-tuning run]


Reporting and discussion

The BERT-style model was the best of all with 0.6711 on the test set, although it takes considerably longer to train than the other models. As the pre-trained model we used "distilroberta-base" (https://huggingface.co/distilroberta-base), which has 6 layers, a hidden dimension of 768, and 12 heads, totaling 82M parameters. Since there was no improvement for 4 epochs, training was stopped early.
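A minimal sketch of how fine-tuning distilroberta-base for classification typically looks with the Hugging Face transformers library is shown below. The learning rate, label count, and the toy texts/labels are assumptions for illustration, not the report's actual training loop.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical sketch: learning rate, num_labels, and the toy data
# are assumptions, not the report's configuration.
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2)

texts = ["an example document", "another example document"]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step; in practice this runs over mini-batches
# for several epochs with early stopping on validation accuracy.
model.train()
outputs = model(**batch, labels=labels)  # passing labels computes the loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```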