
Assignment 1: Document Classification with word embeddings, CNN, and LSTM

Created on April 18|Last edited on July 22

All figures in this report are interactive. For a better insight, hover over the lines in a plot to highlight individual runs and inspect their values.

Report Classification Average Model


[Interactive plots: ClassificationAverageModel, metrics vs. training step. Run set: 1 run]


Reporting and discussion

As shown in the table and figure above, "Accuracy" is measured on the validation set and "Final Accuracy" on the test set. With just the averaging model we reach 0.60 on the test set. Training also stopped early after the validation metric showed no improvement for 4 epochs.
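The averaging model described above can be sketched as follows: each document's word embeddings are averaged into a single vector, which is then passed through a linear classifier. This is a minimal NumPy sketch; the vocabulary size, embedding dimension, and random weights are all hypothetical stand-ins (a real run would use trained weights and pretrained embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 100-word vocabulary, 50-dim embeddings,
# 2 output classes. A real model would load pretrained vectors.
vocab_size, emb_dim, n_classes = 100, 50, 2
E = rng.normal(size=(vocab_size, emb_dim))   # embedding matrix
W = rng.normal(size=(emb_dim, n_classes))    # classifier weights
b = np.zeros(n_classes)

def classify_average(token_ids):
    """Average the word embeddings of a document, then apply
    a linear layer followed by softmax."""
    avg = E[token_ids].mean(axis=0)          # (emb_dim,)
    logits = avg @ W + b                     # (n_classes,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()                   # class probabilities

probs = classify_average([3, 17, 42, 7])     # a toy 4-token document
```

Because the average discards word order, this model serves mainly as a simple baseline for the CNN and LSTM below.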



Report CNNs


[Interactive plots: CNN experiments. Run set: 10 runs]


Reporting and discussion

The CNN with a dropout rate of 0.5, 256 kernels, and pretrained rather than random embeddings gave us the best result of 0.65. In the top-right plot one can see which parameters produced the best results. Using no dropout, or a very low rate (0.1), delivers poor results; the plots show this is because the model overfits heavily. Results were also weaker when the number of kernels was small (e.g. 16 or 32) or large (e.g. 512). With 1024 kernels the loss drops very quickly and good results appear after only a few epochs.
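The CNN pipeline the paragraph above describes can be sketched as a 1D convolution over the token axis followed by max-over-time pooling and dropout. This is a NumPy sketch under assumed shapes (20 tokens, 50-dim embeddings, filter width 3); only the filter count of 256 and the dropout rate of 0.5 come from the report, everything else is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 20-token document, 50-dim embeddings,
# 256 convolution filters of width 3 (256 and 0.5 match the report).
seq_len, emb_dim, n_filters, width = 20, 50, 256, 3
X = rng.normal(size=(seq_len, emb_dim))            # embedded document
K = rng.normal(size=(n_filters, width, emb_dim))   # conv filters

def conv_max_pool(X, K, p_drop=0.5, train=True):
    """Slide each filter over the token axis (valid convolution),
    apply ReLU, max-pool over time, then inverted dropout."""
    n_pos = X.shape[0] - K.shape[1] + 1
    feats = np.empty((K.shape[0], n_pos))
    for t in range(n_pos):
        window = X[t:t + K.shape[1]]               # (width, emb_dim)
        feats[:, t] = np.maximum(
            0, np.tensordot(K, window, axes=([1, 2], [0, 1])))
    pooled = feats.max(axis=1)                     # (n_filters,)
    if train and p_drop > 0:
        mask = np.random.default_rng(1).random(pooled.shape) > p_drop
        pooled = pooled * mask / (1 - p_drop)      # inverted dropout
    return pooled

feat = conv_max_pool(X, K)                         # feature vector for the classifier
```

The pooled vector would then feed a linear classification layer; dropping half of its entries during training is what curbs the overfitting seen in the low-dropout runs.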



Report LSTM


[Interactive plots: LSTM experiments. Run set: 11 runs]


Reporting and discussion

Unfortunately, the results for "Layer_4" are not shown here, but they are visible in the notebook. With the LSTM, not using dropout gives the best results overall. Increasing the number of layers also does not increase the final accuracy. The best results come from the model that uses a bidirectional LSTM with only one layer and pretrained rather than random embeddings. Random embeddings were used once in the experiments and, as the figure shows, they did not give good results.
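The best configuration above, a single-layer bidirectional LSTM, can be sketched as one LSTM pass over the sequence and one over its reversal, with the two final hidden states concatenated into the document vector. This is a NumPy sketch with hypothetical sizes and random weights; a real model would use a framework LSTM and trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 50-dim embeddings, 32 hidden units per direction.
emb_dim, hidden = 50, 32

def make_params():
    # Gates i, f, g, o stacked into one weight matrix and bias.
    return (rng.normal(scale=0.1, size=(4 * hidden, emb_dim + hidden)),
            np.zeros(4 * hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_state(X, params):
    """Run a single-layer LSTM over X (seq_len, emb_dim) and
    return the final hidden state."""
    W, b = params
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in X:
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # update cell state
        h = o * np.tanh(c)           # update hidden state
    return h

X = rng.normal(size=(20, emb_dim))   # one embedded toy document
fwd, bwd = make_params(), make_params()
# Bidirectional: one pass forward, one over the reversed sequence.
doc_vec = np.concatenate([lstm_last_state(X, fwd),
                          lstm_last_state(X[::-1], bwd)])
```

The concatenated vector (here 64-dimensional) would go to the final classification layer, so the classifier sees context from both directions of the document.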