
Assignment 1: Document Classification with word embeddings, CNN, and LSTM

Created on April 18|Last edited on July 22

All figures in this report are interactive. For a better insight, hover over the lines in a plot to highlight individual runs and inspect their values.

Report Classification Average Model


[Interactive plots: ClassificationAverageModel, metrics vs. training step. Run set: 1 run]


Reporting and discussion

As shown in the table and figure above, "Accuracy" is measured on the validation set and "Final Accuracy" on the test set. With just the averaging model we reach 0.60 on the test set. Training also stopped early after the validation metric showed no improvement for 4 epochs.
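The averaging model described above can be sketched as follows: each document's word embeddings are averaged into a single vector, which is then passed through a linear classifier. This is a minimal NumPy sketch; the vocabulary size, embedding dimension, and random weights are all hypothetical stand-ins (a real run would use trained weights and pretrained embeddings).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 100-word vocabulary, 50-dim embeddings,
# 2 output classes. A real model would load pretrained vectors.
vocab_size, emb_dim, n_classes = 100, 50, 2
E = rng.normal(size=(vocab_size, emb_dim))   # embedding matrix
W = rng.normal(size=(emb_dim, n_classes))    # classifier weights
b = np.zeros(n_classes)

def classify_average(token_ids):
    """Average the word embeddings of a document, then apply
    a linear layer followed by softmax."""
    avg = E[token_ids].mean(axis=0)          # (emb_dim,)
    logits = avg @ W + b                     # (n_classes,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()                   # class probabilities

probs = classify_average([3, 17, 42, 7])     # a toy 4-token document
```

Because the average discards word order, this model serves mainly as a simple baseline for the CNN and LSTM below.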



Report CNNs


[Interactive plots: CNN experiments. Run set: 10 runs]


Reporting and discussion

The CNN with a dropout rate of 0.5, 256 kernels, and pretrained rather than random embeddings gave us the best result of 0.65. In the top-right plot one can see which parameters produced the best results. Using no dropout, or a very low rate (0.1), delivers poor results; the plots show this is because the model overfits heavily. Results were also weaker when the number of kernels was small (e.g. 16 or 32) or large (e.g. 512). With 1024 kernels the loss drops very quickly and good results appear after only a few epochs.
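The CNN pipeline the paragraph above describes can be sketched as a 1D convolution over the token axis followed by max-over-time pooling and dropout. This is a NumPy sketch under assumed shapes (20 tokens, 50-dim embeddings, filter width 3); only the filter count of 256 and the dropout rate of 0.5 come from the report, everything else is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: 20-token document, 50-dim embeddings,
# 256 convolution filters of width 3 (256 and 0.5 match the report).
seq_len, emb_dim, n_filters, width = 20, 50, 256, 3
X = rng.normal(size=(seq_len, emb_dim))            # embedded document
K = rng.normal(size=(n_filters, width, emb_dim))   # conv filters

def conv_max_pool(X, K, p_drop=0.5, train=True):
    """Slide each filter over the token axis (valid convolution),
    apply ReLU, max-pool over time, then inverted dropout."""
    n_pos = X.shape[0] - K.shape[1] + 1
    feats = np.empty((K.shape[0], n_pos))
    for t in range(n_pos):
        window = X[t:t + K.shape[1]]               # (width, emb_dim)
        feats[:, t] = np.maximum(
            0, np.tensordot(K, window, axes=([1, 2], [0, 1])))
    pooled = feats.max(axis=1)                     # (n_filters,)
    if train and p_drop > 0:
        mask = np.random.default_rng(1).random(pooled.shape) > p_drop
        pooled = pooled * mask / (1 - p_drop)      # inverted dropout
    return pooled

feat = conv_max_pool(X, K)                         # feature vector for the classifier
```

The pooled vector would then feed a linear classification layer; dropping half of its entries during training is what curbs the overfitting seen in the low-dropout runs.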



Report LSTM


[Interactive plots: LSTM experiments. Run set: 11 runs]


Reporting and discussion

Unfortunately, the results for "Layer_4" are not shown here, but they are visible in the notebook. With the LSTM, not using dropout gives the best results overall. Increasing the number of layers also does not increase the final accuracy. The best results come from the model that uses a bidirectional LSTM with only one layer and pretrained rather than random embeddings. Random embeddings were used once in the experiments and, as the figure shows, they did not give good results.
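The best configuration above, a single-layer bidirectional LSTM, can be sketched as one LSTM pass over the sequence and one over its reversal, with the two final hidden states concatenated into the document vector. This is a NumPy sketch with hypothetical sizes and random weights; a real model would use a framework LSTM and trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 50-dim embeddings, 32 hidden units per direction.
emb_dim, hidden = 50, 32

def make_params():
    # Gates i, f, g, o stacked into one weight matrix and bias.
    return (rng.normal(scale=0.1, size=(4 * hidden, emb_dim + hidden)),
            np.zeros(4 * hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_state(X, params):
    """Run a single-layer LSTM over X (seq_len, emb_dim) and
    return the final hidden state."""
    W, b = params
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in X:
        z = W @ np.concatenate([x, h]) + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)   # update cell state
        h = o * np.tanh(c)           # update hidden state
    return h

X = rng.normal(size=(20, emb_dim))   # one embedded toy document
fwd, bwd = make_params(), make_params()
# Bidirectional: one pass forward, one over the reversed sequence.
doc_vec = np.concatenate([lstm_last_state(X, fwd),
                          lstm_last_state(X[::-1], bwd)])
```

The concatenated vector (here 64-dimensional) would go to the final classification layer, so the classifier sees context from both directions of the document.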