
Swift or Shakespeare?

Text classification with spaCy by an ML newbie
Created on April 18 | Last edited on October 2

Context

Data

Gathered from Kaggle and processed with a simple Node script (a rough Python equivalent of the filtering is sketched after this list).

Shakespeare
    • 105155 lines
    • Includes only spoken player lines (e.g. lines like "SCENE I. London. The Palace" or "ENTER King Henry" are filtered out)
    • Many lines are cut awkwardly; for example, this sentence is split into 3 distinct "lines":
      • Well, Hal, well, and in some sort it jumps with my
      • humour as well as waiting in the court, I can tell
      • you.

Taylor Swift
    • 2174 lines
    • 8358 lines (but pop music is repetitive, so the number of unique lines is lower)
    • Does not include the newest album, Midnights (2022)
    • Much smaller data set compared to Shakespeare
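The preprocessing itself was a Node script; purely as an illustration, here is a minimal Python sketch of the same kind of filtering. The file names and the stage-direction heuristic (SCENE/ACT headers and ENTER/EXIT/EXEUNT cues) are assumptions, not the actual script.

```python
import re

# Hypothetical paths; the original preprocessing was a Node script.
RAW_PATH = "shakespeare_raw.txt"
OUT_PATH = "shakespeare_spoken_lines.txt"

# Assumed heuristic: stage directions start with scene/act headers or
# entrance/exit cues such as "ENTER King Henry".
STAGE_DIRECTION = re.compile(r"^(SCENE|ACT|PROLOGUE|EPILOGUE|ENTER|EXIT|EXEUNT)\b")

def is_spoken_line(line: str) -> bool:
    """Keep only non-empty lines that don't look like stage directions."""
    stripped = line.strip()
    return bool(stripped) and not STAGE_DIRECTION.match(stripped)

with open(RAW_PATH, encoding="utf-8") as src, open(OUT_PATH, "w", encoding="utf-8") as dst:
    for line in src:
        if is_spoken_line(line):
            dst.write(line)
```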

Validation Strategy

Two strategies for testing the performance of the model:
  1. Randomize the data and split into 90% for training and 10% for validation
  2. Use all data for training and create a separate validation set from the BuzzFeed quizzes
The "random split" approach resulted in high accuracy across the board regardless of other variables, while "quiz" mode produced lower accuracies and more noticeable performance differences between runs. That makes "quiz" mode far more interesting, so it's primarily what we'll focus on going forward.
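As a concrete sketch of strategy 1, a shuffled 90/10 split could look like the following (examples, a list of (text, label) pairs, is a hypothetical stand-in for the combined data set):

```python
import random

def train_dev_split(examples, dev_fraction=0.1, seed=42):
    """Shuffle (text, label) pairs and split into 90% train / 10% validation."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cutoff = int(len(shuffled) * (1 - dev_fraction))
    return shuffled[:cutoff], shuffled[cutoff:]
```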

Architecture

I tried training the model with three architectures: bow (bag-of-words), ensemble, and simple_cnn. The metrics most affected by architecture were accuracy and run duration. The bar graphs below show that bow did fairly poorly at only 50-60% accuracy, but was significantly faster to train than the other two, while simple_cnn was the most accurate but also the slowest.
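Those three names match the built-in TextCategorizer architectures in spaCy v2, which is presumably the version in use here. A minimal pipeline setup might look like this sketch (the label names are my own placeholders):

```python
import spacy

def build_textcat(architecture: str):
    """Blank English pipeline with a text categorizer using one of the
    spaCy v2 architectures: "bow", "simple_cnn", or "ensemble"."""
    nlp = spacy.blank("en")
    textcat = nlp.create_pipe(
        "textcat",
        config={"exclusive_classes": True, "architecture": architecture},
    )
    textcat.add_label("SHAKESPEARE")  # hypothetical label names
    textcat.add_label("SWIFT")
    nlp.add_pipe(textcat)
    return nlp
```

In spaCy v2, bow scores texts from n-gram counts alone (hence its speed), simple_cnn runs token vectors through a convolutional network, and ensemble stacks a bag-of-words model with a neural network model, which helps explain the speed/accuracy trade-off above.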

Batch Size

The metrics that seem to correlate most strongly with batch size are loss and run duration: smaller batch sizes appear to result in higher loss values and longer run durations. I was expecting a more obvious pattern with accuracy, but there doesn't seem to be one.
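For reference, batch size enters a spaCy v2 training loop through minibatch; a stripped-down version of such a loop might look like this (the train_data format and epoch count are assumptions):

```python
import random
from spacy.util import minibatch

def train(nlp, train_data, batch_size=128, n_epochs=10):
    """train_data: list of (text, annotations) tuples, e.g.
    ("Shake it off", {"cats": {"SWIFT": 1.0, "SHAKESPEARE": 0.0}})."""
    optimizer = nlp.begin_training()
    for epoch in range(n_epochs):
        random.shuffle(train_data)
        losses = {}
        for batch in minibatch(train_data, size=batch_size):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print(f"epoch {epoch}: textcat loss {losses['textcat']:.3f}")
```

A smaller batch size means more nlp.update calls per epoch, which lines up with the longer run durations observed above.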

Most Accurate Run: CNN 128

The most accurate run was simple_cnn with a batch size of 128. The model got 27 out of 32 correct, which is roughly 84%! Not bad. How did you do in comparison?
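For completeness, here is roughly how that quiz score could be computed with a trained pipeline (quiz_data and the label names are hypothetical):

```python
def quiz_accuracy(nlp, quiz_data):
    """quiz_data: list of (text, label) pairs taken from the BuzzFeed quizzes."""
    correct = 0
    for text, label in quiz_data:
        doc = nlp(text)
        predicted = max(doc.cats, key=doc.cats.get)  # highest-scoring label
        correct += predicted == label
    return correct / len(quiz_data)

# 27 of 32 correct -> 27 / 32 = 0.84375, i.e. roughly 84%
```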