In this quick project, I add W&B logging to Facebook Research's fastText library for text representation and classification. fastText is useful for tasks like learning word embeddings and sentiment analysis, especially in resource-constrained environments.
I follow the text classification tutorial to train a supervised classifier to recognize the main topics of cooking questions on a discussion forum, e.g. "baking", "equipment" (kitchen implements/tools), "substitutions" (in recipes).
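The data format and training call can be sketched roughly as follows. This is a hedged illustration, not the tutorial's exact code: the helper `to_fasttext_line`, the toy examples, and the `cooking.train` filename are my own, and the `fasttext` package (`pip install fasttext`) is assumed.

```python
def to_fasttext_line(labels, text):
    """Format one example in fastText's supervised format:
    each label prefixed with __label__, followed by the text."""
    prefix = " ".join("__label__" + label for label in labels)
    return f"{prefix} {text.strip().lower()}"

# Toy cooking questions in the tutorial's style (the real data
# comes from the cooking section of a Stack Exchange dump).
examples = [
    (["baking"], "Why does my bread collapse in the oven?"),
    (["equipment"], "Is a cast-iron skillet worth the upkeep?"),
]

with open("cooking.train", "w") as f:
    for labels, text in examples:
        f.write(to_fasttext_line(labels, text) + "\n")

# The actual training call, guarded so the sketch still runs
# when the fasttext package is not installed:
try:
    import fasttext
    model = fasttext.train_supervised(
        input="cooking.train", lr=0.5, epoch=25, wordNgrams=2)
    print(model.predict("how do i keep dough from sticking ?"))
except ImportError:
    pass
```

`train_supervised` returns a model whose `predict` method yields the top label(s) and their probabilities, which is what the tutorial evaluates against held-out questions.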
Note: this is a brief experimental integration. fastText runs on CPU only, is written in C++, and is not really a deep learning model. However, it is astonishingly fast, achieves impressive results, and yields compact models. In an era of gargantuan Transformer-style models, it is refreshing to train a model in seconds and consider only the explainable fundamentals of language understanding.
The optimal learning rate appears to be around 0.5-0.6 (shown in yellow). Both lower learning rates (red) and higher ones (blue) lead to higher model loss.
Increasing the n-gram size (the number of consecutive words treated as a single feature) increases the loss. Changing the dimension of the word embedding vectors has essentially no effect; perhaps its influence would be clearer in a more complex domain.
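A sweep over these three hyperparameters can be approximated with a simple grid loop that logs each run to W&B. This is a sketch under assumptions: the grid values, the `fasttext-cooking` project name, and the `cooking.train`/`cooking.valid` filenames are my own illustration, and the fastText and wandb calls are guarded so the loop's structure is visible even without those libraries or data files present.

```python
import itertools
import os

# Hypothetical grid over the three hyperparameters discussed above.
grid = {
    "lr": [0.1, 0.3, 0.5, 0.7, 1.0],
    "wordNgrams": [1, 2, 3],
    "dim": [50, 100, 200],
}

def expand_grid(grid):
    """Yield one config dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(expand_grid(grid))  # 5 * 3 * 3 = 45 runs

try:
    import fasttext
    import wandb
except ImportError:
    fasttext = wandb = None

if fasttext and wandb and os.path.exists("cooking.train"):
    for cfg in configs:
        run = wandb.init(project="fasttext-cooking", config=cfg)
        model = fasttext.train_supervised(input="cooking.train",
                                          epoch=25, **cfg)
        # model.test returns (num examples, precision@1, recall@1)
        n, precision, recall = model.test("cooking.valid")
        wandb.log({"precision": precision, "recall": recall})
        run.finish()
```

Logging precision and recall on a validation split, rather than only the training loss, would also address the point above that loss alone is not the most meaningful metric here.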
Subjectively, the quality of the model doesn't change considerably, and loss is not the most meaningful metric. It might be interesting to construct a set of test questions that would help distinguish model quality, e.g. by using rare vocabulary or allowing for multiple interpretations by a reader (both counterproductive strategies for getting a question answered in a forum).
Other exciting possibilities: