FastText proof of concept

In this quick project, I add W&B logging to Facebook Research's fastText library for text representation and classification. The library is useful for tasks such as learning word embeddings and sentiment analysis, especially in resource-constrained environments.

I follow the fastText text classification tutorial to train a supervised classifier that recognizes the main topics of cooking questions from a discussion forum, such as "baking", "equipment" (kitchen implements and tools), and "substitutions" (in recipes).
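For reference, fastText's supervised mode expects a plain-text file in which each line starts with one or more labels carrying the `__label__` prefix, followed by the text itself. The sketch below produces such lines. The helper names and the two sample questions are hypothetical, and the normalization step (padding punctuation with spaces, then lowercasing) is an assumption modeled loosely on the tutorial's preprocessing.

```python
import re

LABEL_PREFIX = "__label__"  # fastText's default label prefix

def normalize(text: str) -> str:
    """Pad common punctuation with spaces and lowercase (assumed preprocessing)."""
    text = re.sub(r"([.!?,'/()])", r" \1 ", text)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.lower().split())

def to_fasttext_line(labels, text):
    """Build one supervised training line: all labels first, then the text."""
    label_part = " ".join(LABEL_PREFIX + label for label in labels)
    return f"{label_part} {normalize(text)}"

# Hypothetical examples; a question may carry several topic labels.
examples = [
    (["baking"], "Why did my bread not rise?"),
    (["substitutions", "baking"], "Can I use honey instead of sugar?"),
]

for labels, text in examples:
    print(to_fasttext_line(labels, text))
```

A file of lines like these is what the Python bindings' `fasttext.train_supervised(input=...)` consumes; hyperparameters such as `lr`, `dim`, and `wordNgrams` are passed as keyword arguments to the same call.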

Note: this is a brief, experimental integration. FastText runs only on CPU, is written in C++, and is not really a deep learning model. However, it trains astonishingly fast, achieves impressive results, and yields compact models. In an era of gargantuan Transformer-style models, it is refreshing to train a model in seconds and focus on the explainable fundamentals of language understanding.

Optimal learning rate: 0.5-0.6

The optimal learning rate appears to be around 0.5-0.6 (yellow runs). Both lower learning rates (red) and higher learning rates (blue) lead to higher loss.

Experiments varying learning rate

Highest influence: n-gram size

Increasing the word n-gram size (the number of consecutive words treated as a single feature) increases the loss. Changing the dimension of the word embedding vectors has essentially no effect; perhaps its influence would be clearer in a more complex domain.
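To make the n-gram setting concrete: with a word n-gram size of n (fastText's `wordNgrams` parameter), each sentence contributes its individual words plus every run of 2 to n consecutive words as features. The sketch below enumerates those features in plain Python. It is only an illustration of which word sequences are counted; fastText itself hashes n-grams into a fixed number of buckets rather than storing them as strings, and the function name here is hypothetical.

```python
def word_ngrams(tokens, n):
    """Return every run of 1..n consecutive tokens, in sentence order."""
    features = []
    for i in range(len(tokens)):
        # Grow the window starting at position i up to length n,
        # stopping early at the end of the sentence.
        for j in range(i + 1, min(i + n, len(tokens)) + 1):
            features.append(" ".join(tokens[i:j]))
    return features

tokens = "how to proof yeast".split()
print(word_ngrams(tokens, 1))  # unigrams only
print(word_ngrams(tokens, 2))  # unigrams plus bigrams
```

For this four-word question, n = 1 yields 4 features, n = 2 yields 7, and n = 3 yields 9, so each step up in n adds longer and rarer word sequences to the feature set.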

Word embedding vector dimension and ngram size

Future directions

Subjectively, model quality doesn't change considerably across these runs, and loss is not the most meaningful metric. It might be interesting to construct a set of test questions designed to distinguish model quality, e.g. questions that use rare vocabulary or allow multiple interpretations by a reader (both counterproductive strategies for getting a question answered in a forum).
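One alternative to loss is the pair of metrics the fastText tutorial reports on a held-out set via `model.test`: precision and recall at k. The sketch below computes micro-averaged versions of both from ranked predicted labels and gold label sets, so it could be applied to a hand-built set of tricky questions like those described above. The function name and the three sample predictions are hypothetical, not results from these runs.

```python
def precision_recall_at_k(predictions, gold, k):
    """Micro-averaged precision@k and recall@k.

    predictions: one ranked label list (best first) per example.
    gold: one set of true labels per example.
    """
    hits = predicted = relevant = 0
    for ranked, truth in zip(predictions, gold):
        top_k = ranked[:k]
        hits += sum(1 for label in top_k if label in truth)
        predicted += len(top_k)   # number of labels the model proposed
        relevant += len(truth)    # number of labels it should have found
    precision = hits / predicted if predicted else 0.0
    recall = hits / relevant if relevant else 0.0
    return precision, recall

# Hypothetical model outputs for three test questions.
predictions = [
    ["baking", "bread"],
    ["equipment", "knives"],
    ["sauce", "substitutions"],
]
gold = [{"baking"}, {"knives"}, {"substitutions", "sauce"}]

print(precision_recall_at_k(predictions, gold, k=1))
print(precision_recall_at_k(predictions, gold, k=2))
```

Raising k trades precision for recall, which makes the pair more informative than a single loss value when questions legitimately carry several topics.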

Other exciting possibilities: