fastText Nanobot for the Transformer Age
This article explains how to integrate fastText with Weights & Biases to visualize training runs for incredibly efficient natural language processing (NLP).
In this quick project, I add Weights & Biases logging to Facebook Research's fastText library for text representation and classification. fastText is helpful for tasks like word embeddings and sentiment analysis, especially in resource-constrained environments.
I follow the text classification tutorial to train a supervised classifier to recognize the main topics of cooking questions on a discussion forum, e.g. "baking", "equipment" (kitchen implements/tools), "substitutions" (in recipes).
Note: this is a brief experimental integration. fastText runs on CPU only, uses C/C++, and is not really a deep learning model. However, it works astonishingly fast, achieves impressive results, and yields compact models. In an era of gargantuan Transformer-style models, it is refreshing to train a model in seconds and consider only the explainable fundamentals of language understanding.
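For a rough sense of the setup, here is a minimal sketch of a single logged training run using the fastText Python bindings and the wandb client. The project name is a placeholder, the data files are the ones produced by the tutorial, and since the Python bindings don't expose the training loss directly, this version only logs the hyperparameters and the final validation metrics:

```python
import fasttext
import wandb

# Hyperparameters mirrored into W&B so runs are easy to compare later.
config = {"lr": 0.5, "epoch": 25, "wordNgrams": 2, "dim": 100}
run = wandb.init(project="fasttext-cooking", config=config)

# cooking.train / cooking.valid come from the fastText text classification tutorial.
model = fasttext.train_supervised(input="cooking.train", **config)

# Evaluate on the held-out questions and log the final metrics.
n, precision, recall = model.test("cooking.valid")
wandb.log({"precision_at_1": precision, "recall_at_1": recall, "n_examples": n})

model.save_model("model_cooking.bin")
run.finish()
```

Each run like this finishes in seconds on a laptop CPU, so launching a whole batch of them stays cheap.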
Table of Contents
- Optimal Learning Rate 0.5-0.6
- Experiments varying learning rate
- Highest Influence: N-gram Size
- Word Embedding Vector Dimension and N-gram Size
- Future Directions

Optimal Learning Rate 0.5-0.6
The optimal learning rate appears to be around 0.5-0.6 (yellow curves). Both lower learning rates (red) and higher learning rates (blue) lead to higher model loss.
Experiments varying learning rate
[Run set: Vary LR, 8 runs]
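To reproduce a comparison like this, each learning rate simply becomes its own short run, with everything else held fixed. A sketch, where the specific values and the run grouping are illustrative:

```python
import fasttext
import wandb

# One W&B run per learning rate; the other hyperparameters stay fixed.
for lr in [0.05, 0.1, 0.25, 0.5, 0.6, 0.75, 1.0, 1.5]:
    run = wandb.init(project="fasttext-cooking", group="vary-lr",
                     config={"lr": lr, "epoch": 25, "wordNgrams": 2, "dim": 100})
    model = fasttext.train_supervised(input="cooking.train",
                                      lr=lr, epoch=25, wordNgrams=2, dim=100)
    _, precision, recall = model.test("cooking.valid")
    wandb.log({"precision_at_1": precision, "recall_at_1": recall})
    run.finish()
```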
Highest Influence: N-gram Size
Increasing the n-gram size (the number of consecutive words treated as a single feature) increases the loss. Changing the dimension of the word embedding vector has basically no effect; perhaps its influence would be clearer in a more complex domain.
Word Embedding Vector Dimension and N-gram Size
[Run set: Word Dim (green), 3 runs]
[Run set: Ngrams (violet), 3 runs]
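The two run sets above can be approximated the same way, varying one parameter at a time around a fixed baseline. Another sketch, with the particular values chosen only for illustration:

```python
import fasttext
import wandb

BASE = {"lr": 0.5, "epoch": 25, "dim": 100, "wordNgrams": 2}

def train_and_log(group, **overrides):
    # Train one fastText model with the base config plus overrides, and log it to W&B.
    cfg = {**BASE, **overrides}
    run = wandb.init(project="fasttext-cooking", group=group, config=cfg)
    model = fasttext.train_supervised(input="cooking.train", **cfg)
    _, precision, recall = model.test("cooking.valid")
    wandb.log({"precision_at_1": precision, "recall_at_1": recall})
    run.finish()

# Vary the word embedding dimension (green run set) ...
for dim in [25, 100, 300]:
    train_and_log("word-dim", dim=dim)

# ... and the n-gram size (violet run set) independently.
for ngrams in [1, 2, 3]:
    train_and_log("ngrams", wordNgrams=ngrams)
```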
Future Directions
Subjectively, the quality of the model doesn't change considerably, and loss is not the most meaningful metric. It might be interesting to construct a set of test questions that would help distinguish model quality, e.g. by using rare vocabulary or allowing for multiple interpretations by a reader (both counterproductive strategies for getting a question answered in a forum).
Other exciting possibilities:
- train on larger and more diverse datasets
- consider embedding multimodal data, like images from recipe sites
- run a hyperparameter sweep over all the available variables (see the sketch after this list)
- train an ensemble or stack the models (ah, that deep learning instinct).
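On the sweep idea: W&B's sweep API can drive fastText directly, since each trial is just a fresh train_supervised call. A hypothetical configuration over the main hyperparameters might look like this (the metric name, parameter ranges, and trial count are all placeholders):

```python
import fasttext
import wandb

# Hypothetical sweep over the main fastText hyperparameters.
sweep_config = {
    "method": "random",
    "metric": {"name": "precision_at_1", "goal": "maximize"},
    "parameters": {
        "lr": {"min": 0.05, "max": 1.0},
        "epoch": {"values": [5, 25, 50]},
        "dim": {"values": [25, 100, 300]},
        "wordNgrams": {"values": [1, 2, 3]},
    },
}

def train():
    # wandb.agent calls this once per trial; run.config holds the sampled values.
    with wandb.init() as run:
        model = fasttext.train_supervised(
            input="cooking.train",
            lr=run.config["lr"], epoch=run.config["epoch"],
            dim=run.config["dim"], wordNgrams=run.config["wordNgrams"])
        _, precision, recall = model.test("cooking.valid")
        wandb.log({"precision_at_1": precision, "recall_at_1": recall})

sweep_id = wandb.sweep(sweep_config, project="fasttext-cooking")
wandb.agent(sweep_id, function=train, count=20)
```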