Classifying Tweets with Weights & Biases
Classifying tweets with W&B
Created on September 18 | Last edited on September 18
A common NLP task is text classification. The most common example is sentiment analysis, where texts are classified as positive or negative. In this project, we consider a slightly harder problem: classifying whether or not a tweet is about an actual disaster.
Not all tweets that contain words associated with disasters are actually about disasters. A tweet such as "California forests on fire near San Francisco" should be taken into consideration, whereas "California this weekend was on fire, good times in San Francisco" can safely be ignored.
The goal of the task here is to build a classifier that separates the tweets that relate to real disasters from irrelevant tweets. The dataset that we are using consists of hand-labeled tweets that were obtained by searching Twitter for words common to disaster tweets.
Note: You can find the accompanying code in this Colab Notebook. We highly encourage you to fork it, tweak the parameters, or try the model with your own dataset!
Setup
Start out by installing the experiment tracking library and setting up your free W&B account:
- pip install wandb – Install the W&B library
- import wandb – Import the wandb library
- from wandb.keras import WandbCallback – Import the wandb keras callback
The dataset, called “Disasters on Social Media”, comes from Figure Eight. Contributors looked at over 10,000 tweets culled with a variety of searches like “ablaze”, “quarantine”, and “pandemonium”, then noted whether each tweet referred to a disaster event (as opposed to a joke using the word, a movie review, or something else non-disastrous).
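The code that follows assumes the tweets are already in a pandas DataFrame called `df` with (at least) a `text` column and a `choose_one` column. Here is a minimal loading sketch. The exact file name of your download isn't given here, so it reads a tiny hypothetical inline sample instead, and the commented-out `read_csv` call uses a placeholder path.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the Figure Eight CSV.
# With the real download you would instead do something like:
#   df = pd.read_csv("path/to/disaster_tweets.csv", encoding="ISO-8859-1")
# (the path is a placeholder; the encoding is an assumption about that file)
sample_csv = io.StringIO(
    "text,choose_one\n"
    '"California forests on fire near San Francisco",Relevant\n'
    '"California this weekend was on fire, good times in San Francisco",Not Relevant\n'
    "\"Pandemonium at the stadium tonight\",Can't Decide\n"
)

df = pd.read_csv(sample_csv)
print(df.shape)                          # (rows, columns)
print(df["choose_one"].value_counts())   # how many tweets carry each label
```

Quoting the `text` field lets tweets that contain commas survive the CSV round trip.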

Prepare The Target
There are several possible prediction targets in this dataset. In our case, humans were asked to rate each tweet and were given three options: Relevant, Not Relevant, and Can't Decide. We drop the ambiguous Can't Decide rows and turn the remaining labels into a binary target:
```python
# Remove the ambiguous "Can't Decide" category
df = df[df.choose_one != "Can't Decide"]

# Keep only the two columns below, as we only want to map text to relevance
# (.copy() avoids pandas' SettingWithCopyWarning on the next assignment)
df = df[['text', 'choose_one']].copy()

# Convert the target into binary numbers
df['relevant'] = df.choose_one.map({'Relevant': 1, 'Not Relevant': 0})
```
Lemmatization
A lemma (in the field of linguistics) is the word under which the set of related words or forms appears in a dictionary. For example, "was" and "is" appear under "be," "mice" appears under "mouse," and so on. Quite often, the specific form of a word does not matter very much, so it can be a good idea to convert all your text into its lemma form.
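To make the idea concrete, here is a toy lookup-table lemmatizer in plain Python. The table and the `lemmatize` helper are hypothetical and tiny. Real lemmatizers such as spaCy's use much larger tables plus part-of-speech rules, so treat this only as an illustration of the word-to-lemma mapping.

```python
# Toy lookup table (hypothetical); real lemmatizers have far larger tables
# and use part-of-speech information to resolve ambiguous forms.
LEMMA_TABLE = {
    "was": "be", "is": "be", "are": "be", "were": "be",
    "mice": "mouse", "forests": "forest", "fires": "fire",
}

def lemmatize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and map each token to its lemma if known."""
    return [LEMMA_TABLE.get(tok, tok) for tok in text.lower().split()]

tokens = lemmatize("California forests were on fire")
print(tokens)            # ['california', 'forest', 'be', 'on', 'fire']
print(" ".join(tokens))  # joined back into a single string, like 'joint_lemmas'
```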
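```python
import spacy

# Load the small English pipeline (install it first with
# `python -m spacy download en_core_web_sm`). The parser and NER are not needed;
# in spaCy v3 the lemmatizer relies on the tagger, so we keep it enabled.
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

# Loop over the words in the 'text' column and
# save the lemma of each word in a new 'lemmas' column
df['lemmas'] = df['text'].apply(lambda row: [w.lemma_ for w in nlp(row)])

# Turn the lists in 'lemmas' back into text
df['joint_lemmas'] = df['lemmas'].apply(lambda row: ' '.join(row))
```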
Here is the new data frame:

Word Embeddings
The order of words in a text matters. Therefore, we can expect higher performance if we do not just look at texts in aggregate but see them as a sequence.
Embeddings work like a lookup table. For each token, they store a vector. When the token is given to the embedding layer, it returns the vector for that token and passes it through the neural network. As the network trains, the embeddings get optimized as well.
Remember that neural networks learn by calculating the derivative of the loss function with respect to the parameters (weights) of the model. Through backpropagation, we can also calculate the derivative of the loss with respect to the vectors fed into the rest of the network. Because those vectors are the embedding layer's weights, they can be updated like any other parameter, so the embeddings are optimized to deliver inputs that help our model.
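The lookup and the gradient update can be sketched with NumPy. This is a minimal illustration with made-up sizes and a dummy upstream gradient, not how Keras implements its `Embedding` layer internally. Indexing selects one row per token, and the gradient for the table is scattered back only onto the rows that were looked up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 10, 4                   # tiny, illustrative sizes
E = rng.normal(size=(vocab_size, embedding_dim))    # the embedding table (trainable weights)

tokens = np.array([3, 7, 3])     # a toy sequence of token ids
vectors = E[tokens]              # lookup: one row per token, shape (3, 4)

# Pretend backpropagation delivered dLoss/dVectors from the layers above.
upstream_grad = np.ones_like(vectors)

# The gradient w.r.t. the table is a scatter-add: each looked-up row collects
# the gradients from every position where that token appeared.
grad_E = np.zeros_like(E)
np.add.at(grad_E, tokens, upstream_grad)

E_before = E.copy()
learning_rate = 0.1
E -= learning_rate * grad_E      # only rows 3 and 7 change
```

Token 3 appears twice, so its row receives twice the gradient of token 7, while rows for unseen tokens stay untouched.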
Before we start with training word embeddings, we need to do some pre-processing steps. In particular, we need to assign each word token a number and create a NumPy array full of sequences.
```python
# The Tokenizer class allows us to specify how many words to consider
from keras.preprocessing.text import Tokenizer

max_words = 7000  # We will only consider the 7K most used words in this dataset

# Create a new Tokenizer object
tokenizer = Tokenizer(num_words=max_words)

# Generate tokens by counting frequency
tokenizer.fit_on_texts(df['joint_lemmas'])

# Transform the text into tokenized sequences
sequences = tokenizer.texts_to_sequences(df['joint_lemmas'])

# Look up the mappings of words to numbers from the tokenizer word index
word_index = tokenizer.word_index
```
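Roughly, `fit_on_texts` counts word frequencies and assigns index 1 to the most frequent word (0 is reserved for padding), and `texts_to_sequences` then keeps only indices below `num_words`. The pure-Python sketch below mirrors that behavior under simplifying assumptions: it splits on whitespace and skips the punctuation filtering that Keras' `Tokenizer` applies by default. The helper names are hypothetical.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency: the most frequent word gets index 1 (0 is reserved for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so words with equal counts keep their first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Map each word to its index, dropping unknown words and indices >= num_words."""
    return [
        [word_index[w] for w in text.lower().split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]

texts = ["fire in the forest", "the forest is on fire", "fire fire"]
word_index = fit_word_index(texts)
print(word_index)  # {'fire': 1, 'the': 2, 'forest': 3, 'in': 4, 'is': 5, 'on': 6}
print(texts_to_sequences(texts, word_index, num_words=4))  # [[1, 2, 3], [2, 3, 1], [1, 1]]
```

Note that with `num_words=4` only three words survive, which is why `max_words = 7000` effectively keeps the 6,999 most frequent words.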
Next, we need to pad our sequences so that they all have the same length. This is not always necessary, as some model types can deal with sequences of different lengths, but it usually makes sense and is often required.
```python
# Use Keras' pad_sequences to bring all of the sequences to the same length
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Make all sequences 140 tokens long. Tweets were limited to 140 characters
# at the time, so no tweet should produce more than 140 tokens.
maxlen = 140
data = pad_sequences(sequences, maxlen=maxlen)

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data, df['relevant'], test_size=0.3, shuffle=True, random_state=1024)
```
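By default, `pad_sequences` pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`) and fills with zeros. The NumPy sketch below reproduces just those defaults to show what ends up in `data`. The `pad_pre` helper is hypothetical, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences' defaults: pad and truncate at the start ('pre')."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        trunc = seq[-maxlen:]              # 'pre' truncation keeps the last maxlen ids
        if len(trunc):
            out[i, -len(trunc):] = trunc   # 'pre' padding puts the zeros in front
    return out

padded = pad_pre([[5, 6], [1, 2, 3, 4, 5]], maxlen=4)
print(padded)
# [[0 0 5 6]
#  [2 3 4 5]]
```

Padding at the front keeps the real tokens next to the end of the sequence, which is where a recurrent layer's final state is read from.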
Below I add Weights and Biases to track my model performance:
- wandb.init() – Initialize a new W&B run. Each run is a single execution of the training script.
- wandb.config – Save all your hyperparameters in a config object. This lets you use the W&B app to sort and compare your runs by hyperparameter values.
```python
# Initialize a new wandb run (use your own W&B username or team as the entity)
wandb.init(entity="khanhnamle1994", project="tweet-classification")

# Default values for hyperparameters
config = wandb.config          # Config holds and saves hyperparameters and inputs
config.epochs = 10             # Number of epochs
config.batch_size = 32         # Batch size
config.embedding_dim = 70      # Dimension of the embedding layer
config.activation = 'sigmoid'  # Activation function of the output layer
config.optimizer = 'adam'      # Optimization technique
```
Feedforward Neural Network
Let's train our word vectors. To use embeddings, we have to specify how large each word vector should be. We chose 70 dimensions, a modest size that is typically enough to capture useful word relationships for a vocabulary like ours. We also have to specify how many words we want embeddings for (max_words) and how long our sequences are (maxlen).
```python
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

embedding_dim = config.embedding_dim

# Create the model
model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(1, activation=config.activation))

# Display model architecture
model.summary()
```

The embedding layer has 70 parameters for each of the 7,000 words, for 490,000 parameters in total. That is a lot of parameters for a dataset of this size, which can lead to overfitting. The next step is to compile and train our model.
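The arithmetic behind these numbers is easy to check by hand. The sketch below reuses the values from this section (`max_words = 7000`, `embedding_dim = 70`, `maxlen = 140`) and also counts the `Dense` layer that sits on top of `Flatten`. The totals should match what `model.summary()` reports for this architecture.

```python
max_words, embedding_dim, maxlen = 7000, 70, 140

embedding_params = max_words * embedding_dim   # one 70-d vector per word
flatten_units = maxlen * embedding_dim         # Flatten turns (140, 70) into 9,800 values (no weights)
dense_params = flatten_units * 1 + 1           # one weight per input plus a single bias
total = embedding_params + dense_params
print(embedding_params, dense_params, total)   # 490000 9801 499801
```

About 98% of the model's parameters live in the embedding table.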
```python
# Compile the model
model.compile(optimizer=config.optimizer,
              loss='binary_crossentropy',
              metrics=['acc'])

# Fit and train the model; WandbCallback logs the metrics to W&B after each epoch
history = model.fit(X_train, y_train,
                    epochs=config.epochs,
                    batch_size=config.batch_size,
                    validation_data=(X_test, y_test),
                    callbacks=[WandbCallback()])
```
The model achieves about 78% accuracy on the test set, but over 98% accuracy on the training set. The large number of parameters in the custom embeddings has led to overfitting.
Long Short Term Memory Network
Text is a time series: words follow each other, and the order in which they do matters. Therefore, many neural network techniques for time series problems can also be used for NLP. Below I use a Long Short-Term Memory (LSTM) network, which can process not only single data points but also entire sequences of data. LSTMs were developed primarily to deal with the vanishing gradient problem that can be encountered when training traditional Recurrent Neural Networks.
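To show what the LSTM layer computes at each step, here is a minimal NumPy sketch of a single LSTM cell unrolled over a toy sequence. The weights are random, and the sizes (70-dimensional inputs, 32 units) just mirror this section's configuration. Keras' implementation differs in details such as weight initialization. The additive cell-state update `c = f * c_prev + i * g` is what helps gradients survive across many time steps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)      # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g                    # additive cell-state update
    h = o * np.tanh(c)                        # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
input_dim, units, steps = 70, 32, 5
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x in rng.normal(size=(steps, input_dim)):  # toy sequence of 5 embedding vectors
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (32,)
```

The final `h` plays the role of the 32-dimensional output that `LSTM(32)` hands to the `Dense` layer.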
```python
from keras.layers import LSTM

embedding_dim = config.embedding_dim

# Create another model, replacing the Flatten layer with an LSTM layer
model_lstm = Sequential()
model_lstm.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model_lstm.add(LSTM(32))
model_lstm.add(Dense(1, activation=config.activation))
model_lstm.summary()
```

```python
model_lstm.compile(optimizer=config.optimizer,
                   loss='binary_crossentropy',
                   metrics=['acc'])
history = model_lstm.fit(X_train, y_train,
                         epochs=config.epochs,
                         batch_size=config.batch_size,
                         validation_data=(X_test, y_test),
                         callbacks=[WandbCallback()])
```
The model achieves about 77% accuracy on the test set, but over 97% accuracy on the training set. That is no better than the previous model.
Bidirectional Recurrent Neural Network
Next, I use a Bidirectional Recurrent Neural Network, which splits the neurons of a regular RNN into two directions: one for the positive time direction (forward states) and another for the negative time direction (backward states). This way, the output layer can get information from past states (through the forward direction) and future states (through the backward direction) simultaneously.
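The bidirectional idea can be sketched in NumPy. Run one recurrent pass left to right and another right to left, re-align the backward states with the original time steps, and concatenate them. For brevity, this sketch uses a plain tanh RNN rather than LSTM cells, with random weights and hypothetical helper names. Concatenation mirrors the default `merge_mode='concat'` of Keras' `Bidirectional` wrapper.

```python
import numpy as np

def simple_rnn(xs, W, U, b):
    """Run a tanh RNN over xs (shape (T, input_dim)) and return every hidden state."""
    h = np.zeros(U.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ W + h @ U + b)
        states.append(h)
    return np.stack(states)

def bidirectional(xs, fwd_params, bwd_params):
    forward = simple_rnn(xs, *fwd_params)                # reads t = 0..T-1: summarizes the past
    backward = simple_rnn(xs[::-1], *bwd_params)[::-1]   # reads t = T-1..0, then re-aligned to t
    return np.concatenate([forward, backward], axis=-1)  # shape (T, 2 * units)

def init_params(rng, input_dim, units):
    return (rng.normal(scale=0.1, size=(input_dim, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

rng = np.random.default_rng(0)
T, input_dim, units = 6, 70, 8
fwd, bwd = init_params(rng, input_dim, units), init_params(rng, input_dim, units)
xs = rng.normal(size=(T, input_dim))  # toy sequence of 6 embedding vectors
out = bidirectional(xs, fwd, bwd)
print(out.shape)  # (6, 16)
```

At every time step, the first half of the output has seen the tokens up to that point and the second half has seen the tokens from that point onward.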
```python
from keras.layers import Bidirectional

embedding_dim = config.embedding_dim

# Create another model, wrapping Bidirectional layers around LSTM layers
model_birnn = Sequential()
model_birnn.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model_birnn.add(Bidirectional(LSTM(64, return_sequences=True)))  # pass the full sequence on
model_birnn.add(Bidirectional(LSTM(32)))
model_birnn.add(Dense(1, activation=config.activation))
model_birnn.summary()
```

```python
model_birnn.compile(optimizer=config.optimizer,
                    loss='binary_crossentropy',
                    metrics=['acc'])
history = model_birnn.fit(X_train, y_train,
                          epochs=config.epochs,
                          batch_size=config.batch_size,
                          validation_data=(X_test, y_test),
                          callbacks=[WandbCallback()])
```
The model achieves about 76% accuracy on the test set, but over 97% accuracy on the training set. It seems we have hit diminishing returns: the added model complexity does not improve test accuracy.
Comparison
Let’s compare the performance of these models in Weights & Biases. In the charts below:
- The run Vanilla-Feedforward-NN is the Vanilla Feedforward Neural Network model.
- The run LSTM is the Long Short Term Memory Network model.
- The run Bidirectional-RNN is the Bidirectional Recurrent Neural Network model.

As seen above, the feedforward model has the highest accuracy on the training set, followed by the Bidirectional RNN and the LSTM.

The results on the test set show that the Feedforward model still has the highest accuracy. This time, the LSTM model does better than the Bidirectional model.
Project Overview
- Check out the project page to see your results in the shared project.
- Press 'option+space' to expand the runs table, comparing all the results from everyone who has tried this script.
- Click on the name of a run to dive deeper into that single run on its own run page.

Visualize Performance
Click through to a single run to see more details about that run. For example, on this run page you can see the performance metrics I logged when I ran this script.

Review Code
The overview tab picks up a link to the code. In this case, it's a link to the Google Colab. If you're running a script from a git repo, we'll pick up the SHA of the latest git commit and give you a link to that version of the code in your own GitHub repo.

Visualize System Metrics
The System tab on the run page lets you visualize how resource-efficient your model was. It lets you monitor GPU, memory, CPU, disk, and network usage in one spot.

Here are some more resources that you can use to learn about W&B:
- Documentation - Python docs
- Gallery - example reports in W&B
- Articles - blog posts and tutorials
- Community - join our Slack community forum
Tags: Articles, Classification