Text classification with TensorFlow and W&B Weave
Learn how to systematically evaluate and optimize text classification models using TensorFlow and W&B Weave, ensuring robust, reliable insights from unstructured data.
Created on March 29|Last edited on February 28
Text data is everywhere, yet its unstructured format often complicates analysis and interpretation. Natural Language Processing (NLP), a subset of artificial intelligence, helps overcome this by enabling machines to understand and process vast amounts of textual information efficiently.
A central NLP task, text classification uses supervised machine learning methods to automatically categorize or predict the labels of unseen text documents. Businesses apply text classification to various use cases such as sentiment analysis, topic categorization, and spam detection, simplifying workflows and improving decision-making accuracy.

Source: Author
Table of contents
- What is text classification?
- Challenges in text classification
- Exploring Model Architectures for Text Classification
- The Power of Pre-trained Models
- Metrics for Evaluating Text Classification Models
- Building and Optimizing a Text Classification Model
- Conclusion
What is text classification?
Text classification is a pivotal task within the realm of Natural Language Processing (NLP), facilitating the automated analysis and categorization of textual data for a wide range of applications.
At its core, text classification involves the assignment of predefined categories or labels to text documents based on their content. This process enables machines to comprehend and make decisions based on textual information, akin to how humans classify and organize written content.

The significance of text classification is underscored by its versatility in addressing diverse business challenges. From sentiment analysis to topic categorization and spam detection, text classification serves as a foundational tool for extracting valuable insights from textual data.
Using machine learning techniques, text classification models learn from labeled training data, associating specific patterns in text with corresponding categories. Through this learning process, these models become adept at classifying unseen text documents with high accuracy, enabling automation and efficiency in various tasks.
Challenges in text classification

While text classification significantly automates and simplifies textual data analysis, several key challenges must be considered:
- Imbalanced data: Certain categories may be overrepresented, causing biases in model predictions. Effective balancing techniques are essential to achieve reliable accuracy across all classes.
- Contextual interpretation: Models often struggle to grasp subtle meanings such as ambiguity, sarcasm, or colloquial language. Advanced contextual understanding is necessary to ensure accurate classification.
- Large dataset management: The sheer size and complexity of text datasets present significant hurdles in terms of storage, processing, and computational resources. Efficient data management and scalable processing techniques are crucial for effectively handling large-scale data.
- Feature engineering complexity: Selecting and representing informative features from raw text data is vital yet challenging. Properly engineered features are critical for enhancing classification performance, particularly given the unstructured nature of text.
- Domain-specific issues: Specialized fields, like medicine or law, introduce unique terminologies and jargon, requiring targeted datasets and domain-specific knowledge to achieve accurate text classification.
Successfully addressing these challenges involves combining deep domain expertise, sophisticated machine learning methodologies, and meticulous data preprocessing. Overcoming these obstacles ensures robust text classification models, ultimately enabling insightful and accurate data-driven decisions.
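As one illustration of the balancing techniques mentioned for imbalanced data, classes can be weighted inversely to their frequency. The sketch below uses the common `n_samples / (n_classes * class_count)` heuristic; the function name and the label list are made up for this example and are not part of the dataset used later.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical, deliberately imbalanced label list
example_labels = ["sport"] * 6 + ["tech"] * 3 + ["politics"] * 1
weights = balanced_class_weights(example_labels)
print(weights)  # rare classes receive larger weights
```

Keras's `model.fit` accepts a `class_weight` dictionary keyed by integer class index, so weights like these could be passed to training after mapping label names to indices.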
Exploring Model Architectures for Text Classification
Various machine learning models are employed in text classification, each with unique architectures and capabilities. Here's an overview of some commonly used models:
Naive Bayes

Description
Naive Bayes is a probabilistic classifier based on Bayes' theorem with an assumption of independence among features. Despite its simplicity and "naive" assumption of feature independence, it often performs surprisingly well in text classification tasks.
Strengths
- Simplicity: Naive Bayes is straightforward to implement and understand, making it an excellent choice for quick prototyping.
- Efficiency: It's computationally efficient, requiring a relatively small amount of training data and memory compared to other models.
- Works well with high-dimensional data: Naive Bayes performs well even with a large number of features (words) in the dataset, making it suitable for text classification tasks.
Weaknesses
- Strong Independence Assumptions: The "naive" assumption of feature independence may not hold true in many real-world scenarios, leading to suboptimal performance.
- Limited Expressiveness: Due to its simplicity, Naive Bayes may struggle with capturing complex relationships and interactions between features.
Application
- Email Spam Detection: One of the classic applications of Naive Bayes is in email spam filtering, where it classifies incoming emails as either spam or non-spam based on the presence of certain keywords or features.
- Document Classification: Naive Bayes is also used for document classification tasks, such as categorizing news articles, academic papers, or customer reviews into predefined categories.
- Sentiment Analysis: In sentiment analysis, Naive Bayes can classify text data (e.g., product reviews and social media posts) into positive, negative, or neutral sentiment categories.
Despite its simplifying assumptions, Naive Bayes can serve as a strong baseline model for text classification tasks, especially when dealing with limited training data or when computational resources are constrained.
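To make the idea concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is an illustration only: the function names and the tiny spam/ham dataset are invented for this example, and a real project would more likely use a library implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, tokens):
    """Return the class with the highest log prior + summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token in vocab:  # ignore words never seen during training
                # Add-one smoothing avoids zero probabilities
                score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, already tokenized
toy_docs = [["win", "free", "prize"], ["free", "offer", "now"],
            ["meeting", "at", "noon"], ["project", "meeting", "notes"]]
toy_labels = ["spam", "spam", "ham", "ham"]
nb_model = train_naive_bayes(toy_docs, toy_labels)
print(predict_naive_bayes(nb_model, ["free", "prize"]))
print(predict_naive_bayes(nb_model, ["meeting", "notes"]))
```

Each word contributes to the score independently, which is exactly the "naive" independence assumption described above.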
Support Vector Machines (SVM)

Description
Support Vector Machines are a powerful supervised learning algorithm used for classification tasks. SVM aims to find the hyperplane that best separates the classes in the feature space. It works by mapping input data into a high-dimensional feature space and finding the optimal hyperplane that maximizes the margin between classes.
Strengths
- Effective in high-dimensional spaces: SVM works well in high-dimensional spaces like text data, making it suitable for text classification tasks where the feature space is often large.
- Robust to overfitting: SVMs are less prone to overfitting, especially in high-dimensional spaces, due to their ability to maximize the margin between classes.
- Works well with small datasets: SVM performs well even with relatively small datasets, making it suitable for scenarios where data availability is limited.
Weaknesses
- Computationally expensive: SVM can be computationally expensive, especially when dealing with large datasets, as it involves solving a quadratic optimization problem.
- Sensitive to parameter tuning: SVM performance is sensitive to the choice of parameters like the kernel function and regularization parameter, which may require careful tuning.
- Limited interpretability: The decision boundary produced by SVM may be difficult to interpret, especially in high-dimensional spaces.
Application
- Text Classification: SVM separates text data into classes based on features.
- Image Classification: SVM categorizes images into different classes using extracted features.
- Handwriting Recognition: SVM classifies handwritten characters based on extracted features.
Support Vector Machines offer a powerful and versatile approach to text classification, especially when dealing with high-dimensional feature spaces and limited training data. With careful parameter tuning and feature selection, SVMs can achieve high accuracy in various text classification tasks.
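The margin-maximizing idea can be sketched with a linear SVM trained by sub-gradient descent on the L2-regularized hinge loss. This is a simplified stand-in for the quadratic-programming solvers that SVM libraries typically use; the function names, hyperparameters, and the toy bag-of-words data are assumptions made for illustration.

```python
def train_linear_svm(X, y, epochs=100, lr=0.1, reg=0.01):
    """Sub-gradient descent on the regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks the weights on every step
            w = [wj - lr * reg * wj for wj in w]
            if margin < 1:
                # Point is inside the margin or misclassified: push the boundary away
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def svm_decision(w, b, x):
    """Signed distance-like score; positive means class +1."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Hypothetical word counts over the vocabulary ["free", "prize", "meeting", "notes"]
X_toy = [[1, 1, 0, 0], [2, 0, 0, 0], [0, 0, 1, 1], [0, 0, 2, 0]]
y_toy = [1, 1, -1, -1]  # +1 = spam, -1 = not spam
w, b = train_linear_svm(X_toy, y_toy)
print(svm_decision(w, b, [1, 1, 0, 0]))  # positive score
print(svm_decision(w, b, [0, 0, 1, 1]))  # negative score
```

The `reg` value here plays the role of the regularization parameter mentioned under weaknesses: changing it shifts the balance between a wide margin and fitting every training point.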
The Power of Pre-trained Models

Pre-trained models like BERT and GPT have revolutionized NLP by providing language representations that capture rich semantic and contextual information from text data. BERT, introduced by Google, and GPT, developed by OpenAI, offer powerful capabilities for tasks like text classification.
BERT is known for bidirectional language understanding and can be fine-tuned for text classification tasks with minimal modifications. Similarly, GPT, renowned for its generative abilities, can also be fine-tuned for classification tasks.
These pre-trained models enable transfer learning, allowing knowledge learned from large-scale pretraining tasks to be transferred and fine-tuned for downstream tasks with limited labeled data. They generate contextual embeddings that capture deep semantic and syntactic information, enhancing classification performance.
Applications of pre-trained models include text classification tasks like sentiment analysis, topic categorization, and spam detection, as well as other NLP tasks such as named entity recognition and question answering.
In summary, pre-trained models have transformed text classification by providing powerful tools for capturing contextual information and generating informative representations, enabling developers to build state-of-the-art models with minimal training data and superior performance across various NLP tasks.
Metrics for Evaluating Text Classification Models

Evaluating the performance of text classification models is essential for assessing their effectiveness in categorizing text data into predefined classes or categories. Various evaluation metrics provide insights into the model's accuracy, precision, recall, and overall performance, guiding developers in optimizing model parameters and improving classification outcomes.
Accuracy: Accuracy is a fundamental metric that measures the proportion of correctly classified instances out of the total instances in the dataset. It provides an overall assessment of the model's correctness in predicting class labels and is calculated as the ratio of correctly classified instances to the total number of instances.
Precision: Precision measures the proportion of true positive predictions (correctly classified instances) among all instances predicted as positive (including both true positives and false positives). It quantifies the model's ability to avoid false positive predictions and is calculated as the ratio of true positive predictions to the total number of positive predictions.
Recall (Sensitivity): Recall, also known as sensitivity or true positive rate, measures the proportion of true positive predictions among all instances belonging to the positive class in the dataset. It quantifies the model's ability to capture all positive instances and is calculated as the ratio of true positive predictions to the total number of actual positive instances.
F1 Score: The F1 score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance that considers both false positives and false negatives. It is calculated as 2 × (precision × recall) / (precision + recall), so it is high only when both precision and recall are high.
Confusion Matrix: A confusion matrix is a tabular representation of the model's performance that summarizes the number of true positive, true negative, false positive, and false negative predictions. It provides insights into the model's classification errors and can be used to calculate various evaluation metrics, including accuracy, precision, recall, and F1 score.
Evaluation metrics play a crucial role in assessing the performance of text classification models and guiding model optimization and improvement efforts. They help developers gain insights into the model's strengths and weaknesses and make informed decisions to enhance its performance and robustness for real-world applications.
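The metrics described above can be computed directly from the four confusion-matrix counts. The sketch below treats one class as "positive" in a binary setting; the helper names and the example labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Guard against division by zero when a class is never predicted or never present."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="spam")
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)
print(accuracy, precision, recall, f1)
```

For a multi-class problem such as the five-category dataset used later, the same counts can be computed once per class, treating that class as positive and all others as negative.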
Building and Optimizing a Text Classification Model
In this walkthrough, we classify a dataset of 2,225 text documents into five categories: politics, sport, tech, entertainment, and business. The goal is to build a text classification model using TensorFlow and W&B. The workflow covers data cleaning and preparation, building a CNN model, and integrating W&B for experiment tracking, and it ends with a hyperparameter sweep to find the best model configuration.
1. Installing some libraries
!pip install -q nltk
!pip install -q wandb
2. Importing Libraries
Import the necessary libraries, including pandas, numpy, seaborn, matplotlib.pyplot, nltk, tensorflow, and wandb.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import nltk
from nltk.corpus import stopwords
import tensorflow as tf  # used from step 5 onward
import wandb  # used from step 10 onward
3. Loading and Exploring the Data
Load the dataset into a pandas DataFrame and explore its shape and unique labels.
df = pd.read_csv('df_file.csv')
df
df.shape
df['Label'].unique()
df['Label'].value_counts()

# Print the first two documents for each label
uniques = df['Label'].unique()
for label in uniques:
    df_unique = df[df['Label'] == label]
    print(f"Label : {label} \n")
    print(f"First: {df_unique['Text'].values[0]} \n")
    print(f"Second: {df_unique['Text'].values[1]} \n")
4. Data Cleaning
Use NLTK for text cleaning, which involves converting text to lowercase, removing punctuation and special characters, tokenizing the text, removing stopwords, and lemmatizing the tokens.
import re
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')  # recent NLTK releases may require 'punkt_tab' instead
nltk.download('wordnet')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = text.lower()
    # Keep only letters and whitespace
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    tokens = word_tokenize(text)
    # Drop stopwords and lemmatize the remaining tokens
    cleaned_tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in stop_words]
    return cleaned_tokens

# Check the cleaning on a single document
text_1 = df['Text'].values[0]
print(f"Original text: \n {text_1}")
cleaned_text = clean_text(text_1)
print(f"Result after cleaning: \n {cleaned_text}")

# Clean the whole corpus
cleaned_data = []
for text in df['Text']:
    cleaned_text = clean_text(text)
    cleaned_data.append(cleaned_text)

print(cleaned_data[0])
len(cleaned_data)
5. Converting to Sequences
Use TensorFlow to convert tokenized text data to sequences of integers (word indices) and pad sequences to ensure uniform length.
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(cleaned_data)
sequences = tokenizer.texts_to_sequences(cleaned_data)
max_length = max(len(seq) for seq in sequences)
padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=max_length)
padded_sequences
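To show what padding does to the integer sequences, here is a small pure-Python sketch that mimics the default behavior of `pad_sequences` (zeros added at the front, and over-long sequences trimmed from the front). The helper name `pad_pre` and the toy sequences are invented for this illustration.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) integer sequences to a common length."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        trimmed = seq[-maxlen:]  # keep the last maxlen items
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded


# Hypothetical word-index sequences of different lengths
toy_sequences = [[4, 7], [1, 2, 3, 9], [5]]
print(pad_pre(toy_sequences))
# [[0, 0, 4, 7], [1, 2, 3, 9], [0, 0, 0, 5]]
```

Because index 0 is used as the padding value, real word indices start at 1.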
6. Prepare the Labels
Convert numerical labels to one-hot encoding using TensorFlow.
# Reconstructed step: assumes the 'Label' column holds integer class IDs (0 to 4)
labels = df['Label'].values
num_classes = len(np.unique(labels))
one_hot_labels = tf.keras.utils.to_categorical(labels, num_classes=num_classes)
one_hot_labels
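One-hot encoding, as described in this step, turns each integer class ID into a vector with a single 1 at that class's position. A minimal pure-Python sketch, with a hypothetical helper name and made-up label IDs:

```python
def one_hot(label_ids, num_classes):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label_id in label_ids:
        row = [0.0] * num_classes
        row[label_id] = 1.0
        rows.append(row)
    return rows


# Hypothetical integer labels for a five-class problem
encoded = one_hot([0, 3, 4], num_classes=5)
for row in encoded:
    print(row)
```

This representation matches the five-unit softmax output layer and the categorical cross-entropy loss used in the following steps.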
7. Creating our Model
Create a sequential model using TensorFlow, consisting of embedding layer, convolutional layers with max pooling, dense layers, and dropout layer for regularization.
# One index per known word, plus 1 because index 0 is reserved for padding
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 100  # assumed value; matches the default used in the sweep in step 12

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
    tf.keras.layers.Conv1D(filters=256, kernel_size=5, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(filters=128, kernel_size=3, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),  # Add dropout layer for regularization
    tf.keras.layers.Dense(num_classes, activation='softmax')
])
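It can help to trace how the sequence length changes through the convolution and pooling layers. The sketch below applies the standard output-length formulas for "valid" padding (the Keras default for these layers), with a stride of 1 for convolution and a stride equal to the pool size for pooling. The input length of 500 is a made-up example, not the dataset's actual `max_length`.

```python
def conv1d_valid_length(length, kernel_size):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return length - kernel_size + 1


def max_pool1d_length(length, pool_size):
    """Output length of max pooling with 'valid' padding and stride = pool_size."""
    return (length - pool_size) // pool_size + 1


# Hypothetical input length; layer settings follow the model above
length = 500
after_conv1 = conv1d_valid_length(length, kernel_size=5)
after_pool1 = max_pool1d_length(after_conv1, pool_size=2)
after_conv2 = conv1d_valid_length(after_pool1, kernel_size=3)
print(after_conv1, after_pool1, after_conv2)
# 496 248 246
```

The global max pooling layer then collapses the remaining time steps into a single value per filter, so the dense layers receive a vector of 128 features regardless of the input length.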
8. Compile the Model
Compile the model with an optimizer, loss function, and evaluation metrics.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

num_epochs = 10
batch_size = 64
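For intuition about the loss chosen here, categorical cross-entropy for a single example is the negative log of the probability the model assigns to the true class. The sketch below is a simplified single-example version with a hypothetical function name and made-up probabilities; the framework implementation averages over a batch and handles numerical details differently.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Negative log-likelihood of the true class for one example."""
    # Clip probabilities away from zero so log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, predicted_probs))


# Hypothetical softmax outputs for a five-class problem; the true class is index 2
target = [0, 0, 1, 0, 0]
confident = [0.05, 0.05, 0.80, 0.05, 0.05]
unsure = [0.20, 0.20, 0.20, 0.20, 0.20]

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

A confident correct prediction yields a smaller loss than a uniform guess, which is what the optimizer exploits during training.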
9. Split Data for Training
Split the data into training and testing sets using train_test_split from scikit-learn.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(padded_sequences, one_hot_labels, test_size=0.2, random_state=42)
10. Initialize W&B
wandb.init(project="text-classification-project", config={
    "vocab_size": vocab_size,
    "embedding_dim": embedding_dim,
    "max_length": max_length,
    "num_classes": num_classes,
    "conv1_filters": 256,
    "conv1_kernel_size": 5,
    "pool1_size": 2,
    "conv2_filters": 128,
    "conv2_kernel_size": 3,
    "dense_units": 64,
    "dropout_rate": 0.5,
    "optimizer": "adam",
    "loss_function": "categorical_crossentropy",
    "epochs": num_epochs,
    "batch_size": batch_size
})
wandb.config.update({"optimizer": "adam", "loss_function": "categorical_crossentropy"})
11. Train your Model
history = model.fit(X_train, y_train,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    validation_data=(X_test, y_test),
                    callbacks=[wandb.keras.WandbCallback()])
wandb.finish()

Source: Author
The graphs above show the results logged to W&B. The first graph shows the training loss decreasing as the number of epochs increases, which suggests that training for more epochs could improve accuracy further. The second graph (val_accuracy) shows the validation accuracy approaching 1 as the number of epochs increases.
12. Define Sweep Configuration
Define the configuration for hyperparameter tuning sweep using Weights & Biases.
sweep_config = {
    'method': 'random',  # Specify the search method
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},  # Metric to optimize
    'parameters': {
        # Parameter names must match the keys read from wandb.config in train()
        'embedding_dim': {'values': [50, 100, 200]},
        'conv1_filters': {'values': [64, 128, 256]},
        'conv1_kernel_size': {'values': [3, 5, 7]},
        'conv2_filters': {'values': [64, 128, 256]},
        'conv2_kernel_size': {'values': [3, 5, 7]},
        'dense_units': {'values': [64, 128, 256]},
        'dropout_rate': {'values': [0.2, 0.3, 0.5]},
        'epochs': {'values': [1, 2, 3, 5]},
        'learning_rate': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
        'batch_size': {'values': [8, 16, 32, 64, 128]}
    }
}
sweep_id = wandb.sweep(sweep_config, project="text-classification-project")

from wandb.keras import WandbCallback

def train():
    # Defaults apply to any key the sweep does not override
    default_config = {
        "vocab_size": 27887,
        "embedding_dim": 100,
        "num_classes": 5,
        "conv1_filters": 256,
        "conv1_kernel_size": 5,
        "pool1_size": 2,
        "conv2_filters": 128,
        "conv2_kernel_size": 3,
        "dense_units": 64,
        "dropout_rate": 0.5,
        "optimizer": "adam",
        "learning_rate": 0.001,
        "loss_function": "categorical_crossentropy",
        "epochs": 10,  # Adjust as needed
        "batch_size": 64
    }
    wandb.init(config=default_config)
    config = wandb.config

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=config.vocab_size, output_dim=config.embedding_dim, input_length=max_length),
        tf.keras.layers.Conv1D(filters=config.conv1_filters, kernel_size=config.conv1_kernel_size, activation='relu'),
        tf.keras.layers.MaxPooling1D(pool_size=config.pool1_size),
        tf.keras.layers.Conv1D(filters=config.conv2_filters, kernel_size=config.conv2_kernel_size, activation='relu'),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(config.dense_units, activation='relu'),
        tf.keras.layers.Dropout(config.dropout_rate),
        tf.keras.layers.Dense(config.num_classes, activation='softmax')
    ])

    # Build the Adam optimizer explicitly so the swept learning rate takes effect
    optimizer = tf.keras.optimizers.Adam(learning_rate=config.learning_rate)
    model.compile(optimizer=optimizer, loss=config.loss_function, metrics=['accuracy'])

    X_train, X_test, y_train, y_test = train_test_split(padded_sequences, one_hot_labels, test_size=0.2, random_state=42)

    # Train the model
    model.fit(X_train, y_train,
              epochs=config.epochs,
              batch_size=config.batch_size,
              validation_data=(X_test, y_test),
              callbacks=[WandbCallback()])

# Run 30 trials of the sweep
wandb.agent(sweep_id, train, count=30)

Source: Author
Across the runs in the sweep, the best configuration reached a validation accuracy of 97.3%.
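The `random` search method used in the sweep can be pictured as repeatedly sampling a configuration, scoring it, and keeping the best one. The sketch below shows that loop in plain Python. It is not how W&B implements sweeps internally; the function names are invented, the search space is a subset of the one above, and `toy_objective` is a stand-in for training a model and returning its validation accuracy.

```python
import random

# A subset of the sweep's discrete parameters
search_space = {
    "embedding_dim": [50, 100, 200],
    "dropout_rate": [0.2, 0.3, 0.5],
    "batch_size": [8, 16, 32, 64, 128],
}


def sample_config(space, rng):
    """Draw one value per discrete parameter plus a uniform learning rate."""
    config = {name: rng.choice(values) for name, values in space.items()}
    config["learning_rate"] = rng.uniform(0.0001, 0.1)
    return config


def random_search(objective, space, count, seed=0):
    """Evaluate `count` random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(count):
        config = sample_config(space, rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Hypothetical score that stands in for validation accuracy."""
    return -abs(config["learning_rate"] - 0.01) - abs(config["dropout_rate"] - 0.3)


best_config, best_score = random_search(toy_objective, search_space, count=30)
print(best_config, best_score)
```

Random search does not guarantee the optimum, but with a fixed budget of trials it explores wide ranges, such as the learning rate here, more evenly than a coarse grid would.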
Conclusion
Text classification leverages machine learning to automatically categorize text data, enabling businesses to extract insights from unstructured content. By employing techniques like Naive Bayes, SVMs, and pre-trained models, along with tools such as TensorFlow and W&B, developers can build accurate models for tasks like sentiment analysis and topic categorization. Despite challenges, the field of NLP continues advancing, offering sophisticated solutions to enhance text classification performance and provide valuable decision-making support across industries.