Predicting Semantic Similarity using KerasNLP
This report demonstrates how to use KerasNLP to predict the semantic similarity of sentence pairs.
Semantic similarity is the task of determining how close two sentences are in meaning. This example uses the SNLI (Stanford Natural Language Inference) corpus to predict sentence-level semantic similarity. We will learn how to use the KerasNLP library, an extension of the core Keras API, for this task, and discover how KerasNLP effectively reduces boilerplate and simplifies the process of building and using models. For more information, please refer to KerasNLP's official documentation.
- Getting Started with KerasNLP
- Overview of the SNLI Dataset
- Preprocessing
- Establishing a Semantic Similarity Baseline with BERT
- Experimenting with Learning Rates
- Deploying a Learning Rate Scheduler
- Choosing the Right Hyperparameters for the Model with Sweeps!
- Conclusion
Getting Started with KerasNLP
Let's first install keras-nlp, along with wandb for experiment tracking:
!pip install -q keras-nlp wandb
Next up, let's import all the necessary libraries and set the backend framework for Keras (KerasNLP supports the new Keras Core out of the box).
import os

os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"

import numpy as np
import tensorflow as tf
import keras_core as keras
import keras_nlp
import tensorflow_datasets as tfds
import wandb
from wandb.keras import WandbMetricsLogger

PROJECT_NAME = "semantic-similarity-with-keras-nlp"
Overview of the SNLI Dataset
To load the SNLI dataset, we use the tensorflow-datasets library. The full dataset contains over 550,000 samples; to ensure that this example runs quickly, we use only 20% of the training split. Every sample in the dataset contains three components:
- hypothesis,
- premise,
- and label
The premise is the original caption provided to the author of the pair, while the hypothesis is the caption that author wrote in response. The label is assigned by annotators to indicate the relationship between the two sentences.
The SNLI dataset contains three possible similarity label values:
- Contradiction represents completely dissimilar sentences.
- Entailment denotes sentences with similar meaning.
- Neutral refers to sentences where no clear similarity or dissimilarity can be established.
snli_train = tfds.load("snli", split="train[:20%]")
snli_val = tfds.load("snli", split="validation")
snli_test = tfds.load("snli", split="test")
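To get a feel for the raw data, we can peek at a single sample (a quick sanity-check sketch; the integer labels assume the tensorflow-datasets encoding of SNLI, where 0 = entailment, 1 = neutral, and 2 = contradiction):
sample = next(iter(snli_train.take(1).as_numpy_iterator()))
print(sample["premise"])
print(sample["hypothesis"])
print(sample["label"])  # assumed encoding: 0 = entailment, 1 = neutral, 2 = contradiction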
Preprocessing
In our dataset, we have identified that some samples have missing or incorrectly labeled data, which is denoted by a value of -1. To ensure the accuracy and reliability of our model, we simply filter out these samples from our dataset.
def filter_labels(sample):
    return sample["label"] >= 0


def split_labels(sample):
    x = (sample["hypothesis"], sample["premise"])
    y = sample["label"]
    return x, y


train_ds = snli_train.filter(filter_labels).map(
    split_labels, num_parallel_calls=tf.data.AUTOTUNE
)
val_ds = snli_val.filter(filter_labels).map(
    split_labels, num_parallel_calls=tf.data.AUTOTUNE
)
test_ds = snli_test.filter(filter_labels).map(
    split_labels, num_parallel_calls=tf.data.AUTOTUNE
)


def get_batched_dataset(batch_size):
    train_set = train_ds.batch(batch_size)
    val_set = val_ds.batch(batch_size)
    test_set = test_ds.batch(batch_size)
    return train_set, val_set, test_set
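As a quick sanity check (a sketch, not part of the training pipeline), each batch should yield a (hypothesis, premise) tuple of string tensors along with an integer label tensor:
train_set, val_set, test_set = get_batched_dataset(512)
(hyp_batch, prem_batch), label_batch = next(iter(train_set))
print(hyp_batch.shape, prem_batch.shape, label_batch.shape)  # (512,), (512,), (512,)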
Establishing a Semantic Similarity Baseline with BERT
We use the BERT model from KerasNLP to establish a baseline for our semantic similarity task. The keras_nlp.models.BertClassifier class attaches a classification head to the BERT Backbone, mapping the backbone outputs to a logit output suitable for a classification task. This significantly reduces the need for custom code.
KerasNLP models have built-in tokenization that is applied by default based on the selected model, although users can plug in custom preprocessing for their specific needs. If we pass a tuple as input, the model tokenizes all the strings and concatenates them with a "[SEP]" separator. We use the model with pretrained weights: the from_preset() method instantiates the classifier, together with its matching preprocessor, from the chosen preset. For the SNLI dataset, we set num_classes to 3.
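To see what this default preprocessing produces, we can run the preset's preprocessor directly on an illustrative sentence pair (a sketch; the field names follow KerasNLP's standard BERT preprocessor output):
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased")
outputs = preprocessor(
    (tf.constant(["A man is eating."]), tf.constant(["Someone is having a meal."]))
)
print(outputs["token_ids"][0])     # [CLS] first segment [SEP] second segment [SEP], then padding
print(outputs["segment_ids"][0])   # 0s for the first segment, 1s for the second
print(outputs["padding_mask"][0])  # True for real tokens, False for padding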
KerasNLP task models come with compilation defaults. We can now train the model we just instantiated by calling the fit() method.
with wandb.init(project=PROJECT_NAME, name="baseline") as run:
    bert_classifier = keras_nlp.models.BertClassifier.from_preset(
        "bert_tiny_en_uncased", num_classes=3
    )
    train_set, val_set, test_set = get_batched_dataset(512)
    bert_classifier.fit(
        train_set,
        validation_data=val_set,
        epochs=1,
        callbacks=[WandbMetricsLogger(log_freq="batch")],
    )
    bert_classifier.evaluate(test_set, callbacks=[WandbMetricsLogger(log_freq="batch")])
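With the baseline trained, we can sanity-check it on a single made-up sentence pair (a minimal sketch; the class indices assume the tensorflow-datasets label encoding of 0 = entailment, 1 = neutral, 2 = contradiction):
# (hypothesis, premise), matching the input order used in split_labels above.
logits = bert_classifier.predict(
    (tf.constant(["A man is eating."]), tf.constant(["A man is eating food at a table."]))
)
print(np.argmax(logits, axis=1))  # assumed encoding: 0 = entailment, 1 = neutral, 2 = contradiction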
Experimenting with Learning Rates
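KerasNLP's compilation defaults are a sensible starting point, but we can also recompile the classifier ourselves to experiment with the optimizer settings. Below we train the same architecture with the Adam optimizer at a learning rate of 5e-5: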
with wandb.init(project=PROJECT_NAME, name="change-lr-bs") as run:
    bert_classifier = keras_nlp.models.BertClassifier.from_preset(
        "bert_tiny_en_uncased", num_classes=3
    )
    bert_classifier.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.Adam(5e-5),
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )
    train_set, val_set, test_set = get_batched_dataset(512)
    bert_classifier.fit(
        train_set,
        validation_data=val_set,
        epochs=1,
        callbacks=[WandbMetricsLogger(log_freq="batch")],
    )
    bert_classifier.evaluate(test_set, callbacks=[WandbMetricsLogger(log_freq="batch")])
Deploying a Learning Rate Scheduler
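A fixed learning rate is rarely ideal for fine-tuning, so next we define a triangular schedule: the learning rate ramps up linearly over a number of warmup steps, peaks, and then decays linearly to zero by the final training step.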
class TriangularSchedule(keras.optimizers.schedules.LearningRateSchedule):
    """Linear ramp up for `warmup` steps, then linear decay to zero at `total` steps."""

    def __init__(self, rate, warmup, total):
        self.rate = rate
        self.warmup = warmup
        self.total = total

    def get_config(self):
        config = {"rate": self.rate, "warmup": self.warmup, "total": self.total}
        return config

    def __call__(self, step):
        step = keras.ops.cast(step, dtype="float32")
        rate = keras.ops.cast(self.rate, dtype="float32")
        warmup = keras.ops.cast(self.warmup, dtype="float32")
        total = keras.ops.cast(self.total, dtype="float32")

        warmup_rate = rate * step / warmup
        cooldown_rate = rate * (total - step) / (total - warmup)
        triangular_rate = keras.ops.minimum(warmup_rate, cooldown_rate)
        return keras.ops.maximum(triangular_rate, 0.0)
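We can verify the shape of the schedule by evaluating it at a few steps (a quick sketch; the peak should land exactly at the warmup boundary):
schedule = TriangularSchedule(rate=1e-4, warmup=100, total=1000)
for step in [0, 50, 100, 550, 1000]:
    print(step, float(schedule(step)))  # ramps up to 1e-4 at step 100, decays to 0.0 at step 1000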
Now we will use the triangular learning rate schedule we just defined.
with wandb.init(project=PROJECT_NAME, name="lr-schedule") as run:
    bert_classifier = keras_nlp.models.BertClassifier.from_preset(
        "bert_tiny_en_uncased", num_classes=3
    )
    train_set, val_set, test_set = get_batched_dataset(512)

    # Get the total count of training batches by walking the batched dataset.
    # This is required because filtering out the -1 labels changes the dataset size.
    epochs = 3
    total_steps = sum(1 for _ in train_set.as_numpy_iterator()) * epochs
    warmup_steps = int(total_steps * 0.2)

    bert_classifier.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.AdamW(
            TriangularSchedule(1e-4, warmup_steps, total_steps)
        ),
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )
    bert_classifier.fit(
        train_set,
        validation_data=val_set,
        epochs=epochs,
        callbacks=[WandbMetricsLogger(log_freq="batch")],
    )
    bert_classifier.evaluate(test_set, callbacks=[WandbMetricsLogger(log_freq="batch")])
Choosing the Right Hyperparameters for the Model with Sweeps!
Hyperparameter sweeps let us find the best combination of hyperparameter values for an ML model on a specific dataset: W&B orchestrates the runs and manages the experiments for each configuration of our training code.
Let's first define our sweep config.
import wandb

sweep_config = {
    'project': PROJECT_NAME,
    'method': 'random',
    'run_cap': 6,
    'metric': {
        'name': 'accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'learning_rate': {
            'values': [5e-6, 2e-5, 5e-5, 1e-4]
        },
        'batch_size': {
            'values': [64, 128, 256, 512]
        }
    }
}

sweep_defaults = {
    'learning_rate': 5e-5,
    'batch_size': 512,
}
def train():
    wandb.init(project=PROJECT_NAME, config=sweep_defaults)
    bert_classifier = keras_nlp.models.BertClassifier.from_preset(
        "bert_tiny_en_uncased", num_classes=3
    )
    train_set, val_set, test_set = get_batched_dataset(wandb.config.batch_size)
    optimizer = keras.optimizers.AdamW(
        learning_rate=wandb.config.learning_rate, epsilon=1e-8
    )
    bert_classifier.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        metrics=[keras.metrics.SparseCategoricalAccuracy()],
    )
    bert_classifier.fit(
        train_set,
        validation_data=val_set,
        epochs=2,
        callbacks=[WandbMetricsLogger(log_freq="batch")],
    )
    bert_classifier.evaluate(test_set, callbacks=[WandbMetricsLogger(log_freq="batch")])


# Register the sweep with W&B, then launch an agent that calls train() per configuration.
sweep_id = wandb.sweep(sweep_config)
wandb.agent(sweep_id, function=train)
From the visualization above we can see that with a batch size of 128 and a learning rate of 0.0001 we achieve an accuracy of 0.727. Increasing the batch size to 256 while keeping the learning rate constant drops accuracy to 0.6235. In a different run, with a batch size of 64 and a learning rate of 0.00002, we achieve an accuracy of 0.674; keeping the learning rate the same and increasing the batch size to 256 drops accuracy to 0.4161. The general takeaway for this dataset and bert-tiny: a higher learning rate combined with a smaller batch size yields higher accuracy.
Conclusion
Throughout this tutorial, we demonstrated how to use a pretrained BERT model to establish a baseline and improve performance by trying different training settings. The KerasNLP toolbox provides a range of modular building blocks for preprocessing text, including pretrained state-of-the-art models and low-level Transformer Encoder layers. We believe that this makes experimenting with natural language solutions more accessible and efficient.