
A Brief Introduction to Prompt Tuning

This article provides a brief overview of prompt tuning, a method for language model adaptation from Google Research, along with code and interactive visualisations.
Created on March 17|Last edited on June 28
Today, we're going to look at a recent paper from Google Research: The Power of Scale for Parameter-Efficient Prompt Tuning.
Like other papers in the soft prompting domain, such as It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners and GPT Understands, Too (P-tuning), the authors propose a new method for using prompting for language model adaptation. They show that prompt tuning becomes increasingly effective with scale: as we scale the model parameters, prompt tuning closes the gap with traditional fine-tuning methods. They also note that prompt tuning can be seen as a simplification of prefix tuning.

Motivation

Prompting is the approach of adding extra information for the model to condition on during generation (every task is framed as text generation; for instance, classification is cast as predicting the series of tokens that represent the class label).
Discrete prompting involves manually choosing additional prompt tokens to be prepended to the input, i.e. a series of prompt tokens $P$ is added to the input $X$, and the model is then expected to model the probability

$$\large P_{\theta}(Y \mid [P;X])$$

...while the model parameters $\theta$ are kept fixed. This forces us to find good prompts either via manual search or automatic methods. Other papers, such as P-Tuning, note that these methods often lead to unstable model performance.
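As a toy illustration (the prompt wording here is our own, not from the paper), discrete prompting amounts to prepending hand-chosen tokens to the input before it reaches the frozen model:

```python
# Discrete prompting: hand-picked prompt tokens P are prepended to the
# input tokens X, and the frozen model conditions on [P; X].
P = ["Translate", "English", "to", "French", ":"]  # manually chosen prompt
X = ["cheese"]                                     # the actual task input
model_input = P + X
print(model_input)
```

Finding a $P$ that works well is the hard part, which is exactly what motivates learning the prompt instead.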

Prompt-Tuning: The Method

TL;DR: The authors propose prompt tuning as a method for adapting language models by prepending only $k$ additional tuneable tokens per downstream task to the input text of a frozen pretrained model. Notably, the authors demonstrate that language model capacity is a key ingredient for these approaches to succeed.
Soft prompting methods like prompt tuning, on the other hand, learn the prompt tokens via backpropagation. Importantly, the prompts have their own set of parameters $\theta_P$, separate from the model parameters $\theta$. This enables us to reparameterise the objective as:

$$\large P_{\theta; \theta_P}(Y \mid [P;X])$$

...where backpropagation updates only $\theta_P$.
Given a series of $n$ tokens, the T5 family of models embeds these tokens into a matrix $X_e \in \mathbb{R}^{n \times e}$, where $e$ is the dimensionality of the embedding space. In the case of prompt tuning there are additional prompt embeddings $P_e \in \mathbb{R}^{p \times e}$, where $p$ is the length of the prompt; the model then operates on the concatenation $[P_e; X_e] \in \mathbb{R}^{(p+n) \times e}$.
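A minimal sketch of those shapes, with made-up dimensions ($p$, $n$, and $e$ here are illustrative values, not from the paper):

```python
# Prompt embeddings P_e (p x e) are prepended to input embeddings X_e (n x e),
# producing a (p + n) x e matrix that the frozen encoder consumes.
p, n, e = 10, 5, 4                              # illustrative sizes
P_e = [[0.1] * e for _ in range(p)]             # learned prompt embeddings
X_e = [[0.2] * e for _ in range(n)]             # frozen token embeddings
combined = P_e + X_e                            # [P_e; X_e]
print(len(combined), len(combined[0]))          # (p + n) rows, e columns
```

During training, gradients flow only into the $p \times e$ entries of $P_e$; the rest of the model is untouched.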
Figure 1: Comparison between traditional fine-tuning and prompt tuning. Source: Figure 2 from the original paper.

Prompt Tuning: The Code

As with LoRA and P-Tuning, for this article we'll use the 🤗/peft library. It works seamlessly with the transformers ecosystem, allowing for easy integration with the various Trainer APIs as well.
A valid configuration for prompt tuning in 🤗/peft is of the following form:
from peft import PromptTuningConfig

peft_config = PromptTuningConfig(
task_type="SEQ_CLS",
num_virtual_tokens=10,
)

# Any model can then be transformed into a PEFT model
from peft import get_peft_model
from transformers import AutoModelForSequenceClassification

# Load the base model to adapt (roberta-base, as evaluated in the Results below)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", return_dict=True)
peft_model = get_peft_model(model, peft_config)
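To get a feel for how lightweight this is, here is a back-of-the-envelope count of the parameters prompt tuning adds. The embedding size of 768 is roberta-base's, an assumption on our part; the base model's own parameters stay frozen:

```python
# Prompt tuning trains only num_virtual_tokens x hidden_size new parameters
# for the prompt embeddings themselves.
num_virtual_tokens = 10   # matches the config above
hidden_size = 768         # roberta-base embedding dimension (assumed)
prompt_params = num_virtual_tokens * hidden_size
print(prompt_params)  # 7680 trainable prompt parameters
```

That is a tiny fraction of the roughly 125M parameters in the base model, which is the whole appeal of the method.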

Results

Let's look at one such training run, showing the accuracy and F1 score from evaluating a roberta-base model on the GLUE benchmark with prompt tuning as the adaptation method.


Conclusion

In this article, we walked through a brief overview of prompt tuning as a method for efficiently adapting pretrained language models, and saw how Weights & Biases can be used to explore the training process and surface valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other LLM-related topics like Audio Transformers and hyperparameter optimization.
