
A Brief Introduction to Prompt Tuning

This article provides a brief overview of prompt tuning, a method for language model adaptation from Google Research, along with code and interactive visualisations.
Created on March 17|Last edited on June 28
Today, we're going to look at a recent paper from Google Research: The Power of Scale for Parameter-Efficient Prompt Tuning.
Like other papers in the soft prompting domain, such as It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners and GPT Understands, Too (P-tuning), the authors propose a new method for using prompting for language model adaptation. They show that prompt tuning becomes increasingly effective with scale: as we scale the model parameters, prompt tuning closes the gap with traditional fine-tuning methods. They also note that prompt tuning can be seen as a simplification of prefix tuning.

Motivation

Prompting is the approach of adding extra information for the model to condition on during generation (every task is framed as text generation; for instance, classification is cast as predicting the series of tokens that represent the class label).
Discrete prompting involves manually choosing additional prompt tokens to be prepended to the input, i.e. a series of prompt tokens $P$ is added to the input $X$, and the model is then expected to model the probability

$$\large P_{\theta}(Y \mid [P;X])$$

...while the model parameters $\theta$ are kept fixed. This forces us to find good prompts either via manual search or automatic methods. Other papers, such as P-Tuning, note that these methods often lead to unstable model performance.
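As a toy illustration (the prompt wording here is our own, not from the paper), discrete prompting amounts to prepending hand-chosen tokens to the input before it reaches the frozen model:

```python
# Discrete prompting: hand-picked prompt tokens P are prepended to the
# input tokens X, and the frozen model conditions on [P; X].
P = ["Translate", "English", "to", "French", ":"]  # manually chosen prompt
X = ["cheese"]                                     # the actual task input
model_input = P + X
print(model_input)
```

Finding a $P$ that works well is the hard part, which is exactly what motivates learning the prompt instead.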

Prompt-Tuning: The Method

TL;DR: The authors propose prompt tuning as a method for adapting language models by prepending only $k$ additional tuneable tokens per downstream task to the input text of a frozen pretrained model. Notably, the authors demonstrate that language model capacity is a key ingredient for these approaches to succeed.
Soft prompting methods like prompt tuning, on the other hand, learn the prompt tokens via backpropagation. Importantly, the prompts have their own set of parameters $\theta_P$, separate from the model parameters $\theta$. This enables us to reparameterise the objective as:

$$\large P_{\theta; \theta_P}(Y \mid [P;X])$$

...where backpropagation updates only $\theta_P$.
Given a series of $n$ tokens, the T5 family of models embeds these tokens into a matrix $X_e \in \mathbb{R}^{n \times e}$, where $e$ is the dimensionality of the embedding space. In the case of prompt tuning there are additional prompt embeddings $P_e \in \mathbb{R}^{p \times e}$, where $p$ is the length of the prompt; the model then operates on the concatenation $[P_e; X_e] \in \mathbb{R}^{(p+n) \times e}$.
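A minimal sketch of those shapes, with made-up dimensions ($p$, $n$, and $e$ here are illustrative values, not from the paper):

```python
# Prompt embeddings P_e (p x e) are prepended to input embeddings X_e (n x e),
# producing a (p + n) x e matrix that the frozen encoder consumes.
p, n, e = 10, 5, 4                              # illustrative sizes
P_e = [[0.1] * e for _ in range(p)]             # learned prompt embeddings
X_e = [[0.2] * e for _ in range(n)]             # frozen token embeddings
combined = P_e + X_e                            # [P_e; X_e]
print(len(combined), len(combined[0]))          # (p + n) rows, e columns
```

During training, gradients flow only into the $p \times e$ entries of $P_e$; the rest of the model is untouched.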
Figure 1: Comparison between traditional fine-tuning and prompt tuning. Source: Figure 2 from the original paper.

Prompt Tuning: The Code

As with LoRA and P-Tuning, for this article we'll use the 🤗/peft library. It works seamlessly with the transformers ecosystem, allowing for easy integration with the various Trainer APIs as well.
A valid configuration for prompt tuning in 🤗/peft is of the following form:
from peft import PromptTuningConfig

peft_config = PromptTuningConfig(
task_type="SEQ_CLS",
num_virtual_tokens=10,
)

# Any model can then be transformed into a PEFT model
from peft import get_peft_model
from transformers import AutoModelForSequenceClassification

# Load the base model to adapt (roberta-base, as evaluated in the Results below)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", return_dict=True)
peft_model = get_peft_model(model, peft_config)
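To get a feel for how lightweight this is, here is a back-of-the-envelope count of the parameters prompt tuning adds. The embedding size of 768 is roberta-base's, an assumption on our part; the base model's own parameters stay frozen:

```python
# Prompt tuning trains only num_virtual_tokens x hidden_size new parameters
# for the prompt embeddings themselves.
num_virtual_tokens = 10   # matches the config above
hidden_size = 768         # roberta-base embedding dimension (assumed)
prompt_params = num_virtual_tokens * hidden_size
print(prompt_params)  # 7680 trainable prompt parameters
```

That is a tiny fraction of the roughly 125M parameters in the base model, which is the whole appeal of the method.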

Results

Let's look at one such training run, showing the accuracy and F1 score from evaluating a roberta-base model on the GLUE benchmark with prompt tuning as the adaptation method.


Conclusion

In this article, we walked through a brief overview of prompt tuning as a method for efficiently adapting pretrained language models, and saw how Weights & Biases can be used to explore the training process and surface valuable insights.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other LLM-related topics like Audio Transformers and hyperparameter optimization.
