Microsoft Introduces New Method For Prompt Engineering
Prompts are all you need?
Prompt engineering is currently a very important part of any application that uses LLMs: small variations in a prompt can dramatically change a model's output and can make or break a user experience. Microsoft recently published work they call "Universal Prompt Retrieval for Improving Zero-Shot Evaluation" (UPRISE). Essentially, the problem they aim to tackle is efficiently determining which prompts to use for various models and tasks. Simply fine-tuning an LLM on existing prompts can be very costly: not only do the models contain massive numbers of parameters, but they also evolve extremely quickly, which means new models will eventually need to be fine-tuned all over again. In addition, this fine-tuning often does not improve performance on tasks outside of those it was tuned for.
Automating Prompt Selection Efficiently
In order to solve this issue, the researchers designed a system that efficiently learns to select existing prompts that will perform well on a given task, and they accomplished this using a single 2.7-billion-parameter LLM, which is much smaller than many other LLMs like GPT-3. Before diving into more technical details, here is an image describing the training and inference process.


As can be seen, the training process involves a GPT-Neo LLM (left) along with a bi-encoder model (right), which is the retrieval model. The bi-encoder takes in both an input for a specific task and a sampled prompt. The sampled prompt may be either a good or a bad prompt, and the true label (whether or not the prompt is adequate) is determined by the GPT-Neo model. So essentially, prompts are sampled, tested against the frozen LLM, and the results of those tests are used to update the retriever. The retriever is trained on a dataset containing a diverse set of tasks and prompts, so it ultimately learns what good and bad prompts look like. At inference time (far right), the retrieval model scores many candidate prompts, and the prompts it ranks highest are fed into the LLM along with the task input to obtain an answer.
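To make that training signal a bit more concrete, here is a minimal, hypothetical sketch of this kind of LLM-supervised contrastive retriever update in PyTorch. This is not the paper's actual code: `frozen_llm_score` and `encode` are stand-ins for the frozen GPT-Neo scoring step and the bi-encoder forward pass, and the loss simply pulls the task input's embedding toward a prompt the LLM judged helpful and away from prompts it judged unhelpful.

```python
import torch
import torch.nn.functional as F

def contrastive_prompt_loss(task_emb, pos_emb, neg_embs, temperature=0.1):
    """Pull the task input toward a prompt the frozen LLM scored as helpful,
    and push it away from prompts the LLM scored as unhelpful."""
    pos_sim = F.cosine_similarity(task_emb, pos_emb, dim=-1) / temperature                 # scalar
    neg_sims = F.cosine_similarity(task_emb.unsqueeze(0), neg_embs, dim=-1) / temperature  # (N,)
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sims]).unsqueeze(0)                      # (1, N+1)
    target = torch.zeros(1, dtype=torch.long)  # the positive prompt sits at index 0
    return F.cross_entropy(logits, target)

# Conceptual training loop (placeholders, not the paper's code):
# for task_input, candidate_prompts in dataset:
#     scores    = [frozen_llm_score(p, task_input) for p in candidate_prompts]  # GPT-Neo judges each prompt
#     positive  = candidate_prompts[scores.index(max(scores))]                  # best prompt per the LLM
#     negatives = [p for p, s in zip(candidate_prompts, scores) if s < threshold]
#     loss = contrastive_prompt_loss(encode(task_input), encode(positive),
#                                    torch.stack([encode(p) for p in negatives]))
#     loss.backward(); optimizer.step()

# Conceptual inference: embed the task input once, score the whole prompt pool
# with a dot product, and prepend the top-k prompts to the LLM's input.
# def retrieve_prompts(task_input, prompt_pool_embs, k=3):
#     sims = prompt_pool_embs @ encode(task_input)
#     return sims.topk(k).indices
```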
The Future for Prompt Engineering
The technical details of this system are somewhat involved, and I would recommend diving into the paper if you are interested in a more granular explanation. Overall, this approach has several advantages over regular fine-tuning. For one, it allows cross-model prompt retrieval: the retrieval model can be used with other LLMs like LLaMA or GPT-4 without fine-tuning the foundation model, as sketched below. In addition, the method shows strong performance for cross-task retrieval, meaning it generalizes well to tasks it wasn't trained on, something previous methods struggled with. The authors also note that the method mitigates some of the hallucination issues in ChatGPT, which are currently a major problem for the product. Overall, prompt engineering seems to be an emerging field, as it offers a more cost- and time-efficient way to improve existing LLM performance. Whether these methods will still be relevant in five years is uncertain; however, they are definitely proving valuable for today's LLMs, and it will be exciting to see even more advanced prompt engineering methods arise in the future!
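The reason the retriever is so easy to reuse across models is that its output is just text. The hypothetical sketch below illustrates this, assuming the `retrieve_prompts` helper from the earlier snippet and a generic `llm_generate` call; neither is part of the paper's released code.

```python
def build_zero_shot_input(task_input, retrieved_prompts):
    # The retrieved prompts are plain-text demonstrations, so they can simply
    # be prepended to the task input before calling *any* downstream LLM.
    return "\n\n".join(list(retrieved_prompts) + [task_input])

# prompts = retrieve_prompts(question, prompt_pool_embs, k=3)        # frozen retriever
# answer  = llm_generate(build_zero_shot_input(question, prompts))   # GPT-Neo, LLaMA, GPT-4, ...
```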