Prompt Engineering for LLMs: A Practical, Conceptual Guide

Exploring what actually works in prompt engineering
Created on September 2|Last edited on September 7
Comment
﻿
IntroductionWhat is prompt engineering? When I first heard of this field, it seemed a little funny. Engineering what you want to say to an AI model? Are you testing my social skills? 
I had never really given much thought to prompt engineering. It seemed as simple as learning how to Google search and that wasn't something AI researchers actively explored. What, then, is this all about? Well, the Large Language Models (LLMs) of today like ChatGPT, GPT-4, and the hundreds of competitor models don't just ingest your "search query" and make do with it. Like a computer vision model vulnerable to noise or adversarial attacks, slight changes to the input of an LLM does affect its outputs. 
So what's prompt engineering? Simply put:
Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs). 

— https://www.promptingguide.ai/﻿
An example prompt.
Notice how the prompt is a sentence complete with an instruction. What would happen if we, instead, gave ChatGPT the entire wiki page for retro style? Now that would certainly give the model's output a lot more context and explanation!
Table of ContentsIntroductionTable of ContentsMethodsSimple TricksAdvanced MethodsPrompt ProblemsResourcesReferences
﻿
﻿
MethodsGet ready: this is a pretty dense section. But don't worry! I'll have short breaks here and there where I spotlight a prompt engineering tool.  
As this is a practical guide, I'll highlight just the popular methods involved without much coverage of the underlying experiments and evaluation present in these method papers. 
Let's get started. In fact, let's start off with google searching. If you aren't already proficient with searching on Google, try this free course provided by freeCodeCamp. 
The main difference between searching on Google and querying ChatGPT is how you structure the query. 
On Google, it might be a couple words followed by a keyword. Say I want to understand more about dictionary unpacking in Python. I'd query the search engine like "python dict unpacking." I prefixed my query with "python" to narrow down to just this programming language and I specified "dict" instead of "dictionary" because they mean the same thing in Python. Finally, I typed in unpacking. The scope of every keyword from left to right narrows until I have precisely what I want.  
ChatGPT, of course, has a wealth of knowledge embedded in its weights, but it is not a search engine, nor is it connected to the internet. Though there are plugins and other frameworks and projects (especially within the open-source community!) that let these LLMs access the internet.
So, that being said, querying ChatGPT is very similar to Google Search, except our query is a full-fledged sentence and worded like a question or an instruction as if we were asking a librarian what a certain book is about: "Could you explain dictionary unpacking in Python?"
A cropped version of the output.
This prompt isn't bad. It's a complete sentence with clear instruction. To make it even better, we could start with a couple simple tricks before moving onto more complex ones. 
Before we begin, it's important to understand that a prompt consists of:
Instruction
Context
Input Data
Output Indicator
Often the instruction and input data are the same, but in some cases you might actually be passing in a lot of data (separate from context) and you must instruct the model to look through it. 
Simple TricksHave complete sentences and proper grammar! The model is trained on a lot of text data and a lot of it is written, edited, reviewed, and published. 
Make sure your instructions are clear. Not just "python unpacking", but "what is python unpacking?"
Instruction with too many steps? Split your instruction/question into multiple sub-steps.
If you're providing a complex instruction/question, tell the model the explain!
Be concise and simplify your prompt.
Use instructive keywords or phrase your query as a question.
Explain what dictionary unpacking is in python.
Try formatting your question. Instead of just "What is dictionary unpacking in python?" you can try:
Context: <insert context>
Question: What is dictionary unpacking in python?
Output indicator: <insert what you want the answer to look like> 
﻿
Answer: <FILL_IN>
Try out delimiters. These could be in XML, YAML, HTML, JSON, or even `` or "".
<user_input>
What is dictionary unpacking in python?
</user_input>
Answer:
If you really want/need to, generate multiple outputs from the same query and compare them to see if they all agree. You can try with the a similar query each time or with other models. I'll call this aggregating the outputs. 
﻿
Short intermission. Need help crafting prompts? Check out these prompt engineering tools!
﻿
﻿PromptSource, a python library (with a streamlit app available) for creating, sharing, and using prompts
﻿EveryPrompt, a playground for prompt engineering with a user interface
﻿Dust.tt, an AI assistant product for your company knowledge
﻿PromptTools, a python library for prompt engineering, evaluating LLMs, and leveraging vector databases﻿
﻿APE, a paper on automatic prompt engineering 
﻿
﻿
﻿Promptify, a lightweight prompt + LLM pipelining tool 
﻿
Advanced MethodsSimple tricks will get you far in most tasks. But asking it to do tasks that just can't be directly looked up will be a little more nuanced. This is where the LLM shines. It doesn't just regurgitate information like a search engine, but it has some form of intelligence. It can reason and understand. 
With that let's walk through some more complex methods. 
﻿Role/Persona assignment: not nearly as complex as some of these methods are, but very useful; providing a meta-prompt, (as I call it) describing the role/persona the model should take, before the actual prompt. 
﻿In-Context Learning (ICL): not exactly a method but an emerging paradigm; the premise of ICL is that you provide context or a couple examples of what you want in the prompt.
From source.
Here you have kkk﻿ demonstrations/examples of a review followed by a sentiment. Then, you pass in your actual prompt which consists of a review and the model has to fill in the sentiment. The idea is that the model gets a "feel" for what you're looking for and how the output should look.
﻿Chain of Thought (CoT): explaining with intermediate steps; be wary, the number of examples, the order, and the examples you choose all matter (ideally a handful of examples 2-4ish and organized randomly but class-balanced).
From source.
Essentially, you're not just providing a demonstration, but you're also providing an explanation or intermediate steps leading to that answer. This will encourage the model to provide intermediate reasoning steps before coming to a conclusion.
﻿Chain of Thought Self-Consistency (CoT-SC): exact same as CoT but you run it for kkk﻿ different tries, then you take the average of these typically via majority voting.
From source.
﻿Zero-shot CoT (LTSBS): Instead of providing an example with intermediate reasoning steps, just ask the question and tell the model: "Let's think step by step".
From source.
﻿Tree of Thoughts (ToT): builds a tree of intermediate thoughts where the tree is pruned to only follow the most promising nodes.
From source.
﻿Least-to-Most Prompting: (enforced causal CoT), automatically dividing a task into subtasks; solutions for subtask are provided as context for next subtask
From source.
﻿Selection-Inference: (enforced causal CoT), iteratively selects and infers upon useful context to arrive at an answer.
﻿SI with Faithful Reasoning: similar except primarily a unique halting component 
From source.
SI with faithful reasoning. From source.
﻿STaR (Self-Taught Reasoner): less so a prompt engineering method and more so a method to fine-tune a model without the need of labeling thousands of explanations; kind of like semi-supervised learning, the method prompts the model to generate rationales for question-answer pairs -> good rationales are used for future finetuning, and bad ones are revised till they are correct.
From source.
﻿Generate Knowledge Prompting: generate knowledge related to a question, then integrate the knowledge into the context of the question prompt and pass that through an LLM.
From source.
There are a couple other prompt engineering methods:
﻿Maieutic prompting: generate a maieutic tree of abductive and recursive explanations and frame as satisfiability problem (T/F)
﻿
﻿Automatic Prompt Engineering (APE): paper; inference LLM generates candidate instructions based on solution output; target model executes these instructions and best instruction is picked based on execution metrics 
﻿
﻿Active-Prompt: task-specific active learning to iteratively improve CoT prompt demonstrations.
﻿
﻿
We just covered a lot of different advanced techniques. Where can we start using them? Unfortunately, besides research paper code, there aren't many libraries (yet) that implement these techniques. Below, I'll include my top picks for great LLM frameworks for building said prompt techniques into your LLM pipeline. 
﻿LangChain, a comprehensive framework for LLM pipelines 
﻿LlamaIndex, a data framework for LLMs (not so much prompt-focused) 
﻿ludwig, a low-code framework for building custom LLMs
﻿ThoughtSource, a CoT library
﻿
Prompt ProblemsThe prompt is the interface to the model. As such, it can be vulnerable to adversarial attacks. I'll list some I've learned below.
Before I go through the list, it's important to understand an offensive attack and the types of injections.
delivery mechanism: prompt type used to deliver the payload (malicious output)
payload: the malicious output
indirect injection: a type of prompt injection that makes use of 3rd party data sources like web searches/API calls.
recursive injection: a type of prompt injection that hacks through multiple layers of the model evaluation.
code injection: special case of prompt injection delivers code as payload.
﻿
prompt injection: clever prompts to change the model’s behavior.
prompt leaking: prompt attacks designed to leak details from the prompt that are confidential.
jailbreaking: unethical instructions can be bypassed by clever prompting.
Do Anything Now (DAN): assign the DAN persona to the model followed by an instruction.
waluigi effect: After you train an LLM to satisfy a desirable property P, then it’s easier to elicit the chatbot into satisfying the exact opposite of property P.
GPT-4 Simulator: have the LLM simulate an autoregressive model that doesn’t have the guardrails the LLM has.
Game Simulator: simulate a game.
Obfuscation/Token Smuggling: replace words that would trigger filters with typos or synonyms.
Base64 Encoding: encoding an injection in base64.
Fill-in-the-blank attack
Payload Splitting: splitting the adversarial input into multiple parts.
fragmentation concatenation attack: payload is split into multiple parts and concatenated by LLM.
Defined Dictionary Attack: a form of prompt injection to evade sandwich defense.
And with any offense, there's defense!
adding defense in the instruction (warnings and disclaimers)
json formatted input and outputs
use quotations for inputs
adversarial prompt detectors (assign detector role to model)
filtering: check for words/phrases that should be blocked
instruction defense: add guardrail statements in the prompt
post-prompting: put user input before the prompt in the input string (can counter “ignore the above instruction”)
random sequence enclosure: enclose user input in between 2 random sequences of characters; the longer the more effective
sandwich defense: sandwich user input in between 2 prompts (the 2 prompts are the same but can be worded differently)
XML tagging: surround user input with XML tags like <user_input> and </user_input> or </user_input\>
Separate LLM Evaluation: use a separate LLM to evaluate the safety of the prompt
use a different model
fine-tuning
soft prompting
length restrictions
These 2 brief lists summarize types of offensive and defensive maneuvers one can make in prompting an LLM! Who knew there were so many ways to attack an LLM?
ResourcesHopefully you learned a thing or two as I certainly did. If not, don't worry. This section will be a list of resources for your future endeavors into prompt engineering!
﻿https://github.com/openai/openai-cookbook/tree/main﻿
﻿https://www.promptingguide.ai/papers#overviews﻿
﻿https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/﻿
﻿https://learn.deeplearning.ai/chatgpt-prompt-eng﻿
﻿https://huyenchip.com/2023/04/11/llm-engineering.html#prompt_optimization﻿
﻿https://learnprompting.org﻿
﻿https://platform.openai.com/examples﻿
Happy reading! And thank you for reading my short nutshell guide on prompt engineering. 
References﻿https://github.com/openai/openai-cookbook/tree/main﻿
﻿https://www.promptingguide.ai/papers#overviews﻿
﻿https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/﻿
﻿https://learn.deeplearning.ai/chatgpt-prompt-eng﻿
﻿https://huyenchip.com/2023/04/11/llm-engineering.html#prompt_optimization﻿
﻿https://learnprompting.org﻿
﻿https://platform.openai.com/examples﻿
﻿https://dan-ai.io/﻿
﻿https://github.com/0xk1h0/ChatGPT_DAN﻿
﻿https://github.com/LeaderbotX400/chatbot-experiments﻿
﻿https://www.make-safe-ai.com/is-bing-chat-safe/﻿
﻿https://youtu.be/BRiNw490Eq0?si=7z36CCUzXW8mHHMH﻿
﻿https://github.com/bigscience-workshop/promptsource﻿
﻿https://www.everyprompt.com/﻿
﻿https://dust.tt/﻿
﻿https://github.com/hegelai/prompttools/﻿
﻿https://www.pinecone.io/learn/vector-database/﻿
﻿https://github.com/keirp/automatic_prompt_engineer﻿
﻿https://medium.com/@mlblogging.k/understanding-in-context-learning-in-large-language-models-like-gpt3-gpt-j-gptneox-e0a71063a6db﻿
﻿https://www.vedereai.com/language-models-perform-reasoning-via-chain-of-thought/﻿
﻿https://arxiv.org/pdf/2305.10601.pdf﻿
﻿https://arxiv.org/pdf/2205.11916.pdf﻿
﻿https://arxiv.org/pdf/2205.10625.pdf﻿
﻿https://arxiv.org/pdf/2208.14271.pdf﻿
﻿https://arxiv.org/pdf/2110.08387.pdf﻿
﻿https://arxiv.org/pdf/2302.12246.pdf﻿
﻿https://arxiv.org/abs/2211.01910﻿
﻿https://arxiv.org/abs/2205.11822﻿
﻿https://arxiv.org/pdf/2203.14465.pdf﻿
﻿https://arxiv.org/pdf/2205.09712.pdf﻿
﻿https://arxiv.org/abs/2203.11171﻿
﻿https://arxiv.org/abs/2201.11903﻿
﻿https://ai.stanford.edu/blog/understanding-incontext/﻿
﻿https://learnprompting.org/docs/basics/roles﻿
﻿https://github.com/OpenBioLink/ThoughtSource﻿
﻿https://github.com/jerryjliu/llama_index﻿
﻿https://github.com/langchain-ai/langchain﻿
﻿https://github.com/ludwig-ai/ludwig﻿
﻿
Add a comment
Tags: Prompts, Articles, Intermediate, LLM, NLP, GenAI
Iterate on AI agents and models faster. Try Weights & Biases today.