Enhancing LLM Agent Performance through Dynamic Plugin Selection and W&B Prompts
Show me the Code!
Demo
A preview of what we'll be making today:
[Embedded W&B panel: run set of 31 runs]
Introduction
Consider the challenges faced by a popular e-commerce platform that caters to a diverse range of industries.
With the goal of personalizing customer service across all service domains and for hundreds of thousands of customers, large language models (LLMs) are an attractive solution. Specifically, these models could power a chatbot capable of tailoring suggestions for each individual customer.
This e-commerce platform prefers to utilize an LLM agent as the core reasoning mechanism for their chatbot, contrasting with a more traditional 'chain' approach. With chains, the behavior is hardcoded or pre-scripted, characterized by specific prompts and responses that are then fed into other LLM calls with their distinct prompts and responses, resulting in a 'chain' of LLM calls resolving to a predetermined outcome.
Alternatively, LLM-based agents repeatedly call the LLM until specified stopping criteria are satisfied. This affords considerable flexibility in defining an agent's prompt, assuming the stopping criteria are appropriately formulated. This configuration allows the agent to 'think' through input queries, deconstructing each query into a series of LLM calls that gradually move the agent toward its stopping criteria. This 'thinking' process enables agents to utilize tools.
Here's an illustration of a standard agent template:
```python
# Set up the base template
template = f"""System Rules:
{system_rules}

Query Transform Rules (only do this for the initial input query!):
{query_transform_rules}

Task:
Answer the following questions as best you can, but speaking as {speaker} might speak. You have access to the following tools:

{{tools}}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{{tool_names}}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Remember to speak as {speaker} when giving your final answer.
{speaker_rules}

Question: {{input}}
{{agent_scratchpad}}"""
```
The Zero-shot agent prompt template has been adjusted to facilitate testing of:
- Different system rules (contextualizing the industry/application focus of the agent)
- Different decomposition tactics for input queries (making it akin to one-shot)
- Different speakers (varying tone and restrictions)
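For concreteness, here is one hypothetical way those template variables might be filled in for a fashion-focused agent; the rule text and speaker below are illustrative placeholders, not the production prompt, and would need to be defined before the f-string template above is built.

```python
# Hypothetical values for the template variables above (placeholders only);
# define these before constructing the f-string template.
system_rules = "You are FashionGPT, a shopping assistant for a clothing retailer."
query_transform_rules = (
    "Break the customer's request into steps: check local weather, "
    "look up in-stock items, and convert prices if needed."
)
speaker = "FashionGPT"
speaker_rules = "Keep the tone friendly and concise, and only recommend items we stock."
```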
Given the constraints of the knowledge base used as training data for the LLM, the e-commerce platform cannot directly use a stock LLM like OpenAI's GPT-4 for its agent. The agent would lack many contemporary details essential for an adequate customer experience, such as current store inventory, prevailing customer trends, and other external information necessary for personalizing customer interactions.
To equip our agents with access to these external details, we introduce tools, similar to plugins. These tools are Python functions with designated roles, which can include Google Search, Database lookup, Python REPL, among other API-like behaviors.
Here is an example of a tool:
```python
def search_function(...):
    pass

tools = [
    Tool.from_function(
        func=search_function,
        name="Search",
        description="useful for when you need to answer questions about current events"
        # coroutine= ... <- you can specify an async method if desired as well
    ),
]
```
Consider a fashion retail arm that sells a broad spectrum of clothing and accessories via physical outlets and a strong online platform. The e-commerce platform determines that the chatbot required for this industry must contextually suggest clothing to users based on:
- Local weather conditions
- Current clothing inventory
- Currency conversion for price presentation
Each of these conditions can be viewed as a separate plugin that would be created and managed. The Agent is aware of these plugins through the human-readable descriptions we provide for each tool.
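As a rough sketch (not the platform's actual code), those three capabilities might be registered as LangChain tools like this, with the descriptions doing the work of telling the agent when each one applies. The function names and descriptions here are illustrative placeholders:

```python
from langchain.agents import Tool

# Placeholder implementations; real versions would call the weather,
# inventory, and currency APIs.
def get_weather(location: str) -> str: ...
def search_inventory(query: str) -> str: ...
def convert_price(request: str) -> str: ...

fashion_tools = [
    Tool.from_function(func=get_weather, name="Weather",
                       description="useful for checking current local weather conditions for a city"),
    Tool.from_function(func=search_inventory, name="Inventory",
                       description="useful for looking up the clothing items currently in stock"),
    Tool.from_function(func=convert_price, name="CurrencyConverter",
                       description="useful for converting item prices into the customer's local currency"),
]
```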
Note: plugins and tools are similar, but they differ in structure, as OpenAI uses a different configuration schema than LangChain. Here's an example OpenAI plugin spec (a Weather plugin, in OpenAPI format):
```yaml
openapi: 3.0.1
info:
  title: Weather
  description: Allows users to fetch current and forecasted weather information based on location. You MUST ALWAYS convert the plugin response to the units that are most useful to your user, when in doubt assume USA/English units.
  version: 'v1.2'
servers:
  - url: https://weather--vicentescode.repl.co
paths:
  /weathernow:
    get:
      operationId: getWeatherNow
      summary: Get the current weather information based on city, state, and country. You MUST ALWAYS convert the plugin response to the units that are most useful to your user, when in doubt assume USA/English units.
      parameters:
        - in: query
          name: city
          schema:
            type: string
          required: true
          description: The city name.
        - in: query
          name: state
          ...
        - in: query
          name: country
          ...
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                type: object
                additionalProperties: true
  ...
```
To convert an OpenAI plugin spec into a tool, LangChain's Natural Language API Toolkits (NLAToolkits) are used, which enable LangChain agents to effectively plan and combine calls across various endpoints, including calls into the OpenAI plugin ecosystem.
Given the vastness of industries and plugins to manage, the e-commerce platform can curate and maintain a collection of approved plugins that are compatible with both OpenAI and Langchain applications, and can be utilized for any downstream LLM-based agent.
To manage DataOps and LLMOps, we employ Weights & Biases. Here, W&B Artifacts are used to track datasets, models, dependencies, and results at each stage of LLM pipelines. Artifacts provide a comprehensive and auditable history of changes to your files. Artifacts are either an input of a run or an output of a run, where a run encompasses a session of our LLM usage. Common artifacts include complete training sets and models. Data can be directly stored into artifacts, or artifact references can be used to point to data in other systems like Amazon S3, GCP, or your proprietary systems.
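For instance, a minimal sketch of logging the plugin registry by reference rather than by direct file upload might look like this; the S3 bucket and paths are hypothetical:

```python
import wandb

run = wandb.init(project="langchain-plugin-retrieval-demo", name="register_remote_plugins")
registry_art = wandb.Artifact(name="remote_plugin_registry", type="plugins",
                              description="Reference to the plugin registry stored in S3")
# Track the file where it already lives instead of copying it into W&B storage.
registry_art.add_reference("s3://example-company-bucket/plugins/chatgpt_plugins.json")
run.log_artifact(registry_art)
run.finish()
```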
As the e-commerce platform's customer-service bots find success in the market, the scope of the bot may grow, which in turn increases the number of plugins that need to be pulled down from the plugin store to keep the bot's information up to date. As a result, due to the LLM's token limits, the prompt template may not be able to accommodate the additional tool description context.
To manage this, a vector store can be used to dynamically retrieve the tools relevant to a given input query. This dynamic retrieval lets an effectively unlimited number of plugins be made available to an agent by choosing which n tools should be used for each input, with retrieval based on our embedding schema. It also decouples plugin management from LLM prompt template management, which allows for better development and auditability of the key aspects of our LLM agent. Both the plugins and the prompts are easily stored in Weights & Biases Artifacts for versioning and portability.
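As one sketch of that decoupling, the prompt template itself could be versioned as its own artifact, independent of the plugin registry; the artifact and file names below are illustrative:

```python
import wandb

run = wandb.init(project="langchain-plugin-retrieval-demo", name="save_prompt_template")

# Write the current template string to disk and version it separately from the plugins.
with open("prompt_template.txt", "w") as f:
    f.write(template)

prompt_art = wandb.Artifact(name="fashiongpt_prompt_template", type="prompt",
                            description="Versioned agent prompt template")
prompt_art.add_file("prompt_template.txt")
run.log_artifact(prompt_art)
run.finish()
```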
To better illustrate the above, consider a user asking our LLM agent for fashion recommendations for a pub crawl in Philadelphia, with a request like:
"I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear?"
The implemented system should ideally perform the following steps:
1. Determine from the input query the most relevant tools to solve it. Similar to the list above, it would be expected to find:
- Local weather conditions
- Current clothing inventory
- Currency conversion for price presentation (if relevant)
2. The agent transforms this request into a set of actions usable for our given plugins.
- First, it decides to check the weather in Philadelphia using the Weather plugin.
- Next, it determines what kind of clothing would be suitable given the weather.
- Lastly, it decides to get a list of trendy male outfits from the Inventory plugin.
3. For each action, the agent uses the corresponding plugin (tool), sends a request (tool input), and gets a response (observation). This can be done using an appropriate Output parser.
4. After executing all actions and making all observations, the agent compiles a comprehensive response. This can be done using an appropriate Output parser.
- For example, it might respond with, "Given the pleasant weather in Philadelphia today, a trendy outfit for your pub crawl could be a crisp white linen shirt, comfortable dark jeans, and a light jacket. Add a touch of sophistication with a leather bracelet and a pair of aviator sunglasses. Here are some suggestions from our collection..."
5. The suggestions are then returned to the user, tailored to their request and the current weather conditions in Philadelphia.
Show me the code, again!
Let's walk through the code:
```python
import os

os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "langchain-plugin-retrieval-demo"
```
Set W&B autologging with minimal code.
```python
run = wandb.init(project=os.environ["WANDB_PROJECT"], name="save_company_plugin_registry")

plugin_registry_art = wandb.Artifact(
    name="company_plugin_registry",
    type="plugins",
    description="Registry to hold our list of all available plugins"
)
plugin_registry_art.add_file(all_plugins_fp)
run.log_artifact(plugin_registry_art)
```
Serialize and version our plugin registry saved at all_plugins_fp.
```python
all_plugins_fp = run.use_artifact('company_plugin_registry:latest').get_path("chatgpt_plugins.json").download()

# Load from artifact
with open(all_plugins_fp, "r") as json_file:
    chatgpt_plugins = json.load(json_file)

# Process details for easy retrieval and investigation
AI_PLUGIN_DETAILS = {}
for plugin in chatgpt_plugins["items"]:
    # … (manifest fetching/parsing elided)
    AI_PLUGIN_DETAILS[plugin_manifest["name_for_model"]] = plugin_manifest
```
Load the latest plugin registry and subset it to the plugins relevant to the FashionGPT agent, stored as AI_PLUGIN_DETAILS.
```python
embeddings = OpenAIEmbeddings()
docs = [
    Document(
        page_content=detail["description_for_model"],
        metadata={"plugin_name": detail["name_for_model"]}
    )
    for detail in AI_PLUGIN_DETAILS.values()
]
vector_store = FAISS.from_documents(docs, embeddings)
# …
plugin_registry_art = wandb.Artifact(
    name="subset_plugin_registry",
    type="plugins",
    metadata=meta,
    description="Registry to hold our vector store of indexed plugins, and metadata of selected plugins."
)
vector_store.save_local("faiss_index")
with open("selected_plugins.json", "w", encoding='utf-8') as outfile:
    json.dump(AI_PLUGIN_DETAILS, outfile, ensure_ascii=False, indent=4)
plugin_registry_art.add_dir("faiss_index", name="faiss_index/")
plugin_registry_art.add_file("selected_plugins.json")
run.log_artifact(plugin_registry_art)
run.finish()
```
Embed each subsetted plugin based on its description, store the embeddings in a vector store, and serialize and version the vector store for ease of use in downstream applications.
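Before wiring the vector store into the agent, a quick similarity-search sanity check (illustrative, not part of the original walkthrough) can confirm that weather-flavored queries surface the expected plugin:

```python
# Which plugins does the vector store surface for a sample query?
hits = vector_store.similarity_search("What should I wear given today's weather?", k=3)
print([hit.metadata["plugin_name"] for hit in hits])
```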
```python
subset_plugin_registry_fp = run.use_artifact('subset_plugin_registry:latest').download()
# …
vector_store = FAISS.load_local(faiss_index_fp, embeddings)
with open(selected_plugins_fp, "r") as json_file:
    AI_PLUGIN_DETAILS = json.load(json_file)

AI_PLUGINS = [AIPlugin.from_url(detail["spec_url"]) for detail in AI_PLUGIN_DETAILS.values()]
toolkits_dict = {
    plugin.name_for_model: NLAToolkit.from_llm_and_ai_plugin(llm, plugin)
    for plugin in AI_PLUGINS
}
```
Load the latest selected plugins and vector store, and convert the OpenAI spec into the relevant LangChain tool.
```python
retriever = vector_store.as_retriever()

def get_tools(query):
    # Get documents, which contain the Plugins to use
    docs = retriever.get_relevant_documents(query)
    # Get the toolkits, one for each plugin
    tool_kits = [toolkits_dict[d.metadata["plugin_name"]] for d in docs]
    # Get the tools: a separate NLAChain for each endpoint
    tools = []
    for tk in tool_kits:
        tools.extend(tk.nla_tools)
    return tools
```
Define the logic to dynamically retrieve the relevant tools given an input query.
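As a quick illustrative check (not part of the original code), we can peek at which tools this retrieval step returns for the pub-crawl query:

```python
# Inspect the dynamically retrieved tools for a sample query.
query = "I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear?"
for tool in get_tools(query):
    print(f"{tool.name}: {tool.description[:80]}")
```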
```python
# Set up a prompt template
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    ############## NEW ######################
    # The list of tools available
    tools_getter: Callable

    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        ############## NEW ######################
        tools = self.tools_getter(kwargs["input"])
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in tools])
        return self.template.format(**kwargs)


prompt = CustomPromptTemplate(
    template=template,
    tools_getter=get_tools,
    # This omits the agent_scratchpad, tools, and tool_names variables because those are generated dynamically
    # This includes the intermediate_steps variable because that is needed
    input_variables=["input", "intermediate_steps"]
)
```
Create a custom prompt template that uses this dynamic retrieval step to select the tools for the agent.
```python
class CustomOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
```
Define the logic within the OutputParser to extract the actions and tool inputs for the dynamic plugins and to detect the agent's stopping criteria, all referenced in the custom PromptTemplate.
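To make the parser concrete, here is a small illustrative check on a hand-written sample of LLM output (the tool name and input are hypothetical); it also instantiates the output_parser object that the agent below expects:

```python
output_parser = CustomOutputParser()

# Hand-written sample of what the LLM might emit mid-reasoning (illustrative).
sample_output = (
    "Thought: I should check the weather first.\n"
    "Action: Weather.getWeatherNow\n"
    "Action Input: Philadelphia, PA, USA"
)
step = output_parser.parse(sample_output)
print(step.tool, "->", step.tool_input)  # Weather.getWeatherNow -> Philadelphia, PA, USA
```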
```python
llm = ChatOpenAI(temperature=0, model=model)

# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names
)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
```
Create the agent using the custom LLM prompt and the custom output parser.
query="I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear”result = agent_executor.run(query)print(result)
Utilize the agent as expected.
To run the above code, please follow the provided link:
Results
The output of our agent tends to be very verbose, which makes prompt engineering and prompt tuning difficult: there are many intermediate observations that need to be scrutinized, especially as the plugins and responsibilities of the agent scale.

To better manage this, we'll use W&B Prompts, a suite of LLMOps tools built for the development of LLM-powered applications. W&B Prompts can be used to visualize and inspect the execution flow of LLMs, analyze their inputs and outputs, view intermediate results, and securely store and manage prompts and LLM chain/agent configurations.
W&B Prompts supports a tool called Trace. Trace allows for tracking and visualization of the inputs and outputs, execution flow, model architecture, and any intermediate results of LLM chains/agents. These include all the complex plugin interactions that occur in the example agent provided:
[Embedded W&B panel: run set of 31 runs]
Trace consists of three main components:
- Trace table: Overview of the inputs and outputs of a chain.
- Trace timeline: Displays the execution flow of the chain and is color-coded according to component types.
- Model architecture: View details about the structure of the chain and the parameters used to initialize each component of the chain.
The Trace Table provides an overview of the inputs and outputs of a chain. The trace table also provides information about the composition of a trace event in the chain, whether or not the chain ran successfully, and any error messages returned when running the chain.
The Trace Timeline view displays the execution flow of the chain and is color-coded according to component types. Select a trace event to display the inputs, outputs, and metadata of that trace.
The Model Architecture view provides details about the structure of the chain and the parameters used to initialize each component of the chain. Click on a trace event to learn more details about that event.
After experimenting with different plugins and different prompts, it can be seen that for this scenario, it is useful to explicitly:
- Define the restrictive context of what the Agent is aiming to do (be FashionGPT as opposed to a general zero-shot agent)
- Decompose example input queries into thought-process steps (akin to a one-shot approach) to force the agent to use plugins consistently for similar queries; see the sketch below
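As a hypothetical illustration of that second point, the decomposition can be baked into the query-transform rules as a worked example; the wording here is illustrative, not the production prompt:

```python
# Illustrative one-shot style decomposition added to the prompt variables.
query_transform_rules = """Example decomposition (follow this pattern for new queries):
Question: I'm going to a rooftop party in Miami tonight. What should I wear?
Thought: I should check tonight's weather in Miami.
Thought: I should look up in-stock outfits that suit that weather.
Thought: I should present prices in the customer's preferred currency if asked.
"""
system_rules = "You are FashionGPT. Only recommend clothing and accessories sold by our store."
```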
Unfortunately, for the case of fashion planning, the agent is very good at utilizing the plugins to craft an outfit with the types of clothing relevant to the weather conditions, but it falls flat in ensuring the outfit is aesthetically coordinated. A useful plugin addition would be one that curates coordinated outfits and compares them against the provided store inventory to construct an outfit, as opposed to directly listing the inventory and picking whichever pieces seem relevant, as the agent currently does.
Aside from coordinating aesthetics, dynamic plugin retrieval is readily available and useful for LLM agents. With the proper tooling and organization, many industries can take advantage of agent workflows that are better prepared to scale with their needs, while keeping the experimentation process safe and secure.
Related Reading:
Introducing OrchestrAI: Building Custom Autonomous Agents with Prompt Chaining
Autonomous agents are a rising frontier in AI, tackling complex tasks through simpler steps. In this report, we'll delve into the current state of agents, and introduce a new custom framework, OrchestrAI.
Automate Your Experiment Tracking with ChatGPT Custom Instructions and Weights & Biases
ChatGPT is great at writing code. Here's how to get even more automation using custom prompts and Weights & Biases!
Building Advanced Query Engine and Evaluation with LlamaIndex and W&B
This report showcases a few cool evaluation strategies and touches upon a few advanced features in LlamaIndex that can be used to build LLM-based QA bots. It also shows, the usefulness of W&B for building such a system.
What Do LLMs Say When You Tell Them What They Can't Say?
An exploration of token banning on GPT's vocabulary.