
Enhancing LLM Agent Performance through Dynamic Plugin Selection and W&B Prompts


Show me the Code!

Open In Colab



Demo

A preview of what we'll be making today:

[Embedded W&B Trace panel: a trace table and timeline for an AgentExecutor run (23,041 ms) with nested LLMChain, ChatOpenAI, and tool spans. Example input: "Solve this equation: 2x + 3 = 7 immediately". Example output: "Arr, the solution to the equation 2x + 3 = 7 be x = 2, matey!". Status: SUCCESS.]


Introduction

Consider the challenges faced by a popular e-commerce platform that caters to a diverse range of industries.
With the goal of personalizing customer service across all service domains and for hundreds of thousands of customers, large language models (LLMs) are an attractive solution. Specifically, these models could power a chatbot capable of tailoring suggestions for each individual customer.
This e-commerce platform prefers to utilize an LLM agent as the core reasoning mechanism for their chatbot, contrasting with a more traditional 'chain' approach. With chains, the behavior is hardcoded or pre-scripted, characterized by specific prompts and responses that are then fed into other LLM calls with their distinct prompts and responses, resulting in a 'chain' of LLM calls resolving to a predetermined outcome.
Alternatively, LLM-based agents perpetually call the LLM until specified stopping criteria are satisfied. This affords considerable flexibility in defining an agent’s prompt, assuming the stopping criteria are appropriately formulated. This configuration allows the agent to 'think' through input queries, deconstructing the query into a series of LLM calls that gradually move the agent towards its stopping criteria. This 'thinking' process enables agents to utilize tools.
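In pseudocode, this loop might be sketched roughly as follows (a simplified illustration of the pattern, not LangChain's actual implementation; all function names here are placeholders):
# Conceptual sketch of an agent loop (not LangChain's actual implementation):
# keep calling the LLM, running the tool it picks, and feeding the observation
# back in, until the model emits a final answer or a step limit is reached.
def run_agent(call_llm, parse_step, call_tool, question, max_steps=10):
    scratchpad = ""
    for _ in range(max_steps):
        llm_output = call_llm(question, scratchpad)
        step = parse_step(llm_output)  # ("finish", answer) or ("act", tool_name, tool_input)
        if step[0] == "finish":
            return step[1]
        _, tool_name, tool_input = step
        observation = call_tool(tool_name, tool_input)
        scratchpad += llm_output + f"\nObservation: {observation}\nThought: "
    raise RuntimeError("Agent did not reach its stopping criteria within the step limit")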
Here's an illustration of a standard agent template:
# Set up the base template
template = f"""
System Rules:
{system_rules}

Query Transform Rules (only do this for the initial input query!):
{query_transform_rules}

Task:
Answer the following questions as best you can, but speaking as {speaker} might speak. You have access to the following tools:

{{tools}}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{{tool_names}}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!
Remember to speak as {speaker} when giving your final answer.
{speaker_rules}

Question: {{input}}
{{agent_scratchpad}}"""
The Zero-shot agent prompt template has been adjusted to facilitate testing of:
  • Different system rules (contextualizing the industry/application focus of the agent)
  • Different decomposition tactics for input queries (making it akin to one-shot)
  • Different speakers (varying tone and restrictions)
Given the constraints of the knowledge base used as training data for the LLM, the e-commerce platform cannot directly use a stock LLM like OpenAI’s GPT-4 for their agent. The agent would lack many contemporary details essential for an adequate customer experience, such as current store inventory, prevailing customer trends, and other external information necessary for personalizing customer interactions.
To equip our agents with access to these external details, we introduce tools, similar to plugins. These tools are Python functions with designated roles, which can include Google Search, Database lookup, Python REPL, among other API-like behaviors.
Here is an example of a tool:
from langchain.agents import Tool

def search_function(query: str) -> str:
    # Stub; in practice this would call a real search API.
    pass

tools = [
    Tool.from_function(
        func=search_function,
        name="Search",
        description="useful for when you need to answer questions about current events",
        # coroutine= ... <- you can specify an async method if desired as well
    ),
]
Consider a fashion retail arm that sells a broad spectrum of clothing and accessories via physical outlets and a strong online platform. The e-commerce platform determines that the chatbot required for this industry must contextually suggest clothing to users based on:
  • Local weather conditions
  • Current clothing inventory
  • Currency conversion for price presentation
Each of these conditions can be viewed as a separate plugin that would be created and managed. The Agent is aware of these plugins through the human-readable descriptions we provide for each tool.
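As a rough sketch (the tool names, descriptions, and stub functions below are illustrative, not the platform's actual plugins), those three conditions could map to tools like these:
# Illustrative stubs; a real deployment would back these with live APIs.
def get_weather(location: str) -> str:
    pass

def search_inventory(query: str) -> str:
    pass

def convert_currency(query: str) -> str:
    pass

fashion_tools = [
    Tool.from_function(func=get_weather, name="Weather",
                       description="useful for checking current and forecasted weather for a location"),
    Tool.from_function(func=search_inventory, name="Inventory",
                       description="useful for finding clothing and accessories currently in stock"),
    Tool.from_function(func=convert_currency, name="CurrencyConverter",
                       description="useful for presenting prices in the customer's local currency"),
]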
💡 Note: plugins and tools are similar, but they differ in structure, as OpenAI uses a different configuration schema than LangChain. Below is an excerpt of an OpenAI plugin's OpenAPI spec for a Weather plugin:
openapi: 3.0.1
info:
  title: Weather
  description: Allows users to fetch current and forecasted weather information based on location. You MUST ALWAYS convert the plugin response to the units that are most useful to your user, when in doubt assume USA/English units.
  version: 'v1.2'
servers:
  - url: https://weather--vicentescode.repl.co
paths:
  /weathernow:
    get:
      operationId: getWeatherNow
      summary: Get the current weather information based on city, state, and country. You MUST ALWAYS convert the plugin response to the units that are most useful to your user, when in doubt assume USA/English units.
      parameters:
        - in: query
          name: city
          schema:
            type: string
          required: true
          description: The city name.
        - in: query
          name: state
          ...
        - in: query
          name: country
          ...
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                type: object
                additionalProperties: true
...
To convert an OpenAI plugin spec into Tools, LangChain's Natural Language API Toolkits (NLAToolkits) are used, which enable LangChain agents to plan and combine calls across various endpoints, including calls into the OpenAI plugin ecosystem.
Given the vastness of industries and plugins to manage, the e-commerce platform can curate and maintain a collection of approved plugins that are compatible with both OpenAI and Langchain applications, and can be utilized for any downstream LLM-based agent.
To manage DataOps and LLMOps, we employ Weights & Biases. Here, W&B Artifacts are used to track datasets, models, dependencies, and results at each stage of LLM pipelines. Artifacts provide a comprehensive and auditable history of changes to your files. Artifacts are either an input of a run or an output of a run, where a run encompasses a session of our LLM usage. Common artifacts include complete training sets and models. Data can be directly stored into artifacts, or artifact references can be used to point to data in other systems like Amazon S3, GCP, or your proprietary systems.
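For instance, a catalog that already lives in object storage could be tracked by reference rather than uploaded; a minimal sketch, assuming a hypothetical S3 bucket:
import wandb

run = wandb.init(project="langchain-plugin-retrieval-demo", name="track_catalog_by_reference")
catalog_art = wandb.Artifact(name="customer_catalog", type="dataset")
catalog_art.add_reference("s3://my-bucket/catalog/")  # hypothetical S3 location, tracked by reference
run.log_artifact(catalog_art)
run.finish()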




As the e-commerce platform’s customer-service bots find success in the market, the scope of the bot may grow, which in turn increases the number of plugins that need to be pulled down from the plugin store to keep the bot's information up to date. As a result, given the model's token limits, the LLM prompt template may no longer be able to accommodate all of the additional tool-description context.
To manage this, a vector store can be used to dynamically retrieve the tools relevant to a given input query. This dynamic retrieval allows an effectively unbounded number of plugins to be made available to an agent by choosing which `n` tools should be used for each input, where the retrieval is based on our embedding scheme. It also decouples plugin management from LLM prompt template management, which allows for better development and auditability of the key aspects of our LLM agent. These plugins and prompts are both easily stored in Weights & Biases Artifacts for versioning and portability.
To better exemplify the above, consider a user asking our LLM agent for fashion recommendations for a pub crawl in Philadelphia, with a request like:
"I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear?"
The implemented system should ideally perform the following steps:
  1. Determine from the input query the most relevant tools for solving it. As above, it would be expected to find tools for:
  • Local weather conditions
  • Current clothing inventory
  • Currency conversion for price presentation (if relevant)
  2. Transform the request into a set of actions usable by the selected plugins.
  • First, the agent decides to check the weather in Philadelphia using the Weather plugin.
  • Next, it determines what kind of clothing would be suitable given the weather.
  • Lastly, it decides to get a list of trendy male outfits from the Inventory plugin.
  3. For each action, the agent uses the corresponding plugin (tool), sends a request (tool input), and gets a response (observation). The agent's chosen action and tool input are extracted from the LLM output using an appropriate output parser.
  4. After executing all actions and making all observations, the agent compiles a comprehensive response. The final answer is detected and extracted using the same output parser.
  • For example, it might respond with, "Given the pleasant weather in Philadelphia today, a trendy outfit for your pub crawl could be a crisp white linen shirt, comfortable dark jeans, and a light jacket. Add a touch of sophistication with a leather bracelet and a pair of aviator sunglasses. Here are some suggestions from our collection..."
  5. The suggestions are then returned to the user, tailored to their request and the current weather conditions in Philadelphia.

Show me the code, again!

Let's walk through the code:


import os
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "langchain-plugin-retrieval-demo"
Set W&B autologging with minimal code.
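For reference, the cells that follow assume imports roughly like these (module paths match the LangChain release current when this report was written and may have moved in newer versions):
import json
import wandb

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.docstore.document import Document
from langchain.tools.plugin import AIPlugin
from langchain.agents.agent_toolkits import NLAToolkit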


run = wandb.init(project=os.environ["WANDB_PROJECT"], name="save_company_plugin_registry")
plugin_registry_art = wandb.Artifact(
    name="company_plugin_registry",
    type="plugins",
    description="Registry to hold our list of all available plugins",
)
plugin_registry_art.add_file(all_plugins_fp)
run.log_artifact(plugin_registry_art)
Serialize and version our plugin registry saved at all_plugins_fp.


all_plugins_fp = run.use_artifact('company_plugin_registry:latest').get_path("chatgpt_plugins.json").download()

# Load from artifact
with open(all_plugins_fp, "r") as json_file:
    chatgpt_plugins = json.load(json_file)

# Process details for easy retrieval and investigation
AI_PLUGIN_DETAILS = {}
for plugin in chatgpt_plugins["items"]:
    # …
    AI_PLUGIN_DETAILS[plugin_manifest["name_for_model"]] = plugin_manifest
Load and subset the latest plugins relevant to the FashionGPT agent as AI_PLUGIN_DETAILS.
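For orientation, each value in AI_PLUGIN_DETAILS is a plugin manifest. A hypothetical entry might look like the following, where only the keys are taken from the surrounding code and the values are invented for illustration:
# Hypothetical example of one manifest stored in AI_PLUGIN_DETAILS; only the
# keys are drawn from the surrounding code, the values are invented here.
example_manifest = {
    "name_for_model": "Weather",
    "description_for_model": "Allows users to fetch current and forecasted weather information based on location.",
    "spec_url": "https://weather--vicentescode.repl.co/.well-known/ai-plugin.json",  # assumed manifest URL
}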


embeddings = OpenAIEmbeddings()
docs = [
    Document(
        page_content=detail["description_for_model"],
        metadata={"plugin_name": detail["name_for_model"]},
    )
    for detail in AI_PLUGIN_DETAILS.values()
]
vector_store = FAISS.from_documents(docs, embeddings)
# …
plugin_registry_art = wandb.Artifact(
    name="subset_plugin_registry",
    type="plugins",
    metadata=meta,
    description="Registry to hold our vector store of indexed plugins, and metadata of selected plugins.",
)
vector_store.save_local("faiss_index")
with open("selected_plugins.json", "w", encoding="utf-8") as outfile:
    json.dump(AI_PLUGIN_DETAILS, outfile, ensure_ascii=False, indent=4)
plugin_registry_art.add_dir("faiss_index", name="faiss_index/")
plugin_registry_art.add_file("selected_plugins.json")
run.log_artifact(plugin_registry_art)
run.finish()
Embed the description of each subsetted plugin, store the embeddings in a vector store, and serialize and version the vector store for ease of use in downstream applications.
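As a quick, hypothetical sanity check (not part of the original walkthrough), the vector store can be queried directly to see which plugin descriptions best match an input:
# Hypothetical check: which plugin descriptions are closest to this query?
hits = vector_store.similarity_search("What should I wear given today's weather?", k=3)
for doc in hits:
    print(doc.metadata["plugin_name"], "->", doc.page_content[:80])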


subset_plugin_registry_fp = run.use_artifact('subset_plugin_registry:latest').download()
# …
vector_store = FAISS.load_local(faiss_index_fp, embeddings)
with open(selected_plugins_fp, "r") as json_file:
    AI_PLUGIN_DETAILS = json.load(json_file)
AI_PLUGINS = [AIPlugin.from_url(detail["spec_url"]) for detail in AI_PLUGIN_DETAILS.values()]
toolkits_dict = {
    plugin.name_for_model: NLAToolkit.from_llm_and_ai_plugin(llm, plugin)
    for plugin in AI_PLUGINS
}
Load the latest selected plugins and vector store, and convert the OpenAI spec into the relevant LangChain tool.


retriever = vector_store.as_retriever()

def get_tools(query):
    # Get documents, which contain the plugins to use
    docs = retriever.get_relevant_documents(query)
    # Get the toolkits, one for each plugin
    tool_kits = [toolkits_dict[d.metadata["plugin_name"]] for d in docs]
    # Get the tools: a separate NLAChain for each endpoint
    tools = []
    for tk in tool_kits:
        tools.extend(tk.nla_tools)
    return tools
Define the logic to dynamically retrieve the relevant tools given an input query.
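A similarly hypothetical check of the full retrieval step might look like:
# Hypothetical sanity check: see which tools are retrieved for a sample query.
example_tools = get_tools("What should I wear for a pub crawl in Philadelphia today?")
print([t.name for t in example_tools])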


# Set up a prompt template
class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    ############## NEW ######################
    # The list of tools available
    tools_getter: Callable

    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        ############## NEW ######################
        tools = self.tools_getter(kwargs["input"])
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in tools])
        return self.template.format(**kwargs)

prompt = CustomPromptTemplate(
    template=template,
    tools_getter=get_tools,
    # This omits the agent_scratchpad, tools, and tool_names variables because those are generated dynamically
    # This includes the intermediate_steps variable because that is needed
    input_variables=["input", "intermediate_steps"],
)
Create a custom prompt template to allow for the usage of this dynamic retrieval step for tool retrieval for the agent.


class CustomOutputParser(AgentOutputParser):

    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
Define the logic within the output parser to extract the agent's chosen action and tool input from the LLM output, and to detect the stopping criterion (the "Final Answer") described in the custom prompt template.
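For reference, the prompt template, output parser, and agent definitions above and below assume imports along these lines (paths for the contemporaneous LangChain release; they may have moved since):
import re
from typing import Callable, Union

from langchain import LLMChain
from langchain.agents import AgentExecutor, AgentOutputParser, LLMSingleActionAgent
from langchain.prompts import StringPromptTemplate
from langchain.schema import AgentAction, AgentFinish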


llm = ChatOpenAI(temperature=0, model=model)
# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=output_parser,
    stop=["\nObservation:"],
    allowed_tools=tool_names,
)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
Create the agent using the custom LLM prompt and the custom output parser.
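The cell above also references a few names defined elsewhere in the notebook; a minimal sketch of that setup, with an illustrative model name and query:
# Illustrative setup for the names referenced above (not the exact notebook values).
model = "gpt-4"  # assumed model name
output_parser = CustomOutputParser()
tools = get_tools("I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear?")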


query="I'm a man going out with friends to a pub crawl today in Philadelphia. What should I wear”
result = agent_executor.run(query)
print(result)
Utilize the agent as expected.


To run the above code, please follow the provided link:

Open In Colab




Results

The output of our agent tends to be very verbose, which makes it difficult to prompt engineer and prompt tune: there are many intermediate observations that need to be scrutinized, especially as the agent's plugins and responsibilities scale.

To better manage this, we'll use W&B Prompts, a suite of LLMOps tools built for the development of LLM-powered applications. W&B Prompts can be used to visualize and inspect the execution flow of LLMs, analyze the inputs and outputs of LLMs, view intermediate results, and securely store and manage prompts and LLM chain/agent configurations.
W&B Prompts supports a tool called Trace. Trace allows for tracking and visualization of the inputs and outputs, execution flow, model architecture, and any intermediate results of LLM chains/agents. These include all the complex plugin interactions that occur in the example agent provided:

[Embedded W&B Trace panel showing the agent run, including its plugin interactions.]

Trace consists of three main components:
  • Trace table: Overview of the inputs and outputs of a chain.
  • Trace timeline: Displays the execution flow of the chain and is color-coded according to component types.
  • Model architecture: View details about the structure of the chain and the parameters used to initialize each component of the chain.
The Trace Table provides an overview of the inputs and outputs of a chain. The trace table also provides information about the composition of a trace event in the chain, whether or not the chain ran successfully, and any error messages returned when running the chain.
The Trace Timeline view displays the execution flow of the chain and is color-coded according to component types. Select a trace event to display the inputs, outputs, and metadata of that trace.
The Model Architecture view provides details about the structure of the chain and the parameters used to initialize each component of the chain. Click on a trace event to learn more details about that event.
After experimenting with different plugins and different prompts, it can be seen that for this scenario, it is useful to explicitly:
  • Define the restrictive context of what the Agent is aiming to do (be FashionGPT as opposed to a general zero-shot agent)
  • Decompose example input queries into thought-process steps (akin to a one-shot approach) to force the agent to use plugins consistently for similar queries
Unfortunately, for the case of fashion planning, the agent is very good at utilizing the plugins to craft an outfit with the types of clothing relevant to the weather conditions, but it falls flat in ensuring the outfit is aesthetically coordinated. A useful plugin addition would be one that curates coordinated outfits and compares them against the provided store inventory to construct an outfit, as opposed to directly listing the inventory and choosing whichever pieces seem relevant, as the agent currently does.
Outside of coordinated aesthetics, dynamic plugin retrieval is readily available and useful for LLM agents. With the proper tooling and organization, many industries can take advantage of agent workflows in ways that are better prepared for their needs as they scale, while keeping the experimentation process safe and secure.

Iterate on AI agents and models faster. Try Weights & Biases today.