A Deep Dive Into LangChain’s Generative Agents
In this article, part one of a two-part series, we'll be deep diving into LangChain's implementation of generative agents to better understand what's changed in recent months.
First and foremost, we need to answer the question: what are generative agents?
A generative agent is an AI system capable of generating some sort of output or content itself. That can mean anything from replying to contextual customer service requests to creating blog images from prompts.
This term was popularized in the paper "Generative Agents: Interactive Simulacra of Human Behavior". As of now, the paper is five or so months old, but I thought I'd do a deep dive into it, specifically through LangChain's implementation, to get a better understanding!
This article is part of a 2-part series. I'll be wrapping that up soon and we'll link it here once we're done.
Here's what we'll be covering today:
Table of Contents
What is "Generative Agents: Interactive Simulacra of Human Behavior" about? Generative Agent Behavior and InteractionLangChain's ImplementationConclusionReferencesAppendix (click the .py sections below to expand)
Let's dive in!
What is "Generative Agents: Interactive Simulacra of Human Behavior" about?
In short, "Generative Agents: Interactive Simulacra of Human Behavior" is a research paper that simulates a virtual, Sims-like world — specifically a neighborhood where 25 Large Language Model (LLM)-backed agents go about their virtual lives.

I'll cover the main method sections and the evaluation, as well as briefly talk about the introduction, related work, and discussion.
Let's break down the paper!
Generative Agent Behavior and Interaction
This section covers how these generative agents behave and interact.
Agent Avatar and Communication
Each of the 25 agents is defined by a system message describing their life, personality, and identity.

Agents interact with the virtual world through their actions and converse through natural language. At every time step, all the agents have a short description of what to do next, and this description is reflected in their actions and movements in the interface. You can interact with the agent either as an anonymous agent talking to them or as their "inner voice."
Environmental Interaction
The sandbox world is equipped with houses and furniture. The user and agents can interact with appliances (think turning on or off a stove).
Example "Day in the Life"
Agents start and plan their day from a small paragraph description. They develop memories, create new relationships, and interact with their environment and other agents.

Emergent Social Behaviors
As agents engage in conversations, knowledge can spread from agent to agent. Agents also form relationship memories with other agents and agents can coordinate events together.
What Is a Generative Agent Architecture?
At its core, a generative agent architecture is a framework for simulating believable behavior in an open world. It uses an LLM to ingest input from that world and output behavior in the form of text, with infrastructure wrapped around the LLM to provide it with greater capabilities.
In this section, the authors list challenges and their solutions.
Challenge #1: Simulating human behavior requires the agent to reason about its experiences and memories. The authors use a memory stream, but feeding the entire stream to the LLM is inefficient and distracting. How should we go about retrieval?
Solution #1: The memory stream is composed of what the agent observes around itself, behaviors the agent performs, and interactions with other agents, all synthesized into memories. The retrieval mechanism factors in recency, relevance/salience, and importance.
- Recency: recently accessed memories get higher priority, decaying exponentially (decay factor 0.99) with the time since last access
- Importance: an absolute importance score is assigned to every memory; getting a job is an important memory, while eating breakfast is not
- Relevance/Salience: cosine similarity between the text embedding vectors of each memory and the query prompt; essentially, which memories are relevant to this query?

The retriever retrieves from the memory stream by taking the memories with the highest combined score of recency, relevance/salience, and importance.
The scores are summed together as a weighted combination:
score = α_recency · recency + α_importance · importance + α_relevance · relevance
In their implementation, they kept all alphas at 1.
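To make the scoring concrete, here's a minimal sketch. The memory dictionary layout and the hourly decay granularity are my assumptions; the paper only specifies the three components and the 0.99 decay factor:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combined_score(memory, query_embedding, now,
                   alpha_recency=1.0, alpha_importance=1.0,
                   alpha_relevance=1.0, decay_rate=0.99):
    """Score a single memory for retrieval, per the paper's description."""
    # Recency: exponential decay over the hours since last access.
    hours = (now - memory["last_accessed_at"]).total_seconds() / 3600
    recency = decay_rate ** hours
    # Importance: the LLM's 1-10 score, normalized to [0, 1].
    importance = memory["importance"] / 10
    # Relevance: cosine similarity between memory and query embeddings.
    relevance = cosine_similarity(memory["embedding"], query_embedding)
    return (alpha_recency * recency
            + alpha_importance * importance
            + alpha_relevance * relevance)
```

The retriever would compute this score for every memory in the stream and keep the top-ranked ones.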
Challenge #2: Agents struggle to perform inference with raw memories as context, and the stream quickly accumulates too many memories to pass in wholesale.
Solution #2: A second type of memory in the memory stream: a reflection. Reflections exist alongside other memories in the stream, but they are more abstract and higher-level. They are generated only when a threshold (the sum of recent memories' importance scores) is exceeded.
The process for reflecting looks like this:
- Identifying what to reflect on: take the 100 most recent memories and ask the LLM to generate 3 high-level, salient questions about them
- Getting context: 3-pronged retrieval (recency, relevance, importance) from the memory stream for a set of memories to accompany each question
- Simulating reflection: for each question, generate 5 novel insights
- Updating the memory stream: append these insights to the memory stream (see the sketch below)
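Put together, the reflection loop might look like this sketch; the prompt wording is paraphrased from the paper, and the agent interface (memory_stream, retrieve, add_memory) is a placeholder of my own:

```python
def reflect(agent, llm):
    """One reflection pass, following the paper's four steps."""
    # 1. Identify what to reflect on: 3 salient questions about recent memories.
    recent = agent.memory_stream[-100:]
    questions = llm(
        "Given only the statements below, what are the 3 most salient "
        "high-level questions we can answer about the subjects?\n"
        + "\n".join(m.text for m in recent)
    ).splitlines()

    for question in questions:
        # 2. Gather context: retrieve memories relevant to each question.
        context = agent.retrieve(question)
        # 3. Simulate reflection: distill 5 high-level insights.
        insights = llm(
            "Statements:\n" + "\n".join(m.text for m in context)
            + "\nWhat are 5 high-level insights you can infer "
            "from the above statements?"
        ).splitlines()
        # 4. Update the memory stream: insights become memories themselves.
        for insight in insights:
            agent.add_memory(insight)
```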
Challenge #3: Agents need to plan over long horizons. LLMs can't do this by simply being handed lots and lots of context.
Solution #3: Plans are stored in the memory stream, keep the agent's behavior consistent over time, and are included in the retrieval process. A plan outlines a single day and is prompted with the agent's identity/summary description and a summary of their previous day. As the agent goes about its day, the plan is recursively edited to include more and more detail (roughly as in the sketch below).
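A rough sketch of that planning step, assuming a plain llm callable and paraphrased prompts (the paper decomposes plans into hour-long chunks and then 5-15 minute actions):

```python
def plan_day(agent, llm):
    """Generate a coarse daily plan, then recursively refine it."""
    # Coarse outline, prompted with the agent's summary and previous day.
    outline = llm(
        f"{agent.summary}\nYesterday: {agent.yesterday_summary}\n"
        f"Today is {agent.today}. Outline {agent.name}'s plan for "
        f"the day in broad strokes."
    )
    agent.add_memory(outline)  # plans live in the memory stream too
    # Recursive decomposition into finer-grained actions.
    for chunk in outline.splitlines():
        detailed = llm(f"Decompose this into 5-15 minute actions: {chunk}")
        agent.add_memory(detailed)
```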
Sandbox Environment Implementation
As LangChain's implementation mainly focuses on the agent and its memory, I won't cover much about how the sandbox environment is implemented. However, I did come across an interesting point: the appliances around the virtual environment are structured in a tree, such that a "stove" is a child node of the "kitchen" node. Each agent is initialized with a starting tree so that it is aware of its surroundings, and this tree is updated as the environment steps. Another interesting point is that the tree is parsed and turned into natural language.
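As a toy illustration of that last point, here's how such a tree might be flattened into a sentence (the dictionary structure is my guess, not the paper's actual representation):

```python
def describe_area(tree: dict, area: str) -> str:
    """Turn one environment subtree into a natural-language sentence."""
    children = tree.get(area, [])
    if not children:
        return f"The {area} is empty."
    items = " and ".join(f"a {child}" for child in children)
    return f"In the {area} there is {items}."

environment = {"kitchen": ["stove", "refrigerator"], "bedroom": ["bed", "desk"]}
print(describe_area(environment, "kitchen"))
# In the kitchen there is a stove and a refrigerator.
```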
Controlled Evaluation
The paper performs 2 evaluations. The controlled evaluation analyzes whether an agent's response is believable in a narrowly defined context. In the end-to-end evaluation, they analyze the emergent behaviors of the sandbox community after 2 full days of running.
The authors took on the role of an interviewer and interviewed an agent, assessing 5 categories:
- Self-knowledge: ask agents basic questions about themselves (e.g. who are you?)
- Memory: ask the agent to retrieve particular experiences
- Plans: ask agents to retrieve long-term plans
- Reactions: present the agent with hypothetical situations in which it needs to respond believably
- Reflections: ask about its relationships and higher-level memories to test reflections
They ran ablations with this evaluation harness.

A worker was assigned to every agent to generate interview questions. 100 evaluators were hired from Prolific, and each was tasked with ranking the agents' believability across the 5 conditions. They used these rankings to compute a TrueSkill rating, a generalization of the chess Elo rating system to multiplayer settings.
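To get a feel for how rankings turn into ratings, here's a minimal sketch using the trueskill Python package; the condition names are illustrative, and I'm not claiming the authors used this exact library:

```python
from trueskill import Rating, rate

# One rating per condition being compared.
conditions = ["full", "no_reflection", "no_planning", "no_observation"]
ratings = {c: Rating() for c in conditions}

# One evaluator's ranking, from most to least believable (rank 0 wins).
ranking = ["full", "no_reflection", "no_planning", "no_observation"]
rating_groups = [(ratings[c],) for c in ranking]
updated = rate(rating_groups, ranks=list(range(len(ranking))))

for condition, (new_rating,) in zip(ranking, updated):
    ratings[condition] = new_rating
    print(condition, round(new_rating.mu, 2))
```

Repeating this update over all 100 evaluators' rankings yields the final ratings.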
From their ablation experiments, they found the full architecture to be the most believable. Agents were generally able to retrieve memories correctly, though they sometimes failed to retrieve certain ones; fabricating entire memories was rare. Reflections were crucial for synthesizing observations and interactions and acting upon them: the example they gave showed an agent failing to find a birthday gift for their friend without reflection.
End-to-end Evaluation
In their end-to-end evaluation, they analyze 3 metrics: information diffusion, relationship formation, and agent coordination.
They track two pieces of information over the course of two days: Sam's candidacy for village mayor and Isabella's Valentine's Day party at the Hobbs Cafe. They interviewed every agent and double-checked for hallucinations. They also analyzed the relationships formed over two days and recorded them in a graph.

They discovered that the percentage of agents aware of Sam's candidacy increased from 4% to 32%, and the percentage of agents aware of Isabella's party increased from 4% to 48% over those two days. They found the relationship network density of the agents in the community increased from 0.167 to 0.74, meaning the agents were forming relationships (growing aware of other agents). The agents were able to coordinate for Isabella's Valentine's Day party. Out of the 453 agent responses, about 1.3% of them were hallucinations.
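For reference, network density is the fraction of possible edges that actually exist. Assuming an undirected graph over the 25 agents, a quick sanity check of what 0.74 implies:

```python
def density(num_edges: int, num_nodes: int) -> float:
    # Undirected graph: n * (n - 1) / 2 possible edges.
    return num_edges / (num_nodes * (num_nodes - 1) / 2)

# 25 agents -> 300 possible relationships; 222 of them formed.
print(density(222, 25))  # 0.74
```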
The authors also conducted an inductive analysis of agent behavior and discovered 3 key takeaways:
- As the memory stream grows larger, it becomes harder for agents to determine the appropriate place to perform an action (in their example, agents who were supposed to have lunch at the cafe ended up going to the bar instead, since the bar was also known as a place to get together)
- Erratic behavior was caused by misclassifying what counts as proper behavior (e.g. real dorm bathrooms tend to have multiple stalls, but in the sandbox they were single-occupancy; the agents believed dorm bathrooms could hold multiple people, so there would be multiple agents in the dorm bathroom at a time!)
- Instruction tuning made agents more open-minded and cooperative overall; Isabella, for example, developed an interest in English literature despite this being out of character for her
LangChain's Implementation
You can find the LangChain implementation of Generative Agents here and their source code here. I have included my diagrams and they can be found here and on Imgur here.
As of writing this, there are 2 files involved: memory.py and generative_agent.py.

Let's first explore memory.py, then we can cover generative_agent.py.
memory.py

All attributes and functions.
I'll give a short summary of each attribute and function below.
```
# Main attributes.
llm: BaseLanguageModel = the LLM model
memory_retriever: TimeWeightedVectorStoreRetriever = the retriever with a vector store
verbose: bool = T/F flag if you want logging
reflection_threshold: float = threshold of importance sum scores before starting reflection
current_plan: List[str] = <NOT IMPLEMENTED/UNUSED>
importance_weight: float = the alpha of the importance in the combined score calculation
aggregate_importance: float = a running sum of importance scores of recently added memories; if it exceeds reflection_threshold, then the agent reflects

# For loading the memory variables (LangChain's BaseMemory).
max_tokens_limit: int = max token limit for _get_memories_until_limit
queries_key: str = key string for the input to load_memory_variables; for general-purpose loading of relevant memories w.r.t. a query
most_recent_memories_token_key: str = key string for the input to load_memory_variables; for general-purpose loading of the most recent memories from the memory stream
add_memory_key: str = key string for the output variable in save_context
relevant_memories_key: str = key string for load_memory_variables; for loading relevant memories w.r.t. a query
relevant_memories_simple_key: str = key string for load_memory_variables; same as above, where "simple" basically means a different formatting
most_recent_memories_key: str = key string for load_memory_variables; for loading the most recent memories
now_key: str = key string for the output variable in save_context

# Flag for whether or not the agent is currently reflecting.
reflecting: bool = True if the agent is reflecting and False otherwise
```
Now let's go through the methods. I've included a set of diagrams to help with organizing the methods.

As you can see, there are a whole lot of utility and formatting methods (some required by BaseMemory and some for formatting), a couple of methods for reflection and for calculating memory importance, and 4 main methods that you interact with regularly.
Before we cover the functions, let's first understand exactly what's stored in the memory stream. Memories are stored as LangChain Documents with the memory contents in the page_content attribute. The metadata dictionary for each Document has 3 keys: importance, created_at, and current_time.

The document structure was a bit confusing to figure out, but taking a look at the TimeWeightedVectorStoreRetriever add_documents source code helped! Note: the docs page link might not work. If so, navigate to LangChain's API Reference → langchain.retrievers → TimeWeightedVectorStoreRetriever → [source] → add_documents method.
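Here's a minimal sketch of adding one memory, assuming memory_retriever is an already-constructed TimeWeightedVectorStoreRetriever (import paths move around between LangChain releases):

```python
from datetime import datetime
from langchain.schema import Document

now = datetime.now()
doc = Document(
    page_content="Sam is running for village mayor.",
    metadata={"importance": 0.8},  # the normalized LLM importance score
)
# add_documents stamps timestamps (created_at, last_accessed_at) onto the
# metadata, keyed off the current_time keyword we pass in.
memory_retriever.add_documents([doc], current_time=now)
```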
Now let's briefly cover these functions in the following order:
- Formatting & chain
- Importance
- Reflection (includes pause_to_reflect)
- Main methods (add_memory, add_memories, and fetch_memories)
I'm providing short explanations of each function, but I've also formatted each of these into diagrams. There'll be a bit of code in them, but I provide explanations! These diagrams are in the appendix. I also encourage you to take a look at the source code alongside my diagrams.
As for the diagram convention, I have the input on the left outside of the container (the inner box). The output is always at the bottom with a type hint. I provide arrows to show how the input is processed and explanations. If a function calls another function, then I include the called function's name in the container encircled in a white rectangular box.
Formatting
_format_memory_detail: Given a LangChain Document memory and a prefix str, return a string f"{prefix}[{current_time}] {memory.page_content.strip()}".
format_memories_detail: Given relevant_memories, a list of LangChain Documents, format each document with _format_memory_detail and join all the str outputs with \n. An example string: "- <created_time> <page_content>\n- <created_time> <page_content>\n".
format_memories_simple: Given relevant_memories, a list of LangChain Documents, join their page contents with a semicolon and return this new string.
_parse_list: Given text, a str, parse new-line separated strings \n into a list of strings.
chain: Return an LLMChain with the GenerativeAgentMemory llm, input prompt, and GenerativeAgentMemory verbose flag.
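Two of these are short enough to sketch outright; this is paraphrased, not the verbatim source:

```python
import re
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

def _parse_list(text: str) -> list[str]:
    """Split a newline-separated LLM response into clean list items."""
    lines = re.split(r"\n", text.strip())
    # Strip any leading numbering such as "1. " before returning.
    return [re.sub(r"^\s*\d+\.\s*", "", line).strip() for line in lines if line]

def chain(self, prompt: PromptTemplate) -> LLMChain:
    """Build an LLMChain from the memory's llm, a prompt, and its verbose flag."""
    return LLMChain(llm=self.llm, prompt=prompt, verbose=self.verbose)
```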
Importance
_score_memory_importance: Given memory, a str, prompt the LLMChain to generate a value from 1-10 rating the importance of the memory. If the chain outputs no score, return 0.0; otherwise, parse out the score, divide it by 10, and multiply it by the importance weight.
_score_memories_importance: Given memory_content, a string of semicolon-separated memories, generate a score from 1-10 for each one. Parse the output into a list of floats and return it.
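A hedged sketch of the single-memory scoring logic, with the prompt wording paraphrased from the source:

```python
import re
from langchain.prompts import PromptTemplate

def _score_memory_importance(self, memory_content: str) -> float:
    """Ask the LLM for a 1-10 poignancy score and normalize it."""
    prompt = PromptTemplate.from_template(
        "On a scale of 1 to 10, where 1 is purely mundane (e.g., brushing"
        " teeth) and 10 is extremely poignant (e.g., a break up), rate the"
        " likely poignancy of the following memory. Respond with a single"
        " integer.\nMemory: {memory_content}\nRating: "
    )
    score = self.chain(prompt).run(memory_content=memory_content).strip()
    match = re.search(r"\d+", score)
    if match:
        # Normalize the 1-10 rating and apply the importance alpha.
        return (float(match.group(0)) / 10) * self.importance_weight
    return 0.0
```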
Reflection
_get_topics_of_reflection: Given last_k, an int, retrieve the last_k most recent memories from the retriever's memory stream. Join this list of Documents with \n and ask the LLMChain to generate the 3 most salient questions based on these memories.
_get_insights_on_topic: Given a topic str and a datetime now, retrieve the relevant/salient, important, and recent memories w.r.t. the topic and now. Join this list of Documents with \n, formatting each Document with _format_memory_detail (the format numbers the memories and includes a created_at timestamp), then prompt the LLMChain to generate novel insights about the topic.
pause_to_reflect: Given a datetime now: this function is called if aggregate_importance (the running sum of importance scores of added memories) exceeds the non-zero reflection threshold. First, call _get_topics_of_reflection, which returns a list of strings (3 questions). For each question, call _get_insights_on_topic, which generates a list of strings (5 insights per question). Add each of these insights (15 total) to memory and return them all as a list.
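Here's roughly how those pieces compose, simplified from the source:

```python
def pause_to_reflect(self, now=None) -> list[str]:
    """Reflect on recent memories and add the resulting insights to the stream."""
    new_insights = []
    topics = self._get_topics_of_reflection()  # 3 salient questions
    for topic in topics:
        insights = self._get_insights_on_topic(topic, now=now)  # 5 insights each
        for insight in insights:
            self.add_memory(insight, now=now)  # insights become memories too
        new_insights.extend(insights)
    return new_insights
```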
Main Methods
fetch_memories: Given an observation string and a now datetime: if now is None, retrieve from the memory stream the relevant/salient, important, and recent Documents w.r.t. the observation. If now is not None, do the same but inside the with mock_now(now) context manager (which pretends the current time is now). Returns a list of Documents.
add_memory: Given a string memory_content and a datetime now: first, score the memory's importance with _score_memory_importance and add this score to aggregate_importance. Create and add a Document with the memory_content, importance score, created_at, and current_time to the memory stream. Check whether we can reflect; if so, pause_to_reflect.
add_memories: Given memory_content, a string of semicolon-delimited memories, and now: run _score_memories_importance and add the max importance score from its output to aggregate_importance. For each memory, perform the same operations as in add_memory to add it to the memory stream. Check whether we can reflect; if so, pause_to_reflect.
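To tie this together, here's a usage sketch modeled on LangChain's docs notebook; the FAISS and OpenAI setup is one possible choice (and exact import paths vary between releases), not the only way to construct the retriever:

```python
import faiss
from langchain.chat_models import ChatOpenAI
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.vectorstores import FAISS
from langchain_experimental.generative_agents import GenerativeAgentMemory

# Vector store + time-weighted retriever backing the memory stream.
embeddings = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)  # OpenAI embedding dimension
vectorstore = FAISS(embeddings.embed_query, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=vectorstore, other_score_keys=["importance"], k=15
)

memory = GenerativeAgentMemory(
    llm=ChatOpenAI(),
    memory_retriever=retriever,
    reflection_threshold=8,  # reflect once summed importance exceeds this
)
memory.add_memory("Tommie remembers his dog, Bruno, from when he was a kid")
print(memory.fetch_memories("What does Tommie remember?"))
```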
I hope that, given some time with the diagrams, these functions are clear. All of these methods exist to support the following control flow.

Hopefully, by now, you have a somewhat solid understanding of how the memory behind a generative agent works, how memories are scored on the importance scale, and how reflection is performed.
As of now, we have very powerful LLMs but limited methods for mimicking human processes like reflecting, experiencing and remembering new memories, and retrieving relevant memories mid-conversation. Thus, most of this infrastructure is built manually. That is, despite the versatility LLMs display in handling questions and overall conversation, they still require these additional components to better mimic human behavior.
The GenerativeAgentMemory class encapsulates the agent's memory. This was the hard part. Let's move on to the agent class itself!
Note that I won't directly cover the BaseMemory methods I skipped over. I'll briefly explain what's going on behind the scenes later, but these methods are for utility.
generative_agent.py

We can ignore the inner Config class. Let's cover the attributes first.
```
name: str = name of the agent
age: int = age of the agent
traits: str = permanent traits of the agent
status: str = traits you wish not to change (still unclear to me)
memory: GenerativeAgentMemory = the agent's memory class
llm: BaseLanguageModel = the LLM
verbose: bool = T/F if you want verbose logging
summary: str = stateful summary for self-reflection; internal variable
summary_refresh_seconds: int = how frequently to regenerate the summary (in seconds); internal variable
last_refreshed: datetime = last time the agent's summary was generated; internal variable
daily_summaries: List[str] = summary of the agent's daily events undertaken so far; internal variable
```
Now let's cover the methods. _parse_list and chain are the same as before.

Let's cover them in this order:
- Utility
- Get Entity
- Summary
- Generate
Utility
_clean_response: Given a text string, remove the agent's name from the string and return this new string.
_parse_list: same as in GenerativeAgentMemory
chain: same as in GenerativeAgentMemory
Get Entity
_get_entity_from_observation: Given an observation string, ask an LLMChain what entity is in the observation. Returns a string.
_get_entity_action: Given an observation string and the entity name (from _get_entity_from_observation), extract what the entity is doing from the observation. Returns a string.
Summary
_compute_agent_summary: No input. Ask an LLMChain what the core characteristics of the agent are given a set of relevant memories (retrieved from the memory stream and queried by f"{self.name}'s core characteristics"). Returns a string of the agent's core characteristics.
get_summary: Given force_refresh, a bool, and now, a datetime: get the current time (now) and the time since the agent's summary was last refreshed. If the agent does not have a summary (an internal string variable), or the next refresh is overdue, or we are forcing a refresh with force_refresh, call _compute_agent_summary. Return a string of the agent's summarized core characteristics, traits, name, and age.
get_full_header: Given force_refresh a bool and now datetime, call get_summary to get a summary of the agent. Return a string of the agent's core characteristics summary, current time, and agent's name and status.
summarize_related_memories: Given an observation string, call _get_entity_from_observation and _get_entity_action. Prompt the LLMChain to summarize memories (from its memory stream) related to the entity and entity action present in the observation. Returns a string.
Generate
_generate_reaction: Given an observation string, a suffix string (a call to action), and a now datetime, call get_summary and summarize_related_memories to get a summary of the agent and summarized related memories w.r.t. the observation. Create a dictionary kwargs:
kwargs = {"agent_summary_description": <str>,"current_time": <str>,"relevant_memories": <str>,"agent_name": <str>,"observation": <str>,"agent_status": <str>,"recent_memories_token": int, # number of tokens used in the prompt (w/o including recent memories)"most_recent_memories": <str> # most recent memories (up till a specified token limit)}
Prompt an LLMChain with a prompt containing all of the context information above (the kwargs dictionary). The actual call to action is in the suffix.
generate_reaction: Given an observation string and a now datetime, call _generate_reaction with the suffix being:
call_to_action_template = ("Should {agent_name} react to the observation, and if so,"+ " what would be an appropriate reaction? Respond in one line."+ ' If the action is to engage in dialogue, write:\nSAY: "what to say"'+ "\notherwise, write:\nREACT: {agent_name}'s reaction (if anything)."+ "\nEither do nothing, react, or say something but not both.\n\n")
The chain output has 3 possibilities: do nothing, say something, or react (but never say and react at once). Save the output of _generate_reaction to memory and return it, with its format conditioned on whether the output is a SAY or a REACT.
generate_dialogue_response: Given an observation string and a now datetime, call _generate_reaction with the suffix being:
call_to_action_template = ("What would {agent_name} say? To end the conversation, write:"' GOODBYE: "what to say". Otherwise to continue the conversation,'' write: SAY: "what to say next"\n\n')
Save the output of _generate_reaction to memory and return its result, with its format conditioned on whether the output is a GOODBYE or a SAY.
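Here's a short usage sketch of the agent class, again modeled on the docs notebook and reusing the memory object built earlier:

```python
from langchain.chat_models import ChatOpenAI
from langchain_experimental.generative_agents import GenerativeAgent

tommie = GenerativeAgent(
    name="Tommie",
    age=25,
    traits="anxious, likes design, talkative",
    status="looking for a job",
    llm=ChatOpenAI(),
    memory=memory,  # the GenerativeAgentMemory from the earlier sketch
)

print(tommie.get_summary(force_refresh=True))
_, reaction = tommie.generate_reaction("Tommie sees a new coffee shop open up")
print(reaction)
_, response = tommie.generate_dialogue_response(
    "A stranger says: Hi Tommie, how are you today?"
)
print(response)
```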
For convenience, here's the link to the diagrams again. In case the share link is blurry, I also have it on Imgur here.
Small Note
If you take a look at the source code, specifically at summarize_related_memories and _compute_agent_summary, you'll notice that relevant_memories in the prompt is never explicitly defined. Below is an example. Also, you may be asking: what is queries?
```python
def _compute_agent_summary(self) -> str:
    """"""
    prompt = PromptTemplate.from_template(
        "How would you summarize {name}'s core characteristics given the"
        + " following statements:\n"
        + "{relevant_memories}"
        + "Do not embellish."
        + "\n\nSummary: "
    )
    # The agent seeks to think about their core characteristics.
    return (
        self.chain(prompt)
        .run(name=self.name, queries=[f"{self.name}'s core characteristics"])
        .strip()
    )
```
I did a bit of investigative work and discovered the mysterious queries and the relevant_memories tie to self.memory.
Here's what's happening behind the scenes when you call .run(name=self.name, queries=[f"{self.name}'s core characteristics"]).
- self.prep_inputs receives inputs, which is a dictionary (remember, inputs is forwarded all the way from the initial run call; inputs is the kwargs dictionary)
- if not isinstance(inputs, dict) will evaluate to False so the code within that if statement won't execute
- if self.memory is not None will execute, and this calls load_memory_variables on our GenerativeAgentMemory
- At this stage, inputs look like: inputs = {"name": <name>, "queries": [<queries>]}.
- if queries is not None will evaluate to True and we will fetch all relevant memories with L265-L267 based on all queries (in the case of _compute_agent_summary, we have 1 query: f"{self.name}'s core characteristics")
- Then we return a new dictionary that looks something like: {"relevant_memories": <relevant memories in regular format>, "relevant_memories_simple": <relevant memories in simple format>}
- After load_memory_variables finishes, we go back to base.py, where if self.memory is not None is True, and the output of load_memory_variables is saved to external_context
- Finally, the inputs dictionary (with the structure shown in step #4) is updated with the external_context dictionary (with the structure shown in step #6 above)
- The final dictionary exiting prep_inputs is:
inputs = {"name": <str>,"queries": [<queries>],"relevant_memories": <relevant_memories>,"relevant_memories_simple": <relevant_memories in simple format>,}
This same behind-the-scenes behavior can be seen in:
- L85 in summarize_related_memories
- L215 in _compute_agent_summary
- L120 in _generate_reaction
Basically, since GenerativeAgentMemory is a subclass of BaseMemory, we can plug it into our LLMChains. Including these unique keys in our GenerativeAgentMemory class, and passing extra keywords when we call .run, lets the LLMChain dynamically retrieve memories for our prompts from the TimeWeightedVectorStoreRetriever.
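To see the whole path end to end, here's a hedged sketch that reuses the memory object from the earlier sketches; this mirrors what GenerativeAgent.chain sets up internally:

```python
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "How would you summarize {name}'s core characteristics given the"
    " following statements:\n{relevant_memories}\nSummary: "
)
# Attaching the memory means prep_inputs will consult it before formatting.
chain = LLMChain(llm=ChatOpenAI(), prompt=prompt, memory=memory)

# "queries" is not a prompt variable; load_memory_variables picks it up,
# fetches relevant memories, and prep_inputs merges the result into the
# prompt inputs as {relevant_memories} / {relevant_memories_simple}.
summary = chain.run(name="Tommie", queries=["Tommie's core characteristics"])
```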
Conclusion
In this article, I covered "Generative Agents: Interactive Simulacra of Human Behavior" and walked through the LangChain implementation of generative agents. What's all this for? Well, it's to give you a more comprehensive understanding of the paper! Of course, there is still lots to implement in these 2 files. That's why I'll be writing a part 2 to this article where I improve them, and I'll link it here once it's ready!
Thanks for reading! 👋
References
LangChain Generative Agents Source Code: https://github.com/langchain-ai/langchain/tree/master/libs/experimental/langchain_experimental/generative_agents
LangChain Docs Page on Generative Agents: https://python.langchain.com/docs/use_cases/more/agents/agent_simulations/characters
Appendix (click the .py sections below to expand)
memory.py
generative_agent.py