Building a chatbot with Gemma, Langchain and Chroma DB
Learn how to create a chatbot with Gemma, LangChain, and ChromaDB. This guide walks you through setup, data processing, and response generation.
Chatbots are becoming essential tools across industries, helping businesses automate interactions, provide instant support, and streamline operations. Whether for customer service, virtual assistants, or specialized applications like restaurant ordering, a well-built chatbot can enhance user experience and efficiency.
In this guide, we’ll walk through building a chatbot using Gemma, LangChain, and ChromaDB - three powerful tools that simplify the process of developing an intelligent, context-aware assistant. Our example use case will be a restaurant menu-ordering chatbot, but the same principles can be applied to other domains.
Why use Gemma, LangChain, and ChromaDB?
To create a chatbot that understands user queries, retrieves relevant information, and responds naturally, we’ll leverage the strengths of:
- Gemma: A lightweight, open-source large language model from Google, designed for efficient text-based tasks.
- LangChain: A framework that simplifies working with LLMs, enabling easy orchestration of components like retrieval and memory.
- ChromaDB: A vector database for storing and retrieving relevant context, improving chatbot accuracy.
By combining these technologies, we’ll develop a chatbot that can answer customer questions, provide menu recommendations, and handle inquiries dynamically.
Let's get going ...
Setting up the environment
Before we start coding, we need to install the required dependencies:
!pip install langchain
!pip install chromadb
!pip install wandb  # For logging and monitoring
After installation, log in to Weights & Biases to track your experiments:
wandb login
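If you are working in a notebook, you can also authenticate programmatically instead of using the CLI. A minimal sketch (it will prompt for your W&B API key if you are not already logged in):

import wandb

# Programmatic alternative to the `wandb login` CLI command
wandb.login()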
Loading and preparing your data
For this example, our chatbot will use a restaurant menu stored in a JSON file. We’ll first load this data and convert it into a format suitable for retrieval.
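The exact schema of menu.json is up to you. As an illustration only, the rest of this guide assumes each dish is an object with fields such as "Item", "Cost", and "About" (these field names are an assumption, not a requirement). If you don't already have a menu file, you can create a tiny one like this:

import json

# Hypothetical example of what menu.json might contain; adapt the fields to your own menu
sample_menu = [
    {"Item": "Margherita Pizza", "Cost": "$12", "About": "Classic pizza with tomato, mozzarella, and basil."},
    {"Item": "Paneer Tikka", "Cost": "$10", "About": "Grilled cottage cheese marinated in spices, a popular vegetarian choice."},
]

with open("menu.json", "w") as f:
    json.dump(sample_menu, f, indent=2)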
import json
import os

json_file = "menu.json"
with open(json_file, "r") as f:
    json_data = json.load(f)

folder_path = "/content/Data"
os.makedirs(folder_path, exist_ok=True)  # Create the folder before writing files

# Write each dish to its own text file, one "key: value" pair per line
for count, dish in enumerate(json_data):
    file_path = os.path.join(folder_path, "{}.txt".format(count))
    with open(file_path, "w") as f:
        for key, value in dish.items():
            f.write(f"{key}: {value}\n")
Each dish is stored as a separate text file, making it easy to load and process with LangChain.
Next, we’ll use LangChain’s TextLoader to load the data into documents.
from langchain_community.document_loaders import TextLoader

loaders = []
for i in range(12):  # one file per dish; this example menu has 12 dishes
    file_path = os.path.join(folder_path, "{}.txt".format(i))
    loaders.append(TextLoader(file_path))

docs = []
for loader in loaders:
    docs.extend(loader.load())
Creating the vector database
We’ll use Hugging Face’s Inference API to generate embeddings and store them in ChromaDB for efficient retrieval.
from langchain.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

inference_api_key = "key"  # Your Hugging Face Inference API key
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key,
    model_name="sentence-transformers/all-mpnet-base-v2",
)

vectordb = Chroma.from_documents(documents=docs, embedding=embeddings)
This allows the chatbot to retrieve relevant context when answering user queries.
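Before wiring up the full chain, it's worth sanity-checking retrieval with a quick similarity search. The query and k value below are arbitrary examples:

# Quick check that the vector store returns sensible dishes for a query
results = vectordb.similarity_search("Do you have any spicy dishes?", k=2)
for doc in results:
    print(doc.page_content)
    print("---")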
Loading the language model
We’ll use Gemma-2B-IT, a lightweight model designed for efficiency. To keep responses focused and accurate, we set temperature=0.1 and limit the output length.
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/gemma-2b-it",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 5,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
    huggingfacehub_api_token="API_TOKEN",  # Your Hugging Face Hub token
)
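A quick smoke test confirms the model is reachable before we build the chain. The prompt here is just an example:

# Simple test call; the response is returned as a plain string
print(llm.invoke("Suggest a light appetizer for someone who likes seafood."))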
Defining the chatbot’s behavior
We’ll use a structured prompt to guide the chatbot’s responses.
from langchain.prompts import PromptTemplate

template = """You are a Chatbot at a Restaurant. Help the customer pick the right dish to order. \
The items in the context are dishes. The field below the item is the cost of the dish. \
About is the description of the dish. Use the context below to answer the questions.
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
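If you want to see exactly what the model will receive, you can preview the formatted prompt with dummy values (the context and question below are placeholders):

# Preview the fully formatted prompt with placeholder values
print(QA_CHAIN_PROMPT.format(
    context="Item: Margherita Pizza\nCost: $12\nAbout: Classic pizza with tomato, mozzarella, and basil.",
    question="Which pizza do you recommend?",
))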
Adding memory and retrieval
To maintain conversation history, we’ll use LangChain’s ConversationBufferMemory.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)
Now, we’ll define a retriever to fetch relevant menu details and initialize a Conversational Retrieval Chain.
from langchain.chains import ConversationalRetrievalChain

retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
)
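With memory attached, the chain can already be queried directly; it returns a dict whose "answer" key holds the response. The question below is illustrative:

# Ask a question; the memory records the exchange so follow-ups have context
result = qa.invoke({"question": "What vegetarian dishes do you have?"})
print(result["answer"])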
Improving query handling
To ensure context-aware responses, we’ll preprocess user queries using a reformulation chain.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()
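To see the reformulation in action, you can invoke the chain with a short, made-up history; the dish and replies below are only examples:

from langchain_core.messages import AIMessage, HumanMessage

# The follow-up "Is it spicy?" should come back as a standalone question
example_history = [
    HumanMessage(content="Tell me about the Paneer Tikka."),
    AIMessage(content="Paneer Tikka is grilled cottage cheese marinated in spices."),
]
print(contextualize_q_chain.invoke({"chat_history": example_history, "question": "Is it spicy?"}))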
Handling follow-up questions
To make our chatbot context-aware, we need a way to reformulate follow-up questions. Users often ask follow-ups without repeating full context, such as:
- "What about vegetarian options?"
- "Can you suggest something spicy?"
Since the chatbot doesn’t inherently remember the context of each question, we need a method to rewrite user queries before retrieving relevant data.
To do this, we define a contextualization function that checks if chat history exists. If the user’s question depends on previous messages, the function restructures it to make sense in isolation.
def contextualized_question(input: dict):
    if input.get("chat_history"):
        return contextualize_q_chain.invoke(input)
    else:
        return input["question"]
Now, we can integrate this into the retrieval step, ensuring the chatbot fetches relevant information even for follow-up queries.
rag_chain = (
    RunnablePassthrough.assign(context=contextualized_question | retriever)
    | QA_CHAIN_PROMPT
    | llm
)
This ensures that before sending a query to the database, the chatbot automatically restructures it if needed. This is particularly useful for multi-turn conversations, where users might not provide full context in each message.
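Here is a hedged end-to-end example of calling the chain with and without history; the questions and the assistant reply are placeholders:

from langchain_core.messages import AIMessage, HumanMessage

# First turn: no history, so the question is passed through unchanged
print(rag_chain.invoke({"question": "What desserts do you have?", "chat_history": []}))

# Follow-up turn: the question is rewritten using the history before retrieval
history = [
    HumanMessage(content="What desserts do you have?"),
    AIMessage(content="We have a chocolate lava cake and a mango cheesecake."),
]
print(rag_chain.invoke({"question": "Which one is cheaper?", "chat_history": history}))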
Running the chatbot
Finally, we’ll define a predict function that runs each message through the chain and keeps the chat history up to date. Wrapping it in a simple loop lets users interact with the chatbot until they type "exit" (see the sketch after the code below).
from langchain_core.messages import AIMessage, HumanMessage

# Enable W&B tracing for LangChain calls
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "Restaurant_ChatBot"

print("Welcome to the Restaurant. How can I help you today?")

chat_history = []

def predict(message, history):
    ai_msg = rag_chain.invoke({"question": message, "chat_history": chat_history})
    # Keep only the text from "Answer" onwards (the "Helpful Answer:" part of the prompt)
    idx = ai_msg.find("Answer")
    chat_history.extend([HumanMessage(content=message), AIMessage(content=ai_msg)])
    return ai_msg[idx:]
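One simple way to run this is a console loop that keeps calling predict until the user types "exit" (a minimal sketch; the same predict(message, history) signature also works as the function for a Gradio ChatInterface):

# Minimal console loop around predict; type "exit" to stop
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        print("Bot: Thanks for visiting. Goodbye!")
        break
    print("Bot:", predict(user_input, chat_history))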
This completes the chatbot setup. Users can now ask questions, receive recommendations, and interact in a context-aware manner.