Building a chatbot with Gemma, Langchain and Chroma DB
Learn how to create a chatbot with Gemma, LangChain, and ChromaDB. This guide walks you through setup, data processing, and response generation.
Chatbots are becoming essential tools across industries, helping businesses automate interactions, provide instant support, and streamline operations. Whether for customer service, virtual assistants, or specialized applications like restaurant ordering, a well-built chatbot can enhance user experience and efficiency.
In this guide, we’ll walk through building a chatbot using Gemma, LangChain, and ChromaDB - three powerful tools that simplify the process of developing an intelligent, context-aware assistant. Our example use case will be a restaurant menu-ordering chatbot, but the same principles can be applied to other domains.
Why use Gemma, LangChain, and ChromaDB?
To create a chatbot that understands user queries, retrieves relevant information, and responds naturally, we’ll leverage the strengths of:
- Gemma: A lightweight, open-source large language model from Google, designed for efficient text-based tasks.
- LangChain: A framework that simplifies working with LLMs, enabling easy orchestration of components like retrieval and memory.
- ChromaDB: A vector database for storing and retrieving relevant context, improving chatbot accuracy.
By combining these technologies, we’ll develop a chatbot that can answer customer questions, provide menu recommendations, and handle inquiries dynamically.
Let's get going ...
Setting up the environment
Before we start coding, we need to install the required dependencies:
!pip install langchain
!pip install chromadb
!pip install wandb  # For logging and monitoring
After installation, log in to Weights & Biases to track your experiments:
wandb login
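If you are working in a notebook, you can also authenticate programmatically instead of using the CLI. A minimal sketch (it will prompt for your W&B API key if you are not already logged in):

import wandb

# Programmatic alternative to the `wandb login` CLI command
wandb.login()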
Loading and preparing your data
For this example, our chatbot will use a restaurant menu stored in a JSON file. We’ll first load this data and convert it into a format suitable for retrieval.
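The exact schema of menu.json is up to you. As an illustration only, the rest of this guide assumes each dish is an object with fields such as "Item", "Cost", and "About" (these field names are an assumption, not a requirement). If you don't already have a menu file, you can create a tiny one like this:

import json

# Hypothetical example of what menu.json might contain; adapt the fields to your own menu
sample_menu = [
    {"Item": "Margherita Pizza", "Cost": "$12", "About": "Classic pizza with tomato, mozzarella, and basil."},
    {"Item": "Paneer Tikka", "Cost": "$10", "About": "Grilled cottage cheese marinated in spices, a popular vegetarian choice."},
]

with open("menu.json", "w") as f:
    json.dump(sample_menu, f, indent=2)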
import json
import os

json_file = "menu.json"
with open(json_file, "r") as f:
    json_data = json.load(f)

folder_path = "/content/Data"
os.makedirs(folder_path, exist_ok=True)  # Create the folder before writing files

# Write each dish to its own text file, one "key: value" pair per line
for count, dish in enumerate(json_data):
    file_path = os.path.join(folder_path, "{}.txt".format(count))
    with open(file_path, "w") as f:
        for key, value in dish.items():
            f.write(f"{key}: {value}\n")
Each dish is stored as a separate text file, making it easy to load and process with LangChain.
Next, we’ll use LangChain’s TextLoader to load the data into documents.
from langchain_community.document_loaders import TextLoader

loaders = []
for i in range(12):  # one file per dish; this example menu has 12 dishes
    file_path = os.path.join(folder_path, "{}.txt".format(i))
    loaders.append(TextLoader(file_path))

docs = []
for loader in loaders:
    docs.extend(loader.load())
Creating the vector database
We’ll use Hugging Face’s Inference API to generate embeddings and store them in ChromaDB for efficient retrieval.
from langchain.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

inference_api_key = "key"  # Your Hugging Face Inference API key
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=inference_api_key,
    model_name="sentence-transformers/all-mpnet-base-v2",
)

vectordb = Chroma.from_documents(documents=docs, embedding=embeddings)
This allows the chatbot to retrieve relevant context when answering user queries.
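Before wiring up the full chain, it's worth sanity-checking retrieval with a quick similarity search. The query and k value below are arbitrary examples:

# Quick check that the vector store returns sensible dishes for a query
results = vectordb.similarity_search("Do you have any spicy dishes?", k=2)
for doc in results:
    print(doc.page_content)
    print("---")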
Loading the language model
We’ll use Gemma-2B-IT, a lightweight model designed for efficiency. To keep responses focused and accurate, we set temperature=0.1 and limit the output length.
from langchain_community.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/gemma-2b-it",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 5,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
    huggingfacehub_api_token="API_TOKEN",  # Your Hugging Face Hub token
)
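A quick smoke test confirms the model is reachable before we build the chain. The prompt here is just an example:

# Simple test call; the response is returned as a plain string
print(llm.invoke("Suggest a light appetizer for someone who likes seafood."))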
Defining the chatbot’s behavior
We’ll use a structured prompt to guide the chatbot’s responses.
from langchain.prompts import PromptTemplate

template = """You are a Chatbot at a Restaurant. Help the customer pick the right dish to order. \
The items in the context are dishes. The field below the item is the cost of the dish. \
About is the description of the dish. Use the context below to answer the questions.
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
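If you want to see exactly what the model will receive, you can preview the formatted prompt with dummy values (the context and question below are placeholders):

# Preview the fully formatted prompt with placeholder values
print(QA_CHAIN_PROMPT.format(
    context="Item: Margherita Pizza\nCost: $12\nAbout: Classic pizza with tomato, mozzarella, and basil.",
    question="Which pizza do you recommend?",
))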
Adding memory and retrieval
To maintain conversation history, we’ll use LangChain’s ConversationBufferMemory.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)
Now, we’ll define a retriever to fetch relevant menu details and initialize a Conversational Retrieval Chain.
from langchain.chains import ConversationalRetrievalChain

retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
)
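With memory attached, the chain can already be queried directly; it returns a dict whose "answer" key holds the response. The question below is illustrative:

# Ask a question; the memory records the exchange so follow-ups have context
result = qa.invoke({"question": "What vegetarian dishes do you have?"})
print(result["answer"])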
Improving query handling
To ensure context-aware responses, we’ll preprocess user queries using a reformulation chain.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()
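To see the reformulation in action, you can invoke the chain with a short, made-up history; the dish and replies below are only examples:

from langchain_core.messages import AIMessage, HumanMessage

# The follow-up "Is it spicy?" should come back as a standalone question
example_history = [
    HumanMessage(content="Tell me about the Paneer Tikka."),
    AIMessage(content="Paneer Tikka is grilled cottage cheese marinated in spices."),
]
print(contextualize_q_chain.invoke({"chat_history": example_history, "question": "Is it spicy?"}))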
Handling follow-up questions
To make our chatbot context-aware, we need a way to reformulate follow-up questions. Users often ask follow-ups without repeating full context, such as:
- "What about vegetarian options?"
- "Can you suggest something spicy?"
Since the chatbot doesn’t inherently remember the context of each question, we need a method to rewrite user queries before retrieving relevant data.
To do this, we define a contextualization function that checks if chat history exists. If the user’s question depends on previous messages, the function restructures it to make sense in isolation.
def contextualized_question(input: dict):
    if input.get("chat_history"):
        return contextualize_q_chain.invoke(input)
    else:
        return input["question"]
Now, we can integrate this into the retrieval step, ensuring the chatbot fetches relevant information even for follow-up queries.
rag_chain = (
    RunnablePassthrough.assign(context=contextualized_question | retriever)
    | QA_CHAIN_PROMPT
    | llm
)
This ensures that before sending a query to the database, the chatbot automatically restructures it if needed. This is particularly useful for multi-turn conversations, where users might not provide full context in each message.
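Here is a hedged end-to-end example of calling the chain with and without history; the questions and the assistant reply are placeholders:

from langchain_core.messages import AIMessage, HumanMessage

# First turn: no history, so the question is passed through unchanged
print(rag_chain.invoke({"question": "What desserts do you have?", "chat_history": []}))

# Follow-up turn: the question is rewritten using the history before retrieval
history = [
    HumanMessage(content="What desserts do you have?"),
    AIMessage(content="We have a chocolate lava cake and a mango cheesecake."),
]
print(rag_chain.invoke({"question": "Which one is cheaper?", "chat_history": history}))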
Running the chatbot
Finally, we’ll define a predict function that runs each message through the chain and keeps the chat history up to date. Wrapping it in a simple loop lets users interact with the chatbot until they type "exit" (see the sketch after the code below).
from langchain_core.messages import AIMessage, HumanMessage

# Enable W&B tracing for LangChain calls
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "Restaurant_ChatBot"

print("Welcome to the Restaurant. How can I help you today?")

chat_history = []

def predict(message, history):
    ai_msg = rag_chain.invoke({"question": message, "chat_history": chat_history})
    # Keep only the text from "Answer" onwards (the "Helpful Answer:" part of the prompt)
    idx = ai_msg.find("Answer")
    chat_history.extend([HumanMessage(content=message), AIMessage(content=ai_msg)])
    return ai_msg[idx:]
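One simple way to run this is a console loop that keeps calling predict until the user types "exit" (a minimal sketch; the same predict(message, history) signature also works as the function for a Gradio ChatInterface):

# Minimal console loop around predict; type "exit" to stop
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "exit":
        print("Bot: Thanks for visiting. Goodbye!")
        break
    print("Bot:", predict(user_input, chat_history))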
This completes the chatbot setup. Users can now ask questions, receive recommendations, and interact in a context-aware manner.