Creating a Q&A Bot for W&B Documentation
In this article, we walk through how to build a question-and-answer (Q&A) bot for the Weights & Biases documentation.
Introduction
This article describes the documentation Q&A bot I built as part of the Replit x Weights & Biases ML Hackathon. The bot uses OpenAI's GPT-3 to answer natural language questions and developer queries related to Weights & Biases documentation. I use Langchain, OpenAI embeddings, and FAISS to create the Q&A backend, and the bot is served as a Gradio application.
This is a very simple and rudimentary proof of concept to get a feel for what Q&A over documentation might look like. There is plenty of room for improvement in various parts of the pipeline to make this a production-ready application.
Here's a quick preview of what's in this article:
Introduction
Creating the Documentation Dataset
Data Collection
Preprocessing
Data Ingestion
Creating the Documents
Creating the FAISS index
Creating the Q&A Bot
Designing a Robust Prompt for the LLM
Creating the Q&A Pipeline
Creating the Chat Interface
The User Interface
Final Thoughts and Future Work
Before we dive in, here's what the repl looks like:

Credit: This bot was largely inspired by the following tweet:
Creating the Documentation Dataset
Data Collection
The W&B documentation can be found at docs.wandb.ai. It contains guides, API references, and examples.
Scraping this was more challenging than I had initially thought, so I took an alternate route and collected the documentation from the W&B/docodile GitHub repository instead. In the repository, each webpage is represented as a markdown file, organized in sub-directories that mirror the website's tree structure, which also made the documentation easier to parse. For completeness, I also added data from top forum questions, support-rotation tickets, and API developer references. This additional data can be found organized in the following Google spreadsheet - wandb_bot finetune data.
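For illustration, here's a minimal sketch of how the markdown pages could be gathered from a local clone of the repository. The function name, paths, and output format here are assumptions for this example (chosen to match the source and reference keys used later in the ingestion code), not the exact script from the repl:

import json
import pathlib


def collect_markdown_docs(repo_dir, output_file="wandb_docs.json"):
    # Walk the cloned docs repository and collect every markdown page,
    # recording its relative path as the "source" field.
    with open(output_file, "w") as f:
        for path in sorted(pathlib.Path(repo_dir).rglob("*.md")):
            record = {
                "source": str(path.relative_to(repo_dir)),
                "reference": path.read_text(encoding="utf-8"),
            }
            f.write(json.dumps(record) + "\n")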
Preprocessing
The only preprocessing I did was to remove runs of multiple newlines (think \n) from the documentation text and to convert the spreadsheet data into a single document. The data was finally stored as a JSONL file with a source key recording where each piece of text came from.
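As a rough illustration, the newline cleanup can be as simple as collapsing repeated newlines with a regular expression. This is a minimal sketch under that assumption, not the exact cleanup script used:

import re


def clean_text(text):
    # Collapse runs of blank lines (and surrounding whitespace) into a single newline
    return re.sub(r"\n\s*\n+", "\n", text).strip()

Here's the artifact containing the dataset: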
docs_dataset
Full Name: parambharat/wandb_docs_bot/docs_dataset:v1
Aliases: v1
Digest: cc8d0cc556fc5f399cda1dca42ce468f
Created At: February 8th, 2023 05:58:10
Num Consumers: 19
Num Files: 2
Size: 2.7KB
We can retrieve the dataset artifact by running the following code:
PROJECT = "wandb_docs_bot"run = wandb.init(project=PROJECT)def download_raw_dataset():dataset_artifact_path = 'parambharat/wandb_docs_bot/docs_dataset:latest'artifact = run.use_artifact(dataset_artifact_path, type='dataset')artifact_path = artifact.get_path("wandb_docs.json")file = artifact_path.download()return file
Data Ingestion
Creating the Documents
The next step was to store the documents, metadata, and their corresponding OpenAI embeddings for search and retrieval. Langchain, by default, uses the text-embedding-ada-002 model to embed the documents. The model generates 1536-dimensional embeddings for documents up to 8191 tokens in length.
However, at query time, we will also be passing the retrieved documents through the text-davinci-003 model, which has a context length of 4096 tokens. Therefore, I split the text into chunks of 1024 characters and stored each chunk as a Langchain Document, with the corresponding source file stored as metadata.
Note: I used the CharacterTextSplitter class from langchain to do the chunking. In hindsight, I should have used TokenTextSplitter.from_tiktoken_encoder instead (see the sketch after the code below).
Here's the code I used for this step:
import json

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter


def load_documents(fname):
    source_chunks = []
    splitter = CharacterTextSplitter(separator=" ", chunk_size=1024, chunk_overlap=0)
    for line in open(fname, "r"):
        line = json.loads(line)
        for chunk in splitter.split_text(line["reference"]):
            source_chunks.append(Document(page_content=chunk, metadata={"source": line["source"]}))
    return source_chunks
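Following up on the note above, a token-aware splitter could be swapped in via Langchain's from_tiktoken_encoder constructor so that chunk sizes are measured in tokens rather than characters. This is a hedged sketch of that alternative, not the code used in the repl, and the chunk_size value here is an assumption:

from langchain.text_splitter import TokenTextSplitter

# Measure chunk sizes in tokens so they line up with the model's context window
splitter = TokenTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=0)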
Creating the FAISS index
Finally, we are ready to call the OpenAI embeddings API endpoint and index the documents by their embeddings for dense vector search and retrieval. While there are many choices for storing the vectors, I used faiss-gpu since it could be easily installed via pip and run on Replit. A more production-ready version should use a more resilient vector store database or a service like Qdrant or Weaviate.
Here's the code to create and store the document embeddings using Langchain.
import pickle

import faiss
import wandb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS


def create_and_save_index(documents):
    store = FAISS.from_documents(documents, OpenAIEmbeddings())

    artifact = wandb.Artifact("faiss_store", type="search_index")
    # Save the raw FAISS index as its own file, then pickle the store without it
    faiss.write_index(store.index, "docs.index")
    artifact.add_file("docs.index")
    store.index = None
    with artifact.new_file("faiss_store.pkl", "wb") as f:
        pickle.dump(store, f)
    wandb.log_artifact(artifact, "docs_index", type="embeddings_index")
    return store
The code above stores the document index and embeddings separately as files in a single artifact. Check out the artifact for this below.
faiss_store
Check out the ingest.py file in the repl to see how all of the above code was put together for data ingestion.
Creating the Q&A Bot
With the data ready and in the right format, we are almost ready to create our documentation bot.
Keep in mind: we want our bot to be a conversational agent. While GPT-3 has shown reasonably good zero-shot performance for in-context question answering, we need to design a prompt that is robust and keeps model hallucinations to a minimum. This brings us to prompt design.
Designing a Robust Prompt for the LLM
While prompt design has been evolving from an art into a science, I still tend to treat it like an art. Case in point: I drew inspiration from and imitated other prompt engineers in the field to design a prompt that worked well for this use case. Here's the final prompt that I created for the bot:
While the prompt is quite long, it describes exactly how we want the language model to behave and provides few-shot examples to ensure that the model generates the response in the desired fashion.
NOTE: Langchain prompts use format-string style templates, where {xxx} marks placeholder text. If you have code blocks in your prompt, you can escape a literal { by doubling it to {{. See the above prompt for an example.
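To make the templating note concrete, here's a small, purely hypothetical fragment (not the actual prompt used by the bot) showing the {summaries} and {question} placeholders alongside escaped braces inside a code example:

# A hypothetical prompt fragment; the real combine_prompt.txt is longer and includes few-shot examples
combine_prompt_fragment = """Answer the question using only the sources below. Cite the sources you used.

Sources:
{summaries}

For example, to log a metric you can write: run.log({{"loss": 0.5}})

Question: {question}
Answer:"""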
The full prompt template can be easily downloaded using the following code snippet.
from langchain.prompts import PromptTemplate


def load_prompt():
    # The prompt template is stored alongside the dataset artifact
    dataset_artifact_path = 'parambharat/wandb_docs_bot/docs_dataset:latest'
    artifact = run.use_artifact(dataset_artifact_path, type='dataset')
    artifact_path = artifact.get_path("combine_prompt.txt")
    file = artifact_path.download()
    prompt_template = open(file, "r").read()
    prompt = PromptTemplate(
        input_variables=["question", "summaries"],
        template=prompt_template,
    )
    return prompt
Creating the Q&A Pipeline
To create the Q&A pipeline that references the documents at query time, I made use of the VectorDBQAWithSourcesChain in Langchain. As the name suggests, the chain first queries the vector store for the documents whose embeddings are nearest to a given query. The retrieved documents are then inserted into the prompt along with the query and passed to the LLM to generate a response. Here's the code snippet used to achieve this:
from langchain.chains import VectorDBQAWithSourcesChain
from langchain.llms import OpenAI


def load_chain(openai_api_key):
    if validate_openai_key(openai_api_key):
        vectorstore = load_vectostore()
        prompt = load_prompt()
        chain = VectorDBQAWithSourcesChain.from_chain_type(
            llm=OpenAI(temperature=0, openai_api_key=openai_api_key),
            chain_type="map_reduce",
            vectorstore=vectorstore,
            combine_prompt=prompt,
        )
        return chain


def get_answer(question, chain):
    if chain is not None:
        result = chain(
            {"question": question},
            return_only_outputs=True,
        )
        response = f"Answer:\t{result['answer']}\n\nSources:\t{result['sources']}\n"
        return response
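The load_vectostore helper referenced above isn't shown in the snippet. Here's a minimal sketch of how the pickled store and FAISS index saved during ingestion could be stitched back together; the artifact path and filenames are assumptions based on the ingestion code above (and it reuses the run object from earlier), not a confirmed part of the repl:

import pickle

import faiss


def load_vectostore():
    # Download the search-index artifact and rebuild the FAISS-backed store
    # (artifact path assumed from the ingestion step above)
    artifact = run.use_artifact('parambharat/wandb_docs_bot/faiss_store:latest')
    artifact_dir = artifact.download()
    with open(f"{artifact_dir}/faiss_store.pkl", "rb") as f:
        store = pickle.load(f)
    store.index = faiss.read_index(f"{artifact_dir}/docs.index")
    return store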
Creating the Chat Interface
When creating a chat interface, it's important to store the user inputs, data, and model responses in a stateful way. This ensures that follow-up queries make use of the state of the chat and that the full conversation, including each user query and model response, can be rendered in the UI. We achieve this with a wrapper class that initializes the Q&A chain above at the beginning of a chat. Here's the code:
class Chat:
    def __init__(self):
        self.chain = None

    def __call__(self, message, history, openai_api_key):
        if self.chain is None:
            self.chain = load_chain(openai_api_key)
        history = history or []
        message = message.lower()
        response = get_answer(message, self.chain)
        if response is None:
            response = "Please enter a valid Openai API Key and try again. "
        history.append((message, response))
        return history, history
Note that we maintain the chat state in the history variable and use the class to store an initialized chain upon the first call.
The User Interface
Gradio made it incredibly easy to create a simple UI for the application. The library even provides a Chatbot class that implements a text chatbot interface. I created a very minimal and simple interface that takes the user's question and their OpenAI API key as text inputs and displays the LLM's output in response. Here's the code to do this:
import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        question = gr.Textbox(
            label='Type in your questions about wandb here and press Enter!',
            placeholder='How do i log images with wandb ?',
        )
        openai_api_key = gr.Textbox(
            type='password',
            label="Enter your OpenAI API key here",
        )
    state = gr.State()
    chatbot = gr.Chatbot()
    question.submit(Chat(), [question, state, openai_api_key], [chatbot, state])
I also added a simple HTML block in the code above to introduce the bot and its usage. The full code can be seen in the main.py file of the repl.
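To actually serve the interface from the repl, the Blocks app still needs to be launched. A minimal, assumed launch call might look like the following; binding to all interfaces is a common pattern for hosting Gradio apps on Replit, but the exact arguments used in main.py may differ:

# Bind to all interfaces so the app is reachable outside the repl
demo.launch(server_name="0.0.0.0")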
The final application can be seen below:
Final Thoughts and Future Work
This hackathon was a really cool and fun opportunity that I thoroughly enjoyed. I learned that LLMs can be used to create many interesting applications over existing data and resources. I was also able to understand how it is possible to overcome the prompt-length limitations in LLMs using embeddings and semantic search. While the chatbot developed was quite simple and has a lot of scope for improvement, I still think it's a powerful way to use LLMs to automate mundane tasks and create rich user experiences.
The project also inspired me to work on more applications of LLMs. One such idea I'm currently exploring as a side project is to generate chapters and summaries for Gradient Dissent episodes using LLM embeddings and LangChain. I'll post a report and update you soon!