Creating a Q&A Bot for W&B Documentation
In this article, we walk through how to build a question-and-answer (Q&A) bot for the Weights & Biases documentation.
Introduction
This article describes the documentation Q&A bot I built as part of the Replit x Weights & Biases ML Hackathon. The bot uses OpenAI's GPT-3 to answer natural language questions and developer queries related to Weights & Biases documentation. I use Langchain, OpenAI embeddings, and FAISS to create the Q&A backend, and the bot is served as a Gradio application.
This is a very simple and rudimentary proof of concept to get a feel for what Q&A over documentation might look like. There is plenty of room for improvement in various parts of the pipeline to make this a production-ready application.
Here's a quick preview of what's in this article:
Introduction
Creating the Documentation Dataset
Data Collection
Preprocessing
Data Ingestion
Creating the Documents
Creating the FAISS index
Creating the Q&A Bot
Designing a Robust Prompt for the LLM
Creating the Q&A Pipeline
Creating the Chat Interface
The User Interface
Final Thoughts and Future Work
Before we dive in, here's what the repl looks like:

Credit: This bot was largely inspired by the following tweet:
Creating the Documentation Dataset
Data Collection
The W&B documentation can be found at docs.wandb.ai. It contains guides, API references, and examples.
Scraping this was more challenging than I had initially thought, so I took an alternate route and collected the documentation from the W&B/docodile GitHub repository instead. In the repository, each webpage is represented as a markdown file, organized in sub-directories that mirror the website's tree structure, which also made the documentation easier to parse. For completeness, I also added data from top forum questions, support-rotation tickets, and API developer references. This additional data can be found organized in the following Google spreadsheet - wandb_bot finetune data.
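For illustration, here's a minimal sketch of how the markdown pages could be gathered from a local clone of the repository. The function name, paths, and output format here are assumptions for this example (chosen to match the source and reference keys used later in the ingestion code), not the exact script from the repl:

import json
import pathlib


def collect_markdown_docs(repo_dir, output_file="wandb_docs.json"):
    # Walk the cloned docs repository and collect every markdown page,
    # recording its relative path as the "source" field.
    with open(output_file, "w") as f:
        for path in sorted(pathlib.Path(repo_dir).rglob("*.md")):
            record = {
                "source": str(path.relative_to(repo_dir)),
                "reference": path.read_text(encoding="utf-8"),
            }
            f.write(json.dumps(record) + "\n")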
Preprocessing
The only preprocessing I did was to remove runs of multiple newlines (think \n) from the documentation text and to convert the spreadsheet data into a single document. The data was finally stored as a JSONL file with a source key recording where each piece of text came from.
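As a rough illustration, the newline cleanup can be as simple as collapsing repeated newlines with a regular expression. This is a minimal sketch under that assumption, not the exact cleanup script used:

import re


def clean_text(text):
    # Collapse runs of blank lines (and surrounding whitespace) into a single newline
    return re.sub(r"\n\s*\n+", "\n", text).strip()

Here's the artifact containing the dataset: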
docs_dataset
Full Name: parambharat/wandb_docs_bot/docs_dataset:v1
Aliases: v1
Digest: cc8d0cc556fc5f399cda1dca42ce468f
Created At: February 8th, 2023 05:58:10
Num Consumers: 19
Num Files: 2
Size: 2.7KB
We can retrieve the dataset artifact by running the following code:
PROJECT = "wandb_docs_bot"run = wandb.init(project=PROJECT)def download_raw_dataset():dataset_artifact_path = 'parambharat/wandb_docs_bot/docs_dataset:latest'artifact = run.use_artifact(dataset_artifact_path, type='dataset')artifact_path = artifact.get_path("wandb_docs.json")file = artifact_path.download()return file
Data Ingestion
Creating the Documents
The next step was to store the documents, metadata, and their corresponding OpenAI embeddings for search and retrieval. Langchain, by default, uses the text-embedding-ada-002 model to embed the documents. The model generates 1536-dimensional embeddings for documents up to 8191 tokens in length.
However, at query time, we will also be passing the retrieved documents through the text-davinci-003 model, which has a context length of 4096 tokens. Therefore, I split the text into chunks of 1024 characters and stored each chunk as a Langchain Document, with the corresponding source file stored as metadata.
Note: I used the CharacterTextSplitter class from langchain to do the chunking. In hindsight, I should have used TokenTextSplitter.from_tiktoken_encoder instead (see the sketch after the code below).
Here's the code I used for this step:
import json

from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter


def load_documents(fname):
    source_chunks = []
    splitter = CharacterTextSplitter(separator=" ", chunk_size=1024, chunk_overlap=0)
    for line in open(fname, "r"):
        line = json.loads(line)
        for chunk in splitter.split_text(line["reference"]):
            source_chunks.append(Document(page_content=chunk, metadata={"source": line["source"]}))
    return source_chunks
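Following up on the note above, a token-aware splitter could be swapped in via Langchain's from_tiktoken_encoder constructor so that chunk sizes are measured in tokens rather than characters. This is a hedged sketch of that alternative, not the code used in the repl, and the chunk_size value here is an assumption:

from langchain.text_splitter import TokenTextSplitter

# Measure chunk sizes in tokens so they line up with the model's context window
splitter = TokenTextSplitter.from_tiktoken_encoder(chunk_size=512, chunk_overlap=0)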
Creating the FAISS index
Finally, we are ready to call the OpenAI embeddings API endpoint and index the documents by their embeddings for dense vector search and retrieval. While there are many choices for storing the vectors, I used faiss-gpu since it could be easily installed via pip and run on Replit. A more production-ready version should use a more resilient vector store database or a service like Qdrant or Weaviate.
Here's the code to create and store the document embeddings using Langchain.
import pickle

import faiss
import wandb
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS


def create_and_save_index(documents):
    store = FAISS.from_documents(documents, OpenAIEmbeddings())

    artifact = wandb.Artifact("faiss_store", type="search_index")
    # Save the raw FAISS index as its own file, then pickle the store without it
    faiss.write_index(store.index, "docs.index")
    artifact.add_file("docs.index")
    store.index = None
    with artifact.new_file("faiss_store.pkl", "wb") as f:
        pickle.dump(store, f)
    wandb.log_artifact(artifact, "docs_index", type="embeddings_index")
    return store
The code above stores the document index and embeddings separately as files in a single artifact. Check out the artifact for this below.
faiss_store
Check out the ingest.py file in the repl to see how all of the above code was put together for data ingestion.
Creating the Q&A Bot
With the data ready and in the right format, we are almost ready to create our documentation bot.
Keep in mind: we want our bot to be a conversational agent. While GPT-3 has shown reasonably good zero-shot performance for in-context question answering, we need to design a prompt that is robust and keeps model hallucinations to a minimum. This brings us to prompt design.
Designing a Robust Prompt for the LLM
While prompt design has been evolving from an art into a science, I still tend to treat it like an art. Case in point: I drew inspiration from and imitated other prompt engineers in the field to design a prompt that worked well for this use case. Here's the final prompt that I created for the bot:
While the prompt is quite long, it describes exactly how we want the language model to behave and provides few-shot examples to ensure that the model generates the response in the desired fashion.
NOTE: Langchain prompts use format-string style templates, where {xxx} marks placeholder text. If you have code blocks in your prompt, you can escape a literal { by doubling it to {{. See the above prompt for an example.
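To make the templating note concrete, here's a small, purely hypothetical fragment (not the actual prompt used by the bot) showing the {summaries} and {question} placeholders alongside escaped braces inside a code example:

# A hypothetical prompt fragment; the real combine_prompt.txt is longer and includes few-shot examples
combine_prompt_fragment = """Answer the question using only the sources below. Cite the sources you used.

Sources:
{summaries}

For example, to log a metric you can write: run.log({{"loss": 0.5}})

Question: {question}
Answer:"""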
The full prompt template can be easily downloaded using the following code snippet.
from langchain.prompts import PromptTemplate


def load_prompt():
    # The prompt template is stored alongside the dataset artifact
    dataset_artifact_path = 'parambharat/wandb_docs_bot/docs_dataset:latest'
    artifact = run.use_artifact(dataset_artifact_path, type='dataset')
    artifact_path = artifact.get_path("combine_prompt.txt")
    file = artifact_path.download()
    prompt_template = open(file, "r").read()
    prompt = PromptTemplate(
        input_variables=["question", "summaries"],
        template=prompt_template,
    )
    return prompt
Creating the Q&A Pipeline
To create the Q&A pipeline that references the documents at query time, I made use of the VectorDBQAWithSourcesChain in Langchain. As the name suggests, the chain first queries the vector store for the documents whose embeddings are nearest to a given query. The retrieved documents are then inserted into the prompt along with the query and passed to the LLM to generate a response. Here's the code snippet used to achieve this:
from langchain.chains import VectorDBQAWithSourcesChain
from langchain.llms import OpenAI


def load_chain(openai_api_key):
    if validate_openai_key(openai_api_key):
        vectorstore = load_vectostore()
        prompt = load_prompt()
        chain = VectorDBQAWithSourcesChain.from_chain_type(
            llm=OpenAI(temperature=0, openai_api_key=openai_api_key),
            chain_type="map_reduce",
            vectorstore=vectorstore,
            combine_prompt=prompt,
        )
        return chain


def get_answer(question, chain):
    if chain is not None:
        result = chain(
            {"question": question},
            return_only_outputs=True,
        )
        response = f"Answer:\t{result['answer']}\n\nSources:\t{result['sources']}\n"
        return response
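The load_vectostore helper referenced above isn't shown in the snippet. Here's a minimal sketch of how the pickled store and FAISS index saved during ingestion could be stitched back together; the artifact path and filenames are assumptions based on the ingestion code above (and it reuses the run object from earlier), not a confirmed part of the repl:

import pickle

import faiss


def load_vectostore():
    # Download the search-index artifact and rebuild the FAISS-backed store
    # (artifact path assumed from the ingestion step above)
    artifact = run.use_artifact('parambharat/wandb_docs_bot/faiss_store:latest')
    artifact_dir = artifact.download()
    with open(f"{artifact_dir}/faiss_store.pkl", "rb") as f:
        store = pickle.load(f)
    store.index = faiss.read_index(f"{artifact_dir}/docs.index")
    return store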
Creating the Chat Interface
When creating a chat interface, it's important to store the user inputs, data, and model responses in a stateful way. This ensures that follow-up queries make use of the state of the chat and that the full conversation, including each user query and model response, can be rendered in the UI. We achieve this with a wrapper class that initializes the Q&A chain above at the beginning of a chat. Here's the code:
class Chat:
    def __init__(self):
        self.chain = None

    def __call__(self, message, history, openai_api_key):
        if self.chain is None:
            self.chain = load_chain(openai_api_key)
        history = history or []
        message = message.lower()
        response = get_answer(message, self.chain)
        if response is None:
            response = "Please enter a valid Openai API Key and try again. "
        history.append((message, response))
        return history, history
Note that we maintain the chat state in the history variable and use the class to store an initialized chain upon the first call.
The User Interface
Gradio made it incredibly easy to create a simple UI for the application. The library even provides a Chatbot class that implements a text chatbot interface. I created a very minimal and simple interface that takes the user's question and their OpenAI API key as text inputs and displays the LLM's output in response. Here's the code to do this:
import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        question = gr.Textbox(
            label='Type in your questions about wandb here and press Enter!',
            placeholder='How do i log images with wandb ?',
        )
        openai_api_key = gr.Textbox(
            type='password',
            label="Enter your OpenAI API key here",
        )
    state = gr.State()
    chatbot = gr.Chatbot()
    question.submit(Chat(), [question, state, openai_api_key], [chatbot, state])
I also added a simple HTML block in the code above to introduce the bot and its usage. The full code can be seen in the main.py file of the repl.
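To actually serve the interface from the repl, the Blocks app still needs to be launched. A minimal, assumed launch call might look like the following; binding to all interfaces is a common pattern for hosting Gradio apps on Replit, but the exact arguments used in main.py may differ:

# Bind to all interfaces so the app is reachable outside the repl
demo.launch(server_name="0.0.0.0")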
The final application can be seen below:
Final Thoughts and Future Work
This hackathon was a really cool and fun opportunity that I thoroughly enjoyed. I learned that LLMs can be used to create many interesting applications over existing data and resources. I was also able to understand how it is possible to overcome the prompt-length limitations in LLMs using embeddings and semantic search. While the chatbot developed was quite simple and has a lot of scope for improvement, I still think it's a powerful way to use LLMs to automate mundane tasks and create rich user experiences.
The project also inspired me to work on more applications of LLMs. One such idea I'm currently exploring as a side project is to generate chapters and summaries for Gradient Dissent episodes using LLM embeddings and LangChain. I'll post a report and update you soon!