Building a Q&A Bot for Weights & Biases' Gradient Dissent Podcast
In this article, we explore how to utilize OpenAI's ChatGPT and LangChain to build a Question-Answering bot for Weights & Biases' podcast series, Gradient Dissent.
Created on April 25 | Last edited on April 29
With the uptick in large language model (LLM) adoption, we're seeing exciting possibilities for plenty of interesting applications. With just a few lines of code, we can now accomplish tasks that would have been incredibly challenging a few years ago. These breakthroughs have been made possible by the release of APIs such as the OpenAI API for ChatGPT and GPT-4, and by open-source libraries like LangChain.
I decided to make my own. I was inspired by WandBot, a GPT-4-powered chat support bot developed by the W&B team that answers technical questions about Weights & Biases. Having been a power user of W&B for the past three years, I anticipated the areas where it might struggle to answer questions, and I was surprised that WandBot could answer those questions with ease! It's currently deployed on W&B's Discord in the #wandbot channel.
Check it out there, and take a moment to look into the details behind its build and implementation in the reports linked below (the WandBot report is the most recent work).
Creating a Q&A Bot for W&B Documentation
In this article, we run through a description of how to build a question-and-answer (Q&A) bot for Weights & Biases documentation.
WandBot: GPT-4 Powered Chat Support
This article explores how we built a support bot, enriched with documentation, code, and blogs, to answer user questions with GPT-4, Langchain, and Weights & Biases.
I also want to credit the following tweet by Andrej Karpathy as an inspiration to build this bot.
Here's what we'll cover today:
Table of Contents
- What We're Building Today
- Data Collection
- Summarizing the Podcast Episodes
- Weights & Biases Prompts
- Extracting Questions from the Transcripts
- Creating the Embeddings for the Transcripts
- Creating the Bot
- Conclusion
What We're Building Today
The Gradient Dissent podcast by Weights & Biases is an excellent source of information for those interested in staying up-to-date with the latest advancements and trends in AI and machine learning. With insightful interviews with leading experts and practitioners, it offers valuable insights and perspectives on the field. I decided to make this information more accessible and interactive, so I built a question-answering bot for the podcast series.
This report walks you through how I built this bot using ChatGPT, LangChain, and OpenAI embeddings. If you want to take it for a spin, it's deployed on Hugging Face Spaces here. I've also open-sourced the code so that you can easily adapt it for your own needs. You can find it in the GitHub repo.

Data Collection
The first step was to get the raw transcripts from the Gradient Dissent podcast playlist. I initially planned to transcribe all the episodes with OpenAI's Whisper, but fortunately the episodes were already transcribed accurately.
I scraped all the episodes' transcripts with LangChain's YoutubeLoader class, which uses the youtube-transcript-api Python package under the hood. I also used pytube to extract some metadata from each video. I stored the data in a simple CSV file, again inspired by a reply from Andrej:
import pandas as pd
from pytube import Playlist
from tqdm import tqdm
from langchain.document_loaders import YoutubeLoader

# `config` and `retry_access_yt_object` are project utilities (see the GitHub repo)
playlist = Playlist(config.playlist_url)
playlist_video_urls = playlist.video_urls
print(f"There are {len(playlist_video_urls)} videos in the playlist.")

video_data = []
for video in tqdm(playlist_video_urls, total=len(playlist_video_urls)):
    try:
        curr_video_data = {}
        yt = retry_access_yt_object(video, max_retries=25, interval_secs=2)
        curr_video_data["title"] = yt.title
        curr_video_data["url"] = video
        curr_video_data["duration"] = yt.length
        curr_video_data["publish_date"] = yt.publish_date.strftime("%Y-%m-%d")
        loader = YoutubeLoader.from_youtube_url(video)
        transcript = loader.load()[0].page_content
        transcript = " ".join(transcript.split())
        curr_video_data["transcript"] = transcript
        curr_video_data["total_words"] = len(transcript.split())
        video_data.append(curr_video_data)
    except Exception:
        print(f"Failed to scrape {video}")

print(f"Total podcast episodes scraped: {len(video_data)}")

# save the scraped data to a csv file
df = pd.DataFrame(video_data)
data_path = config.root_data_dir / "yt_podcast_transcript.csv"
df.to_csv(data_path, index=False)
Finally, here's the corresponding artifact. W&B's Artifacts enabled me to conveniently store different versions of the data while I was quickly iterating and experimenting.
Artifact: gladiator/gradient_dissent_qabot/yt_podcast_transcript:v1 (aliases: latest, v1), 1 file, 3.9 MB, created April 24th, 2023.
Summarizing the Podcast Episodes
To help app users better understand each episode's content and ask relevant questions, I decided to provide a brief summary of each podcast episode within the app.
For summarizing the podcast, I used ChatGPT (OpenAI's gpt-3.5-turbo model). Alas, the full transcripts couldn't be fed into the model because of its limited context length. To mitigate that, I split each transcript into chunks of around 1,000 tokens each. This was done with LangChain's TokenTextSplitter, which uses OpenAI's tiktoken library to tokenize the text and measure chunk length. LangChain supports several other splitters, but I found TokenTextSplitter to be the most robust for this application. Still, I would suggest trying out various splitters and choosing the one most suitable for your application.
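For illustration, here's a minimal sketch of the chunking step, assuming transcript holds one episode's transcript from the scraped CSV; the chunk size and overlap values are illustrative rather than the project's exact settings:

from langchain.text_splitter import TokenTextSplitter

# split the transcript into ~1,000-token chunks, counted with tiktoken under the hood
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=50)
chunks = text_splitter.split_text(transcript)
print(f"Split the transcript into {len(chunks)} chunks.")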
To create a final summary of the complete transcript, I used LangChain's map_reduce technique. The process involves summarizing each section (i.e., a chunk of the transcript) individually, then merging those section summaries, and finally summarizing the merged summaries to obtain a summary of the entire transcript. Custom prompts were used for each stage of the summarization pipeline.
To get a better understanding of the process, here's a function that was used to summarize a single podcast episode. You can view the complete script of the summarization pipeline here.
Notice how I've used different prompt templates: one for summarizing each chunk and another for combining the chunk summaries into the final summary.
Initially, I tried to use the same prompt for merging the intermediate summaries as for summarizing a chunk, but got sub-optimal results: the LLM failed to capture all the important events from the podcast and produced a very short, incomplete summary. I tried several variations of the combine_prompt, and the version used in the project worked very well, with the LLM capturing all the main points from the podcast.
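Since the exact function and prompts live in the linked script, here is a minimal sketch of how such a map_reduce summarization chain can be wired up in LangChain; the prompt wording below is only a placeholder, not the actual prompts used in the project:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain

# placeholder prompt for summarizing a single chunk (the "map" step)
map_prompt = PromptTemplate(
    input_variables=["text"],
    template="Write a concise summary of this podcast transcript section:\n\n{text}\n\nSUMMARY:",
)

# placeholder prompt for merging the chunk summaries (the "reduce" step)
combine_prompt = PromptTemplate(
    input_variables=["text"],
    template=(
        "Combine the following section summaries into a detailed summary of the entire "
        "podcast episode, covering all the main points discussed:\n\n{text}\n\nFINAL SUMMARY:"
    ),
)

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
)

# `chunks` are the ~1,000-token pieces produced by TokenTextSplitter above
docs = [Document(page_content=chunk) for chunk in chunks]
episode_summary = chain.run(docs)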
Weights & Biases Prompts
In the development of LLM applications, understanding the inner workings and identifying issues in the workflow is essential. To debug and inspect the intermediate results of the LLM flow, I used a recently shipped W&B feature called Weights & Biases Prompts.
Essentially, it's a suite of LLMOps tools built for the development of LLM-powered applications. With it, we can visualize and inspect the execution flow of LLMs, analyze their inputs and outputs, view intermediate results, and securely store and manage prompts and LLM chain configurations.
To use it, you only need to add a couple of lines of code to your existing pipeline:
from wandb.integration.langchain import WandbTracer

WandbTracer.init({"project": "gradient_dissent_qabot"})
# ----------
# your code
# ----------
WandbTracer.finish()
With just the lines above, W&B keeps track of everything that takes place during the execution of your LLM flow: your inputs, intermediate results, and outputs.
Note that WandbTracer is similar to wandb.init(); however, it is specifically tailored towards tracking LLM flows.
You can see your W&B dashboard populated with something like this after using the WandbTracer magic:
The trace that you see above consists of three main components:
- Trace table: A detailed breakdown of the input, output, and intermediate steps of the LLM flow.
- Trace timeline: Displays the execution flow of the chain and is color-coded according to component types.
- Model architecture: View details about the structure of the chain and the parameters used to initialize each component of the chain.
The Trace Table provides a detailed breakdown of the input, output, and intermediate steps of your LLM flow. This is particularly helpful when using multi-step flows like map_reduce or refine in LangChain, and it also helps identify any errors that occurred during execution.
The Trace Timeline displays how your LLM chain was executed and lets you select a specific trace event for more detailed information.
The Model Architecture view provides details about the parameters used to initialize each component of the chain, including the model itself, the prompt template used, and specifics about each individual component.
I would suggest exploring the trace in the interactive Weave panel above to get a better understanding of W&B Prompts. You can also learn more about W&B Prompts in the docs here.
Extracting Questions from the Transcripts
In addition to providing a summary for each podcast episode, I also decided to provide a set of potential questions that a user may ask to delve deeper into the topic. These questions serve as a starting point for users to explore the material in greater detail, and they can follow up with additional inquiries to gain even more insight.
To achieve this, I split each transcript into chunks of text using TokenTextSplitter, similar to the summarization pipeline above.
I designed a prompt to extract questions from each chunk, and the questions from all chunks were then concatenated to get the final set of questions for a particular podcast episode.
Notice how the prompt was designed to extract three questions from each chunk of the transcript.
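As a rough sketch, the per-chunk question extraction can be done with a simple LLMChain like the one below; the prompt wording is illustrative, not the exact prompt used in the project:

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# placeholder prompt asking for three questions per transcript chunk
question_prompt = PromptTemplate(
    input_variables=["text"],
    template=(
        "Given the following section of a podcast transcript, write 3 questions "
        "that a listener could ask about its content:\n\n{text}\n\nQUESTIONS:"
    ),
)

question_chain = LLMChain(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    prompt=question_prompt,
)

# `chunks` come from the same TokenTextSplitter step used for summarization
all_questions = "\n".join(question_chain.run(text=chunk) for chunk in chunks)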
Here's a W&B table showing the results:
And here's the LangChain trace:
Creating the Embeddings for the Transcripts
The penultimate stage of the development process was to embed the transcripts for efficient retrieval in our Q&A bot. Creating an index over the data (i.e., the embeddings) is necessary to fetch only the most relevant documents for a given question, avoiding having to pass all the documents to the LLM (which saves both time and money).
For creating the embeddings, I used OpenAI's text-embedding-ada-002 model, and I stored the embeddings in the Chroma embedding database.
Here's a short code snippet demonstrating the embedding process. You can view the complete script here.
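As a rough sketch of that script, the step can look something like this with LangChain's OpenAIEmbeddings and Chroma; the persist directory name below is illustrative:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# embed the transcript chunks with OpenAI's text-embedding-ada-002 model
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# build a Chroma index over the chunks and persist it to disk
db = Chroma.from_texts(
    texts=chunks,
    embedding=embeddings,
    persist_directory="data/transcript_embeddings",
)
db.persist()  # the persisted directory can then be logged as a W&B Artifact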
This is the resulting artifact of the embeddings:
Artifact: gladiator/gradient_dissent_qabot/transcript_embeddings:v1 (aliases: latest, v1), 492 files, 21.9 MB, created April 25th, 2023.
You can also view the complete data lineage below. This graphical representation showcases the origin of each dataset produced by a specific run, as well as the subsequent runs that utilized the dataset at different stages of the pipeline. You can get a clear understanding of the relationships between the datasets and runs, as well as the flow of data throughout the pipeline. You can learn more about the lineage and how you can analyze it in your projects here.
transcript_embeddings: direct lineage view
Creating the Bot
The final step was to actually create the Question-Answering bot and a Gradio app to stitch everything together.
To answer a user's question, the question is first embedded with the same model used above for embedding the transcripts. Next, the question's embedding is compared to the documents' embeddings (the transcript chunks) via cosine similarity.
The two most relevant chunks, those with the highest cosine similarity scores, are retrieved and passed in a prompt to get the final answer from the LLM. Prompt design was quite crucial at this stage to get relevant answers and avoid LLM hallucinations.
Here's a code snippet demonstrating the above process. The complete script for the Gradio app can be found here.
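As a rough sketch of what that script does, the retrieval and answering step can look like this; the persist directory and prompt wording are illustrative, not the exact values used in the app:

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import Chroma

# load the persisted Chroma index built in the embedding step
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
db = Chroma(persist_directory="data/transcript_embeddings", embedding_function=embeddings)

# placeholder prompt that grounds the answer in the retrieved excerpts
qa_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question using only the podcast excerpts below. "
        "If the answer is not contained in the excerpts, say you don't know.\n\n"
        "Excerpts:\n{context}\n\nQuestion: {question}\n\nAnswer:"
    ),
)
qa_chain = LLMChain(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0), prompt=qa_prompt)

def answer(question: str) -> str:
    # retrieve the two most similar transcript chunks for the question
    docs = db.similarity_search(question, k=2)
    context = "\n\n".join(doc.page_content for doc in docs)
    return qa_chain.run(context=context, question=question)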

Conclusion
And that is how I built a Q&A bot for the Gradient Dissent podcast. This was a fun weekend project for me, and there is plenty of room for improvements and future work. For now, the bot doesn't have any evaluation pipeline. Although evaluating LLMs can be quite challenging, an evaluation pipeline similar to the one the W&B team built for WandBot could be added to get a quantifiable sense of how this bot performs in the real world.
If you are interested in leveraging Weights & Biases for your own machine learning projects, sign up for a free account and explore the wide array of features and capabilities that can help you build, track, and improve your models with ease. I would also strongly recommend trying out W&B Prompts to streamline your LLM workflows.