Training a chatbot on personal data with LlamaIndex and W&B
In this article, we'll go over how to create a chatbot on personal data using LlamaIndex and local models, with a Weights & Biases integration.
Created on June 15|Last edited on June 20

In this article, we'll create a simple chatbot over some private data using local models (both the LLM and the embedding model) with LlamaIndex and Weights & Biases.
Example data
In this example, let's build a system we can use to ask questions about the original Jurassic Park screenplay 🦖 (available online at jurassicpost). LlamaIndex has modules that make it easy to load the PDF and use it as a dataset:
from llama_index.core import Document
from llama_index.readers.file import PyMuPDFReader

# Read the screenplay PDF into a list of Document objects
documents = PyMuPDFReader().load(file_path='/content/JurassicPark-Final.pdf', metadata=True)

# Join the extracted text and wrap it in a single Document
doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]
Configuring LlamaIndex to use local models
Since we want to use local models both for creating the embeddings and for conversing, we need to update the global Settings and change embed_model and llm. We can use the BAAI/bge-base-en-v1.5 embedding model as follows:
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Similarly, we need to change the llm setting (otherwise it defaults to OpenAI). Let's use the Writer/camel-5b-hf model:
import torch
from llama_index.core import Settings
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

# Instruction-style prompt that wraps every query sent to the model
query_wrapper_prompt = PromptTemplate(
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)

# `config` is not defined in this snippet; config.model is assumed to hold
# the model name, "Writer/camel-5b-hf"
Settings.llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    generate_kwargs={"do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=config.model,
    model_name=config.model,
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},
    model_kwargs={"torch_dtype": torch.float16},
)
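To make the role of query_wrapper_prompt more concrete, here is a minimal sketch in plain Python of how an instruction-style template wraps a user question before the text reaches the model. It is not LlamaIndex's own implementation: wrap_query is a hypothetical helper introduced only for illustration, and the template string is the one used in this article.

```python
# Instruction-style template, same text as the query wrapper prompt in this article.
QUERY_WRAPPER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query_str}\n\n### Response:"
)


def wrap_query(query_str: str) -> str:
    """Hypothetical helper: substitute the user's question into the template."""
    return QUERY_WRAPPER.format(query_str=query_str)


if __name__ == "__main__":
    # Show the full text an instruction-tuned model would receive.
    print(wrap_query("Are Velociraptors pack hunters?"))
```

The model only ever sees the wrapped text, which is why the template should match the instruction format the chosen model was fine-tuned on.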
Creating and using an index
Creating and using a vector index is straightforward:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

# We can log the index as a wandb artifact!
# wandb_callback is a WandbCallbackHandler instance (see the download snippet in this section)
wandb_callback.persist_index(index, index_name="camel-5b-hf-index")
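For intuition about what a vector index does, the sketch below stores (text, embedding) pairs in memory and returns the entries closest to a query embedding by cosine similarity. It is a toy illustration under stated assumptions, not how VectorStoreIndex is implemented: ToyVectorIndex, the three-dimensional vectors, and the placeholder chunk texts are all made up for the example, and real embeddings from BAAI/bge-base-en-v1.5 have far more dimensions.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Hypothetical in-memory index holding (text, embedding) pairs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: List[float]) -> None:
        self._entries.append((text, embedding))

    def top_k(self, query_embedding: List[float], k: int = 2) -> List[Tuple[float, str]]:
        """Return up to k (score, text) pairs, most similar first."""
        scored = [
            (cosine_similarity(query_embedding, emb), text)
            for text, emb in self._entries
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    toy_index = ToyVectorIndex()
    # Made-up 3-dimensional "embeddings" for placeholder chunks.
    toy_index.add("chunk about raptors", [0.9, 0.1, 0.0])
    toy_index.add("chunk about fences", [0.0, 0.2, 0.9])
    toy_index.add("chunk about the T. rex", [0.5, 0.5, 0.3])
    for score, text in toy_index.top_k([1.0, 0.0, 0.0], k=2):
        print(f"{score:.3f}  {text}")
```

In the real pipeline, the embedding model configured in Settings produces the vectors for both the document chunks and the query, so the two live in the same space and can be compared this way.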
The resulting vector index is available as a public artifact. To download it directly, use the following snippet:
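from llama_index.core import load_index_from_storage
from llama_index.callbacks.wandb import WandbCallbackHandler

# run_args is not defined in this snippet; it is assumed to be a dict of
# wandb.init() arguments such as the project name
wandb_callback = WandbCallbackHandler(run_args=run_args)

storage_context = wandb_callback.load_storage_context(
    artifact_url="sauravmaheshkar/llamaindex-local-models/camel-5b-hf-index:v0"
)

# Load the index and initialize a query engine
index = load_index_from_storage(storage_context)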
To answer a user-provided question, we can turn the index into a query engine:
query_engine = index.as_query_engine()
response = query_engine.query("Are Velociraptors pack hunters?")
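Conceptually, a query engine of this kind first retrieves the chunks most similar to the question and then asks the LLM to answer using those chunks as context. The sketch below illustrates only the second half, assembling a prompt from retrieved text. build_rag_prompt and its wording are hypothetical and are not the prompt template LlamaIndex uses internally; the chunk strings are placeholders, not lines from the screenplay.

```python
from typing import List


def build_rag_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Hypothetical helper: place retrieved text ahead of the user's question."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only this context, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    # Placeholder chunks standing in for whatever the retriever returned.
    chunks = ["<retrieved chunk 1>", "<retrieved chunk 2>"]
    print(build_rag_prompt("Are Velociraptors pack hunters?", chunks))
```

Because the answer is grounded in whatever was retrieved, the quality of the embeddings and the way the screenplay is split into chunks both affect what the model can say.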
Conclusion
In this article, we took a brief look at querying private data with a local model and locally generated embeddings, and at how Weights & Biases can log and store the resulting artifacts.
To see the full suite of W&B features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!