Building a RAG system with Gemini Pro for healthcare queries
Learn how W&B can help you build a RAG System with Gemini Pro to handle healthcare queries efficiently.

Introduction
In today's rapidly changing healthcare landscape, retrieving information and putting it into practice quickly is not just a need but a critical necessity. Medical professionals and researchers are immersed in a massive sea of data, ranging from clinical trials and patient records to the latest findings in medical research. The challenge is not a shortage of information, but the complexity and variability of the information that must be accessed, and the need to access it quickly and accurately.
This article presents retrieval-augmented generation (RAG), a framework that combines information retrieval with generative language models, and explores how an advanced generative language model like Gemini can be used to build powerful healthcare-oriented RAG systems. A RAG system built on Gemini can reshape information retrieval in medicine by providing answers that are more precise and contextually relevant, thanks to the model's nuanced comprehension and generation capabilities.
We'll cover the basics of the Gemini LLM, how it integrates with RAG systems, the practical construction of a RAG system using Gemini, and a use case within a medical information system. Along the way, you'll get a hands-on tutorial with step-by-step instructions for developing your own Gemini-powered RAG system, complete with code examples and performance tracking.
By the end, you'll have a solid understanding of how these technologies can be used to substantially improve health information retrieval.
Understanding Gemini LLM

Gemini LLM is good at understanding and generating language, particularly in tasks requiring a deep understanding of context and detail. We need that in healthcare. Gemini is designed to include the latest advances in natural language processing, and this gives it the ability to understand complex medical jargon and patient data much better than many of its predecessors.
Enhanced retrieval-augmented generation in healthcare
In healthcare, retrieved information can be a matter of life and death. Accuracy is paramount.
Gemini LLM improves retrieval-augmented generation systems by marrying the deep learning capabilities of generative models with the precision of traditional search techniques.
The goal is to combine automatic database reading, identification of the most relevant information, and retrieval in a way that yields contextual, correct answers. This matters because medical questions are often multifaceted and require nuanced responses.
Why we're using Gemini

While models such as ChatGPT have set precedents in understanding and generating language, Gemini LLM goes one step further by directly building enhanced retrieval mechanisms into its architecture.
That allows Gemini to draw not only on what it already knows, as ChatGPT does, but also, and more importantly, on the most recent and relevant information from databases that are constantly being updated.
In healthcare, where new research is published every day, keeping responses up to date with information drawn from external sources gives Gemini the edge.
Data sensitivity and privacy are also primary considerations. Gemini can be fine-tuned and deployed in ways that satisfy healthcare regulations and privacy standards such as HIPAA, helping ensure that the ethical boundaries between doctors and patients remain intact.
What is RAG?
A retrieval-augmented generation (RAG) system is an advanced framework that combines information retrieval with generative language processing. RAG increases the capability of language models by fetching relevant information from external sources before text generation begins, bridging large data stores and the context-sensitive responses needed in real time.
The big advantage of the RAG system is that it can lead to a substantial increase in the accuracy of predictions. The fact that retrieval is included in the generation process helps ensure that the responses generated are rich in context, precise, and informative. This is especially important for areas like healthcare, where the accuracy of the information directly influences the diagnosis and treatment decisions.
Here is a simple diagram to illustrate how a RAG system operates:

How combining retrieval and generation improves prediction accuracy
Traditional generative models produce responses based solely on their training data and internal knowledge. RAG systems, by contrast, add a retrieval step in which the model queries an external database for relevant pieces of information before formulating its response. This grounds the output in the latest available data, improving both accuracy and relevance.
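To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop. The vector_search and generate helpers are placeholders for whatever vector store and LLM you choose (later in this article we use FAISS and Gemini):
def answer_with_rag(question, vector_search, generate, top_k=4):
    # 1. Retrieval: find the stored chunks most similar to the question.
    docs = vector_search(question, k=top_k)
    # 2. Augmentation: pack the retrieved text into the prompt as context.
    context = "\n\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generation: the LLM answers grounded in the retrieved context.
    return generate(prompt)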
Applications in Healthcare Forecasting
- Diagnosis: By drawing on current medical research and patient data, RAG systems can assist doctors in reaching a diagnosis.
- Therapeutic recommendations: Drawing on up-to-date treatment guidelines and scientific research, RAG systems can formulate treatment recommendations tailored to the patient, including warnings about potential drug interactions.
- Epidemiological forecasting: By integrating real-time data from diverse health monitoring systems, RAG systems can model disease spread patterns and the likelihood of outbreaks.
These applications demonstrate the utility of RAG systems in healthcare and point to their potential to make medical practice more informed and effective.
The use case: Retrieving key medical information

Because they can search immense stores of medical records, research papers, and clinical trial data, RAG systems significantly simplify finding support for medical decisions.
For example, when a healthcare professional queries the treatment options for a rare disease, a RAG system can pull not just related medical journal articles but also patient data to produce personalized treatment insights. This saves time and ensures that the information gathered is both comprehensive and specific to the patient's needs.
Description of our healthcare dataset
For the practical application in this article, our RAG system will focus on an orthopedics PDF. We could also use other kinds of healthcare data:
- Clinical trial data: Detailed records of historical clinical trials, including patient demographics, the treatment protocols used, outcomes, and side effects recorded during the trials.
- Medical journals: A collection of articles and publications spanning many medical specialties, offering insight into current research, studies, and findings.
- Patient records: Anonymized patient information such as symptoms, diagnoses, treatments given, and follow-up outcomes. This sort of data can reveal patient-specific variation in response to different treatments.
A RAG system that integrates these varied sources of information can support both clinical decision-making and ongoing medical research.
Setting the stage for practical application
The following section gives a detailed, step-by-step procedure for building a RAG setup over a healthcare dataset. We'll walk through implementing a RAG system on top of Gemini, with code snippets and practical hints, to show how such a system can improve retrieval and processing in a medical scenario.
Step-by-step tutorial: Developing your own RAG system
First, we need to set up our environment by installing the necessary libraries. Here are the key libraries required for this tutorial:
- PyPDF2: For extracting text from PDF documents.
- Langchain-Google-GenAI: Integration with Google's Generative AI models.
- FAISS: A library for efficient similarity search and clustering of dense vectors.
- Weights & Biases: For logging and tracking experiments.
- Python-Dotenv: For managing environment variables.
Install these libraries using the following command:
!pip install weave PyPDF2 langchain langchain-google-genai faiss-gpu wandb python-dotenv
W&B Weave
Weave is an awesome tool for building RAG systems. It integrates seamlessly with Weights & Biases to log and visualize the performance of your models. By using Weave, you can track how effectively the RAG system retrieves and generates accurate, contextually relevant medical information.
This integration simplifies managing LLM experiments, allowing real-time analysis of how RAG systems process and use large datasets, such as medical records and clinical trial data. Weave ensures transparency, which is crucial for developing sophisticated, reliable healthcare solutions with RAG systems. Weave can be integrated by simply adding the @weave.op() decorator above the function that generates your responses; the decorator logs the inputs, outputs, and internal traces of that function.
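As a minimal illustration (the build_prompt helper below is hypothetical, used only to show the decorator), any function decorated with @weave.op() is traced once weave.init has been called with a project name:
import weave

weave.init("medical-data-chatbot")

@weave.op()  # Weave records the inputs, outputs, and trace of every call
def build_prompt(question: str, context: str) -> str:
    # Hypothetical helper: combine retrieved context and the user question.
    return f"Context: {context}\n\nQuestion: {question}"

build_prompt("What is a femur fracture?", "Orthopedics deals with bones and joints.")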
Loading the Gemini model and data
Next, we will load the Gemini LLM and our data. The data in this case will be a PDF containing medical information.
We will use the PyPDF2 library to extract text from the PDF and then split the text into manageable chunks using Langchain's RecursiveCharacterTextSplitter.
Here's the code for extracting and splitting the text:
Step 1: Importing the necessary libraries
import os
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
Step 2: Loading the environment variables
load_dotenv()
os.getenv("GOOGLE_API_KEY")
Step 3: Define the function that extracts text from the PDF
def get_pdf_text(pdf_path):
    with open(pdf_path, "rb") as file:
        reader = PdfReader(file)
        text = "".join(page.extract_text() for page in reader.pages if page.extract_text())
    return text
Step 4: Split text into manageable chunks
def get_text_chunks(text):
    splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    return splitter.split_text(text)
Step 5: Example use
pdf_path = "/content/L1-Introduction To Orthopedics.pdf"  # Change this to your actual PDF file path
raw_text = get_pdf_text(pdf_path)
text_chunks = get_text_chunks(raw_text)
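Before building the index, it's worth a quick sanity check that the PDF parsed and chunked as expected. This is just an illustrative check, not part of the pipeline:
print(f"Extracted {len(raw_text)} characters into {len(text_chunks)} chunks")
print(text_chunks[0][:300])  # preview the start of the first chunk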
Integrating the RAG system
Now that we have our text data in chunks, we can create a vector store using FAISS for efficient similarity search. We'll use GoogleGenerativeAIEmbeddings to create embeddings for our text chunks and then save the vector store locally.
Here's the code for creating and saving the vector store:
Step 1: Importing the necessary libraries
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
Step 2: Creating and saving the vector store
def create_vector_store(chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")
Step 3: Creating the vector store with text chunks
create_vector_store(text_chunks)
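If you want to confirm retrieval works before wiring up the full QA chain, you can reload the saved index and run a quick similarity search. This is a minimal check, assuming the index was saved to faiss_index as above; the query string is just an example:
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
docs = vector_store.similarity_search("What is a fracture?", k=3)
for doc in docs:
    print(doc.page_content[:200])  # preview each retrieved chunk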
Writing queries and logging results
With our vector store in place, we can now set up a conversational chain using the ChatGoogleGenerativeAI model. This chain will take a context and a question as input and generate a relevant answer. We will use Weights & Biases for logging the results.
Here's the complete code for setting up the QA chain and executing queries:
Step 1: Importing the necessary libraries
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI  # needed for the Gemini chat model used below
import asyncio
import weave
Step 2: Initializing Weights & Biases and Weave
weave.init("medical-data-chatbot")
Step 3: Defining a function to load the conversational chain
def get_conversational_chain():
    prompt_template = """Answer the following question based on the context provided:
Question: {question}
Context: {context}"""
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.5)
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    return load_qa_chain(llm=model, prompt=prompt)
Step 4: Defining an asynchronous function to get answers
Here, we add the @weave.op() decorator, which automatically logs the function's inputs and outputs. This lets us store questions, RAG context, and model responses!
@weave.op()  # add Weave decorator
async def get_answer(question, context):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
    docs = vector_store.similarity_search(context)
    chain = get_conversational_chain()
    input_data = {"context": context, "question": question, "input_documents": docs}
    response = await chain.ainvoke(input_data)  # Using the ainvoke method
    formatted_response = format_response(response, context)
    return formatted_response
Step 5: Defining the format of the response
def format_response(response, context):
    output_text = response['output_text'] if 'output_text' in response else str(response)
    formatted_response = "\n".join(line.strip() for line in output_text.split("\n"))
    return f"Context: {context[:500]}...\nResponse: {formatted_response}"
Step 6: Defining the main execution function
def main(question, pdf_path):
    raw_text = get_pdf_text(pdf_path)
    text_chunks = get_text_chunks(raw_text)
    create_vector_store(text_chunks)
    response = asyncio.get_event_loop().run_until_complete(get_answer(question, raw_text))
    print(response)
Step 7: Example usage of the model
question = "What are the diseases discussed and their treatment?"
main(question, pdf_path)
In this tutorial, we've walked through setting up a RAG system using Gemini LLM, extracting and processing text data from PDFs, creating a vector store with FAISS, and generating answers to queries. This system can be adapted for various applications beyond healthcare, providing a robust framework for building intelligent, context-aware chatbots.


Analyzing and comparing results
With W&B Weave, we get automated logging of responses from our model. In a RAG system, this is particularly helpful for analyzing the context our system gathered and how our model used that context to create an answer. Here is what it looks like inside Weave after running our code:

Future development
AI-driven medical information systems are poised for a great leap forward. We expect much stronger integration of Gemini and similar AI technologies into personalized medicine, where AI can tailor treatment based on genetic results and individual medical histories.
RAG systems will also continue to scale. With more real-time, data-driven resources and better natural language understanding, they should become better equipped to handle more intricate and less structured data, which will considerably increase their value across medical specialties.