GraphRAG: Enhancing LLMs with knowledge graphs for superior retrieval
This article introduces GraphRAG, a novel approach that combines knowledge graphs and hierarchical community detection to enable scalable, query-focused summarization and global sensemaking over large datasets.
Large language models (LLMs) have revolutionized how we interact with information, but they often stumble when faced with complex, overarching questions that require connecting the dots across an entire dataset. Traditional retrieval-augmented generation (RAG) systems are good at finding individual pieces of information, but they can struggle to synthesize insights and understand the bigger picture. That's because they treat data like isolated islands, unable to bridge the gaps and reveal the hidden relationships that give meaning to the whole. This limitation hinders our ability to gain a truly comprehensive understanding of vast amounts of data, leaving us with fragmented answers and a sense that something is missing.
In this article, we'll explore GraphRAG, a method designed to overcome this challenge by transforming data into interconnected knowledge graphs. We'll see how GraphRAG empowers LLMs to look beyond individual data points and unlock a new level of global sensemaking, providing richer, more insightful answers to even the most complex queries.
We'll end with a tutorial and the code you'll need to work with it yourself.
Already familiar with GraphRAG? Jump straight to the tutorial for the code and implementation by clicking the blue button below.
Jump to the tutorial

Table of contents
What is GraphRAG?
Understanding the core concepts: RAG and knowledge graphs
GraphRAG vs. baseline RAG: A comparative analysis
The GraphRAG process
Indexing: Transforming text into knowledge graphs
Querying: Search modes in GraphRAG
Prompt tuning
Tutorial: Implementing GraphRAG with Weave logging
Step 1: Environment setup, data preparation, and project initialization
Step 2: Indexing the dataset
Step 3: Running a global search query
Step 4: Running a local search query
Conclusion
What is GraphRAG?
GraphRAG restructures large datasets into a knowledge graph of entities and relationships, enabling LLMs to generate coherent, thematic answers. It surpasses traditional RAG by capturing broad context and integrating insights across entire corpora, excelling at global sensemaking.
Traditional retrieval-augmented generation (RAG) is effective at retrieving and synthesizing information from large datasets but struggles with global sensemaking, which involves answering broad, thematic queries that require synthesizing insights across an entire corpus. This is because RAG systems typically retrieve individual text chunks based on their semantic similarity to the query and process them in isolation. While sufficient for queries with explicit, localized answers, this fragmented approach falls short when addressing questions that require understanding high-level patterns, themes, or overarching relationships across the dataset.
GraphRAG addresses these challenges by introducing a knowledge-graph-based approach that restructures the dataset as a network of interconnected entities and relationships. Instead of treating the dataset as isolated chunks, GraphRAG uses community detection algorithms to group related entities into structured, modular communities. These communities are summarized independently, allowing for scalable and efficient processing of large datasets. At query time, relevant communities are retrieved, and their summaries are synthesized into a holistic response, making it particularly suited for global sensemaking tasks. By focusing on connections and themes across the dataset rather than isolated evidence, GraphRAG delivers coherent, comprehensive, and contextually rich answers to complex global queries.
Understanding the core concepts: RAG and knowledge graphs
Before going deeper into GraphRAG, it’s important to understand its foundations. RAG was designed to help large language models (LLMs) give more accurate answers by grounding their responses in external data, which reduces the risk of inventing facts (“hallucinations”).
Retrieval-augmented generation (RAG) combines the power of LLMs with external data. It uses a vector database to store and retrieve chunks of text that closely match the user’s query. Each chunk in the dataset is turned into a high-dimensional vector, and when a user asks a question—also converted into a vector—RAG finds chunks whose vectors are closest in meaning. This approach is great at pulling out bits of information that match a query but treats each piece as separate, missing any connections between them.
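To make that mechanic concrete, here is a minimal sketch of vector-based retrieval using the OpenAI embeddings API and cosine similarity. The chunks and query are invented for illustration, and a real system would use a vector database rather than in-memory NumPy arrays; this is just the core idea, not GraphRAG's (or any library's) internals.

# Minimal sketch of vector-based retrieval (illustrative only).
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

chunks = [
    "Scrooge is visited by the ghost of Jacob Marley.",
    "The Ghost of Christmas Past shows Scrooge his childhood.",
    "Bob Cratchit works as Scrooge's underpaid clerk.",
]

def embed(texts):
    # text-embedding-3-small is the same embeddings model used later in this tutorial
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vectors = embed(chunks)
query_vector = embed(["Who warns Scrooge?"])[0]

# Cosine similarity between the query and every chunk
scores = chunk_vectors @ query_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(chunks[int(np.argmax(scores))])  # the single best-matching chunk

Notice that the retrieval step only ranks chunks individually; nothing in this process knows that Marley and the spirits are related, which is exactly the gap knowledge graphs fill.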
A knowledge graph, on the other hand, structures information as a network of entities—like people, places, or ideas—and their relationships. Each entity is a node, and edges show how these nodes relate or influence each other. With a knowledge graph, it’s not just about finding similar pieces of text; it’s about understanding how different parts fit together. For example, in a knowledge graph of A Christmas Carol, nodes might be “Scrooge” or “Jacob Marley,” linked by edges that show who warns or guides whom. This interconnected structure supports more complex reasoning, like following chains of influence through the story.
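As a toy illustration of that structure (not GraphRAG's actual data model), a few of those nodes and edges could be expressed with networkx, and simple graph traversal already supports the kind of multi-step reasoning described above:

# Toy knowledge graph for A Christmas Carol (illustrative only).
import networkx as nx

g = nx.DiGraph()
g.add_edge("Jacob Marley", "Scrooge", relation="warns")
g.add_edge("Ghost of Christmas Past", "Scrooge", relation="guides")
g.add_edge("Scrooge", "Bob Cratchit", relation="employs")

# Multi-step reasoning: who, directly or indirectly, influences Bob Cratchit?
for source in g.nodes:
    if source != "Bob Cratchit" and nx.has_path(g, source, "Bob Cratchit"):
        print(source, "->", "Bob Cratchit")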
GraphRAG takes advantage of these structured relationships. While traditional vector-based retrieval only looks for chunks that match the query, graph-based retrieval leverages the knowledge graph’s network of nodes and edges. This lets the system reason more deeply, following multi-step connections and delivering more context-rich answers.
By blending RAG and knowledge graphs, GraphRAG isn’t just pulling relevant information; it’s also tapping into the relationships that provide depth and coherence. This makes graph-based approaches like GraphRAG especially powerful for complex questions that demand understanding, not just retrieval.
GraphRAG vs. baseline RAG: A comparative analysis
Compared to baseline RAG methods that rely on isolated chunks drawn from semantic similarity alone, GraphRAG takes a more holistic approach. It arranges the dataset into a structured network of entities and relationships, forming coherent groups known as communities. Each community represents a tightly connected cluster of related information—a set of concepts or items linked by meaningful ties.
For instance, suppose the dataset includes a wide range of dessert recipes. A traditional RAG system receiving a query like, “What are the defining characteristics and preparation methods for popular desserts?” might return a few isolated fragments—perhaps a snippet about a fruit tart’s ingredients, another explaining how to bake a chocolate cake, and a third describing steps for making custard. While relevant, these pieces remain scattered and don’t come together as a cohesive overview.
GraphRAG, on the other hand, would have pre-organized these desserts into communities - maybe one centered on cakes, another on pastries, and another on creamy desserts. Each community already has its own summary, created at indexing time. When asked the same question, GraphRAG retrieves these summaries and fuses them into a single, well-rounded answer. Instead of leaving the user to mentally assemble information from separate fragments, GraphRAG presents an integrated perspective that highlights shared characteristics, differences, and preparation methods across multiple categories.
By merging insights from various communities, GraphRAG overcomes the limitations of traditional RAG. Rather than offering scattered pieces that demand user interpretation, it provides a unified, context-rich response—ideal for understanding overarching themes and drawing meaningful connections.
The GraphRAG process
GraphRAG combines large language models (LLMs) and knowledge graphs to index and query large datasets, enabling precise answers to complex, multi-faceted questions. The process involves two main stages: indexing (preparing the data) and querying (answering user questions).
Indexing: Transforming text into knowledge graphs
The indexing process converts raw documents into a structured knowledge graph that captures entities (like people, events, or concepts) and their relationships. It also generates hierarchical summaries and vector embeddings that make retrieval fast and contextually rich.
The steps involved in creating a graph index are:
1. Text chunking
Large source documents are split into smaller, manageable text units (chunks) to ensure better granularity. These chunks are configured to align with document boundaries, preserving context while optimizing the size for processing.
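Conceptually, chunking looks something like the sketch below, which splits text into overlapping windows of tokens using tiktoken. The chunk size and overlap here are arbitrary placeholders; in GraphRAG itself these values come from the chunking settings in settings.yaml rather than from code you write.

# Sketch of token-based chunking with overlap (parameters are illustrative).
import tiktoken

def chunk_text(text, chunk_size=300, overlap=50, encoding_name="cl100k_base"):
    enc = tiktoken.get_encoding(encoding_name)
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(enc.decode(window))
    return chunks

text = "Marley was dead: to begin with. ..."  # replace with your full document text
print(len(chunk_text(text)), "chunks")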
2. Entity and relationship extraction
Each text chunk is processed using an LLM to extract entities (such as names, places, or concepts) and their relationships (connections between entities). For example, in A Christmas Carol, entities might include "Scrooge" and "Jacob Marley," while relationships could include "warns" or "guides."
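Conceptually, the extraction step looks something like the sketch below. The prompt is a simplified stand-in for GraphRAG's much more detailed default extraction prompt, and the JSON schema in the system message is invented for illustration.

# Simplified stand-in for the entity/relationship extraction step.
import json
from openai import OpenAI

client = OpenAI()

chunk = "Marley's ghost appears to Scrooge and warns him about the three spirits."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": (
            "Extract entities and relationships from the text. Respond with JSON: "
            '{"entities": ["..."], "relationships": [{"source": "...", "target": "...", "type": "..."}]}'
        )},
        {"role": "user", "content": chunk},
    ],
)
print(json.loads(response.choices[0].message.content))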
3. Graph summarization
Once the entities and relationships are extracted, their descriptions are summarized into concise, unified explanations. The LLM merges multiple mentions of the same entity into a single, coherent description, reducing redundancy and creating a streamlined graph.
4. Graph augmentation with communities
Using the Leiden community detection algorithm, the graph is hierarchically clustered into communities of closely connected entities and relationships. Leiden efficiently groups nodes into densely connected communities by optimizing a measure called modularity, which compares the number of connections within communities to the number of connections between them. Each community represents a modular group of related information.
This hierarchical structure allows the graph to represent both high-level overviews and detailed subtopics, enabling scalable and organized analysis of the dataset at different granularities.
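To get a feel for what community detection produces, here is a toy sketch that uses a modularity-based algorithm from networkx as a stand-in for Leiden (which networkx does not ship out of the box; GraphRAG uses its own Leiden implementation). The graph below is invented for illustration.

# Modularity-based clustering as a stand-in for the Leiden step (illustrative).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

g = nx.Graph()
g.add_edges_from([
    ("Scrooge", "Jacob Marley"), ("Scrooge", "Ghost of Christmas Past"),
    ("Scrooge", "Bob Cratchit"), ("Bob Cratchit", "Tiny Tim"),
    ("Ghost of Christmas Past", "Ghost of Christmas Present"),
])

# Each community is a set of densely connected nodes
for i, community in enumerate(greedy_modularity_communities(g)):
    print(f"Community {i}: {sorted(community)}")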
5. Community summarization and embeddings
To enable efficient query processing and retrieval, each community is summarized using the LLM, capturing its key entities, relationships, and themes. These summaries are further transformed into vector embeddings, which serve as compact representations of the information at different levels of granularity:
Community embeddings: Represent the semantic content of entire communities, facilitating efficient retrieval of related groups of information.
Graph node embeddings: Capture the structure and context of individual entities and their relationships within the graph.
This multi-layered embedding strategy bridges the gap between high-level overviews and detailed insights, enhancing the system’s ability to process complex queries effectively.
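As a rough sketch of the community-embedding idea (not GraphRAG's internal code), each community summary can be embedded and stored alongside its community id; the summaries below are made up for illustration.

# Sketch: embed community summaries for later retrieval (summaries are made up).
from openai import OpenAI

client = OpenAI()

community_summaries = {
    0: "Scrooge's household and workplace: his relationships with Bob Cratchit and Tiny Tim.",
    1: "The three spirits and Marley's ghost, and how they drive Scrooge's transformation.",
}

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=list(community_summaries.values()),
)
community_vectors = {
    cid: item.embedding
    for cid, item in zip(community_summaries, resp.data)
}
print({cid: len(vec) for cid, vec in community_vectors.items()})  # embedding dimensions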
Querying: Search modes in GraphRAG
GraphRAG provides three search modes—Global, Local, and DRIFT—each tailored to specific query needs. Global Search synthesizes the entire dataset, Local Search zooms in on individual entities, and DRIFT Search enriches local insights with broader community context.
These modes work together to cover a spectrum of query types, from high-level thematic inquiries to pinpointed, detail-oriented questions. By leveraging the underlying knowledge graph structure, GraphRAG ensures each mode can access the right granularity of information, whether you’re seeking a broad overview or a closer look at a single element’s connections.
Global search
Global search retrieves and processes all pre-generated community reports to answer questions requiring a global understanding of the dataset. It excels at answering big-picture questions that span multiple topics, ensuring the final response reflects broad patterns and shared themes.
The process begins by dividing the community summaries into smaller chunks of a fixed token size, ensuring no relevant information is missed. These chunks are processed in parallel, and the LLM generates intermediate answers for each. Each response is evaluated with a helpfulness score between 0 and 100, filtering out irrelevant or low-scoring answers.
The highest-ranked responses are combined into a final context, where the LLM produces a global answer through a summarization step. This approach ensures the final response reflects the entire dataset, offering a cohesive perspective across multiple communities. Questions like identifying overarching themes or summarizing broad concepts across the corpus rely on this method, as it integrates insights from all communities.
It’s the go-to mode for queries like, “What overarching themes emerge across all the stories in this corpus?”
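To make the flow concrete, here is a heavily simplified sketch of the map and reduce steps using the OpenAI client directly. The prompts, the SCORE convention, and the community summaries are illustrative stand-ins; GraphRAG's real implementation also handles batching, token budgets, filtering, and parallelism.

# Heavily simplified sketch of global search's map-reduce over community summaries.
from openai import OpenAI

client = OpenAI()
QUERY = "What overarching themes emerge across all the stories in this corpus?"

# Invented stand-ins for the pre-generated community summaries.
summaries = [
    "Community about Scrooge's transformation and the three spirits.",
    "Community about the Cratchit family and themes of poverty and generosity.",
]

def map_step(summary):
    # Produce a partial answer plus a 0-100 helpfulness score for one community.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Answer the question using only the provided community summary. "
                "End with a line 'SCORE: <0-100>' rating how helpful the summary was."
            )},
            {"role": "user", "content": f"Question: {QUERY}\n\nSummary: {summary}"},
        ],
    )
    return resp.choices[0].message.content

def reduce_step(partial_answers):
    # Combine the highest-scoring partial answers into one global answer.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Synthesize these partial answers into one coherent response."},
            {"role": "user", "content": "\n\n".join(partial_answers)},
        ],
    )
    return resp.choices[0].message.content

print(reduce_step([map_step(s) for s in summaries]))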
Local search
Local search, on the other hand, focuses on specific entities and their immediate connections within the knowledge graph. By retrieving relevant nodes, relationships, and claims from the graph, Local Search allows the system to answer queries grounded in precise details.
Local search combines this localized information with the corresponding raw text chunks, ensuring that answers remain both specific and contextually grounded. For example, if a question asks for the healing properties of chamomile, Local Search will extract the entity "chamomile" and its associated relationships or claims, delivering an answer that directly addresses the query.
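In graph terms, this boils down to pulling an entity's immediate neighborhood along with the text units it was extracted from, roughly like the toy sketch below. The entities, relations, and source texts are invented for illustration and do not reflect GraphRAG's actual storage format.

# Toy sketch of assembling local context around one entity (names are invented).
import networkx as nx

g = nx.Graph()
g.add_edge("chamomile", "tea", relation="brewed as",
           source_text="Chamomile is brewed as a calming tea.")
g.add_edge("chamomile", "insomnia", relation="used for",
           source_text="Chamomile is traditionally used for insomnia.")

entity = "chamomile"
context = []
for neighbor in g.neighbors(entity):
    edge = g.edges[entity, neighbor]
    context.append(f"{entity} --{edge['relation']}--> {neighbor}: {edge['source_text']}")

# This assembled context would then be passed to the LLM along with the question.
print("\n".join(context))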
DRIFT search
DRIFT search extends local search by incorporating nearby communities. Rather than just examining one node, it taps into its neighboring context, weaving a richer narrative that reveals how related entities influence and shape each other.
This approach significantly enriches the response by drawing on the relationships and summaries within the surrounding graph structure. For instance, understanding how the spirits influence Scrooge’s transformation would not focus solely on a single spirit but instead draw insights from the entire “spirits” community. By blending localized precision with community-level understanding, DRIFT Search ensures responses are detailed, interconnected, and comprehensive.
Prompt tuning
Prompt tuning is an optional but powerful addition to the GraphRAG system that can enhance the quality of the generated knowledge graph and improve query performance. By leveraging an LLM to generate domain-specific in-context examples, users can ensure more relevant and accurate extractions of entities, relationships, and community summaries.
GraphRAG provides auto tuning, which automatically generates prompts tailored to the input dataset. The process analyzes the data, splits it into smaller text chunks, and selects representative samples using methods like random sampling or semantic similarity. These samples are used to produce fine-tuned prompts that better guide the LLM during the indexing process. By aligning the prompts with the dataset's domain, auto tuning can improve both the precision of the graph and the overall performance of the system when answering queries.
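Auto tuning is exposed through the GraphRAG CLI. The exact command and flags depend on the version you have installed, so double-check the official docs, but the invocation looks roughly like this:

graphrag prompt-tune --root ./ragtest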
To ensure accuracy, users can review and validate the generated examples before incorporating them into the system. Verified prompts help maintain consistency and relevance, especially for specialized or nuanced data. This tuning option allows GraphRAG to adapt to unique use cases without requiring complex manual configuration, offering a scalable way to enhance system outputs.
While GraphRAG works out of the box with default prompts, adding auto tuning can optimize extraction quality, generate cleaner graphs, and improve query responses, particularly for datasets with specialized terminology or concepts. In this tutorial, we will use the default prompts provided by GraphRAG.
Tutorial: Implementing GraphRAG with Weave logging
This tutorial will guide you through setting up GraphRAG, a system that uses large language models and knowledge graphs to index and query datasets. We will initialize the system, index a sample dataset (A Christmas Carol), and run both local and global searches.
To monitor and visualize OpenAI calls and pipeline execution, we will integrate Weave for telemetry and logging. By the end, you will have a functioning GraphRAG pipeline capable of answering broad and specific queries.
Step 1: Environment setup, data preparation, and project initialization
First, install the required dependencies, including the GraphRAG library and Weave:
pip install graphrag weave
Next, set up the workspace and prepare the data. Let’s use A Christmas Carol as our sample dataset.
mkdir -p ./ragtest/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
Now, initialize the GraphRAG workspace. This will generate configuration files that the system needs. Run the following command to initialize the project:
graphrag init --root ./ragtest
At this point, you’ll have two key files:
.env: Contains environment variables, including your OpenAI API key. Update it with your key: GRAPHRAG_API_KEY=<YOUR_OPENAI_API_KEY>.
settings.yaml: Contains configuration settings for the indexing pipeline, which you can customize as needed.
For this tutorial, I chose to use gpt-4o-mini as the LLM and text-embedding-3-small as the embeddings model from OpenAI. However, GraphRAG supports a wide variety of models, and I recommend checking the official docs to see if your model is supported. Here's a section of my settings.yaml file:
llm:
  api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
  type: openai_chat # or azure_openai_chat
  model: gpt-4o-mini
  model_supports_json: true # recommended if this is available for your model.

parallelization:
  stagger: 1.0
  num_threads: 1

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  vector_store:
    type: lancedb
    db_uri: 'output/lancedb'
    container_name: default
    overwrite: true
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: text-embedding-3-small
Step 2: Indexing the dataset
In this step, we will run the indexing pipeline to build the knowledge graph for our book we downloaded earlier. The indexing process transforms the text into smaller chunks, extracts entities and their relationships using the LLM, and clusters them into hierarchical communities.
Summaries for each community are generated, and Weave is integrated to log OpenAI calls, enabling visualization of the indexing process. Since Weave has a native integration with OpenAI, you just need to import weave and call weave.init("your_project_name") inside the script which will log all OpenAI calls to Weave.
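In practice the integration is just two lines at the top of the script; every OpenAI call made afterward in the same process is traced to the named project. The project name below is just an example.

import weave
from openai import OpenAI

weave.init("graph_rag_index_pipeline")  # all subsequent OpenAI calls are traced to this project

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello, Weave!"}],
)  # this call now appears in the Weave traces dashboard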
To start indexing, ensure your settings.yaml and environment variables are set up correctly. The indexing script will process the text, extract meaningful components, and output results into the specified directory. If errors occur, Weave will log the issues, providing transparency and traceability for troubleshooting. At the end of the process, the knowledge graph and community summaries will be stored in the output folder, ready to be queried.
Here is the code for indexing (you can add this script inside your previously created ragtest directory):
from pathlib import Path

from graphrag.logger.types import LoggerType
from graphrag.cli.index import index_cli
import weave

# Initialize Weave for logging/telemetry
weave.init("graph_rag_index_pipeline")

def main():
    # Hardcoded parameters
    root_dir = Path("./")
    config_path = Path("./settings.yaml")
    verbose = True
    memprofile = False
    resume = None  # Resume an existing run; set to None for fresh runs
    logger = LoggerType.RICH  # Logger type, e.g., RICH or SIMPLE
    dry_run = False  # If True, pipeline will validate but not execute
    cache = True  # Enable or disable LLM cache
    skip_validation = False  # Skip validation steps
    output_dir = Path("./output")  # Output directory for indexing results

    print("Starting the indexing pipeline with Weave telemetry...")
    try:
        index_cli(
            root_dir=root_dir,
            config_filepath=config_path,
            verbose=verbose,
            resume=resume,
            memprofile=memprofile,
            cache=cache,
            logger=logger,
            dry_run=dry_run,
            skip_validation=skip_validation,
            output_dir=output_dir,
        )
        print("Indexing pipeline completed successfully.")
    except Exception as e:
        print(f"Error during indexing: {e}")

if __name__ == "__main__":
    main()
After running the index script, you will see a few new directories, including "output" and "cache", which contain the vector database and the cached OpenAI calls, respectively.
Note that I ran into a few issues with rate limiting from the OpenAI API. This can be solved by increasing the stagger value and reducing the num_threads parameter in the parallelization section of the settings.yaml file (shown below), which reduces the rate at which OpenAI calls are made.
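For reference, here is what that section of my settings.yaml looked like after the adjustment; treat these values as a starting point and tune them for your own rate limits.

parallelization:
  stagger: 1.0      # seconds to wait between kicking off parallel LLM calls
  num_threads: 1    # fewer concurrent requests means fewer rate-limit errors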
Weave enabled me to quickly notice this issue, as the Weave traces dashboard clearly shows the completion status of every call.
Here's a screenshot of the Weave dashboard that brought awareness to the issue:

Weave is the perfect tool for a framework like GraphRAG because so much of GraphRAG revolves around using LLMs to construct a graph, which relies on hundreds of API calls, and the quality of those responses ultimately determines the quality of the system. Being able to quickly and easily visualize each call allows you to ensure the integrity of the graph and make informed decisions about which prompts yield the best graphs!
Step 3: Running a global search query
Once the dataset is indexed, we can perform a Global Search to answer broad, holistic questions that require insights across the entire dataset. Global Search uses pre-generated community summaries in a map-reduce fashion, where each summary contributes to a final, comprehensive answer.
Here is the code for a Global Search:
from pathlib import Path

from graphrag.cli.query import run_global_search
import weave

weave.init("graph_rag_global_search")

def main():
    # Hardcoded parameters
    root_dir = Path("./")
    data_dir = Path("./output")
    config_path = Path("./settings.yaml")
    query = "What is the role of Scrooge in the narrative?"
    community_level = 2
    dynamic_community_selection = False
    response_type = "Multiple Paragraphs"
    streaming = False

    print(f"Running global search with query: '{query}'...")
    try:
        run_global_search(
            config_filepath=config_path,  # Adjust if needed to include configs
            data_dir=data_dir,
            root_dir=root_dir,
            community_level=community_level,
            dynamic_community_selection=dynamic_community_selection,
            response_type=response_type,
            streaming=streaming,
            query=query,
        )
        print("Query completed successfully.")
    except Exception as e:
        print(f"Error while running global search: {e}")

if __name__ == "__main__":
    main()
The query script initiates a global search, retrieving summaries across multiple communities. It evaluates the relevance of each summary, combines the most useful responses, and synthesizes them into a detailed global answer.
You can use Weave to track the LLM calls and observe how the system interacts with the graph data, ensuring the query logic is transparent and verifiable.
I also used Weave to log the calls made to the OpenAI model, by importing Weave and calling weave.init(). After running the script, you will see the series of LLM calls inside Weave:

Step 4: Running a local search query
For more specific queries, a local search focuses on particular entities and their immediate relationships within the knowledge graph. It retrieves relevant nodes, edges, and associated raw text, combining them to generate precise answers tailored to the query.
from pathlib import Path

from graphrag.cli.query import run_local_search
import weave

weave.init("graph_rag_local_search")

def main():
    # Hardcoded parameters
    root_dir = Path("./")
    data_dir = Path("./output")
    config_path = Path("./settings.yaml")
    query = "How does Scrooge's reaction to the Ghost of Christmas Yet to Come illustrate his fear of the future?"
    community_level = 2
    response_type = "Multiple Paragraphs"
    streaming = False

    print(f"Running local search with query: '{query}'...")
    try:
        run_local_search(
            config_filepath=config_path,
            data_dir=data_dir,
            root_dir=root_dir,
            community_level=community_level,
            response_type=response_type,
            streaming=streaming,
            query=query,
        )
        print("Query completed successfully.")
    except Exception as e:
        print(f"Error while running local search: {e}")

if __name__ == "__main__":
    main()
The Local Search script initiates the process by querying a specific entity, such as “Scrooge,” and retrieves all closely related information.
Weave logs the OpenAI interactions, helping you visualize the entities accessed and the query execution. Local Search is ideal for questions where fine-grained details are required, as it narrows the focus to specific parts of the graph.
Conclusion
Utilizing GraphRAG and Weave, you have created a pipeline that can handle both detailed and broad questions. Global Search leverages the full dataset to answer high-level questions, while Local Search targets specific entities and relationships. The integration of Weave provides valuable transparency into the process, enabling you to visualize LLM interactions and monitor performance. You now have a powerful and extensible GraphRAG system to explore and analyze large datasets.
Thanks for following along, and happy querying!