
The role of in-context retrieval in intelligent systems

In this article, we dig into the role of in-context retrieval in intelligent systems and how it works.
As artificial intelligence (AI) systems become more advanced, the ability to retrieve and integrate relevant information efficiently has become a key capability. Traditional retrieval methods, such as Retrieval-Augmented Generation (RAG), rely on an explicit lookup step against a pre-indexed store, whether by keyword or vector search, before generation begins. These methods can be limiting in dynamic environments.
In-context retrieval (ICR) is an alternative approach that allows AI models to dynamically retrieve and use relevant information during the generation or decision-making process, making outputs more relevant, coherent, and personalized.
In this article, we will:
  • Define in-context retrieval and its differences from RAG and in-context learning.
  • Explore types of retrieval mechanisms in AI systems.
  • Examine the role of prompts in context generation.
  • Provide a practical implementation using a lightweight retrieval system.

What is in-context retrieval?

In-context retrieval (ICR) is a technique in which AI models retrieve and integrate relevant information based on real-time context instead of relying solely on pre-indexed databases. Unlike traditional retrieval methods that retrieve information first and then use it, ICR retrieves and integrates information dynamically within the same process.

How in-context retrieval differs from RAG and in-context learning

Feature | In-Context Retrieval (ICR) | Retrieval-Augmented Generation (RAG) | In-Context Learning (ICL)
Definition | Dynamically retrieves relevant data during model execution | Fetches relevant data from an external source before text generation | Uses provided examples to generate relevant responses without explicit retrieval
Data source | External APIs, databases, embeddings, real-time context | Pre-indexed knowledge bases, vector stores | No retrieval; uses in-prompt examples
Use case | AI-powered search engines, adaptive chatbots | Fact-checking, document Q&A, research assistants | Few-shot learning, text classification


How AI models integrate retrieved data

  • ICR: Adjusts responses based on live information retrieval (e.g., retrieving breaking news).
  • RAG: Merges pre-indexed data with a query before generating text.
  • ICL: Does not retrieve external data but uses in-prompt examples to guide model outputs.
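To make the distinction more concrete, here is a minimal, purely illustrative sketch of the two retrieval flows. The retrieve and generate functions are hypothetical placeholders for a search component and a language model call, not a specific library API:
def rag_answer(query, retrieve, generate):
    # RAG: fetch supporting documents once, up front, then generate.
    documents = retrieve(query)
    return generate(prompt=query, context=documents)

def icr_answer(query, retrieve, generate, max_steps=3):
    # ICR: retrieval is interleaved with generation, so each step can
    # issue a fresh lookup based on the partial answer produced so far.
    context, answer = [], ""
    for _ in range(max_steps):
        context += retrieve(query + " " + answer)  # the query evolves with the draft
        answer = generate(prompt=query, context=context, draft=answer)
    return answer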

Types of Retrieval in AI

AI retrieval processes can take various forms, each optimized for different tasks.

1. Explicit Retrieval

  • Directly queries external databases or APIs.
  • Example: Google Search retrieves and displays web results.
  • Real-world application: Legal AI assistants retrieving relevant case law.

2. Implicit Retrieval

  • Uses internal knowledge stored in pre-trained models.
  • Example: GPT models completing a sentence based on prior context.
  • Real-world application: GPT models generating text using internal knowledge.

3. Contextual Retrieval

  • Dynamically retrieves relevant information based on real-time user context.
  • Example: AI chatbots retrieving past user interactions to personalize responses.
  • Real-world application: Personalized AI search engines adjusting results based on user preferences.
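As a rough illustration of contextual retrieval, a chatbot might fold the last few conversation turns into the search query instead of using only the latest message. The search function referenced below is a hypothetical stand-in for whatever retriever is in use:
def contextual_query(history, latest_message, window=3):
    # Combine the most recent turns with the new message so retrieval
    # reflects the ongoing conversation, not just the latest utterance.
    recent_turns = " ".join(history[-window:])
    return f"{recent_turns} {latest_message}"

# results = search(contextual_query(chat_history, user_message))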

The role of prompts in context generation

Prompts serve as the entry point for AI to extract and interpret context. The quality of prompts directly influences how effectively AI retrieves and integrates information.

Prompt engineering in context-aware AI

  • Guided Retrieval: Using structured prompts to extract only relevant details.
    • Example: "Retrieve only the latest company financial reports.
  • Context Expansion: Using follow-up prompts to refine queries.
    • Example: "Summarize the key takeaways from the retrieved data."

Context in Computer Vision and NLP

  • NLP Models: Context helps AI understand user intent in chatbots.
  • Computer Vision: CNNs use surrounding pixels as context to improve object detection.

Practical Implementation

We will implement a retrieval-based AI system using the 20 Newsgroups dataset, which contains ~20,000 documents across 20 topics, including computer science, politics, and sports. Our goal is to efficiently index and search through this dataset using Elasticsearch and enhance system monitoring with Weights & Biases.

1. Setting Up the Environment

Libraries and Tools Needed
  • Elasticsearch for indexing and querying documents.
  • Natural Language Processing (NLP) libraries (spaCy, NLTK, or Hugging Face Transformers) for text processing.
  • Weights & Biases for tracking and visualizing retrieval performance.
pip install elasticsearch nltk wandb scikit-learn

2. Load the Dataset

We import the sklearn.datasets module from the scikit-learn library, which provides convenient access to various datasets, including the 20 Newsgroups dataset.
We load the 20 Newsgroups dataset using the fetch_20newsgroups function. The subset='all' parameter ensures that we load the entire dataset, and remove=('headers', 'footers', 'quotes') removes any headers, footers, or quoted text from the documents, keeping only the main content.
The fetch_20newsgroups function returns a dataset object newsgroups_data with various attributes, including data, which is a list containing the text content of each document.
import sklearn.datasets
# Load the 20 Newsgroups dataset
newsgroups_data = sklearn.datasets.fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
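A quick, optional sanity check confirms the corpus loaded as expected (subset='all' returns close to 19,000 posts spread over 20 newsgroups):
# Inspect the loaded corpus
print(f"Number of documents: {len(newsgroups_data.data)}")
print(f"Number of topics: {len(newsgroups_data.target_names)}")
print(newsgroups_data.data[0][:200])  # preview the first document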

3. Create a List of Dictionaries for Indexing

To prepare the data for indexing in Elasticsearch, we create a list of dictionaries, where each dictionary represents a document. The dictionary has two keys: title and content. The title value is a simple string identifying the document (e.g., "Document 0", "Document 1", and so on), while the content value is the actual text content of the document, taken from newsgroups_data.data.
# Create a list of dictionaries for indexing
documents = [{'title': f'Document {i}', 'content': newsgroups_data.data[i]} for i in range(len(newsgroups_data.data))]
By creating this list of dictionaries, we structure the data in a format that Elasticsearch can easily index and search through. Each dictionary represents a single document, with the title and content fields mapped to the appropriate fields in the Elasticsearch index.
After this step, we have the 20 Newsgroups dataset loaded and structured in a way that makes it ready for indexing in Elasticsearch. In the next step, we'll create the Elasticsearch index and index the documents.

4. Install and start Elasticsearch

!pip install elasticsearch -q
# download elasticsearch
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.1-linux-x86_64.tar.gz -q
!tar -xzf elasticsearch-7.9.1-linux-x86_64.tar.gz
!chown -R daemon:daemon elasticsearch-7.9.1

5. Start the Elasticsearch Server

First, we import the necessary modules to run external programs and interact with the operating system. Then, we use the Popen function from the subprocess module to start the Elasticsearch server. We provide the path to the Elasticsearch executable as an argument (elasticsearch-7.9.1/bin/elasticsearch). We also specify some additional options:
  • stdout=PIPE and stderr=STDOUT redirect the output and error messages from Elasticsearch to the Python script.
  • preexec_fn=lambda: os.setuid(1) ensures that Elasticsearch runs with a non-root user for security reasons.
After starting the Elasticsearch server, we use the !curl -X GET "localhost:9200/" command to send an HTTP GET request to http://localhost:9200/, the default address and port where the Elasticsearch server listens for requests.
This curl command tests whether the Elasticsearch server is running and responding correctly. The server takes a little while to boot, so we wait briefly before sending the request; if it is up and accessible, you should see a JSON response with information about the Elasticsearch cluster.
# start server
import os
import time
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(['elasticsearch-7.9.1/bin/elasticsearch'],
                  stdout=PIPE, stderr=STDOUT,
                  preexec_fn=lambda: os.setuid(1))

# give Elasticsearch time to start, then check that it responds
time.sleep(30)
!curl -X GET "localhost:9200/"

6. Connect to Elasticsearch

# client-side
!pip install elasticsearch==7.9.1
from elasticsearch import Elasticsearch

# Connect to Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])
es.ping()

7. Define Index Mapping and Create Index

Define the mapping for the index that will hold the 20 Newsgroups documents; each document has a title and a content text field.

# Define the index mapping
index_name = "newsgroups_index"
mapping = {
    "properties": {
        "title": {"type": "text"},
        "content": {"type": "text"}
    }
}

8. Create the Index and Index the Documents

We create the index using the mapping defined above, then add each document from the documents list built in step 3.
# Create the index
es.indices.create(index=index_name, ignore=400)
es.indices.put_mapping(index=index_name, body=mapping)

# Index the documents
for doc in documents:
    es.index(index=index_name, body=doc)
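Indexing documents one at a time works, but it is slow for close to 19,000 documents. The elasticsearch Python client also ships a bulk helper; a minimal sketch, assuming the same es client and documents list, looks like this:
from elasticsearch import helpers

# Wrap each document in a bulk action targeting our index
actions = ({"_index": index_name, "_source": doc} for doc in documents)
helpers.bulk(es, actions)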

9. Download Required NLTK Resources

To improve search quality and relevance, we also preprocess and clean the documents before indexing them. This typically involves tokenization, stopword removal, and stemming or lemmatization.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download the tokenizer models and stopword lists
nltk.download('punkt')
nltk.download('stopwords')

10. Initialize Stopwords and Stemmer

# Initialize stopwords and stemmer
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

11. Preprocess the Data

# Preprocess the data
preprocessed_documents = []
for document in newsgroups_data.data:
    # Tokenize the document
    tokens = nltk.word_tokenize(document.lower())

12. Remove Stopwords

    # Remove stopwords and stem the tokens (still inside the loop above)
    filtered_tokens = [stemmer.stem(token) for token in tokens if token not in stop_words]

13. Join the Tokens

    # Join the tokens back into a single string (still inside the loop)
    preprocessed_text = ' '.join(filtered_tokens)
    preprocessed_documents.append(preprocessed_text)

14. Create a List of Dictionaries for Indexing

# Create a list of dictionaries for indexing
documents = [{'title': f'Document {i}', 'content': preprocessed_documents[i]} for i in range(len(preprocessed_documents))]
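The index created in step 8 still holds the raw text, so for the queries in the next step to run against the cleaned content, the preprocessed documents need to be indexed as well. One simple approach, sketched below, is to delete and recreate the index with the new documents list (a production system would reindex or use aliases instead):
# Rebuild the index with the preprocessed documents
es.indices.delete(index=index_name, ignore=[400, 404])
es.indices.create(index=index_name, ignore=400)
es.indices.put_mapping(index=index_name, body=mapping)
for doc in documents:
    es.index(index=index_name, body=doc)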

15. Querying the Index

We connect to Elasticsearch and specify the index we want to query (newsgroups_index). We define a basic query that searches for documents containing the word "comput" (note that we use the stemmed version of "computer" since we preprocessed the data).
from elasticsearch import Elasticsearch

# Connect to Elasticsearch
es = Elasticsearch()
index_name = "newsgroups_index"

# Basic query
query = {
    "query": {
        "match": {
            "content": "comput"
        }
    }
}
We execute the search query using es.search(index=index_name, body=query) and print the number of results. We define a more complex query using Elasticsearch's Query DSL. This query searches for documents with the title "Document" and containing the word "scienc" (stemmed version of "science"). We execute the complex query and print the number of results.
results = es.search(index=index_name, body=query)
print(f"Number of results: {results['hits']['total']['value']}")


# More complex query using Elasticsearch's Query DSL
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "Document"}},
                {"match": {"content": "scienc"}}
            ]
        }
    }
}
results = es.search(index=index_name, body=query)
print(f"Number of results: {results['hits']['total']['value']}")

16. Using Weights & Biases for System Monitoring

!pip install wandb
import wandb

# Initialize Weights & Biases
wandb.init(project="newsgroups-retrieval-system")

# Run the query repeatedly and log system metrics for each execution
query = {
    "query": {
        "match": {
            "content": "comput"
        }
    }
}
for i in range(100):
    results = es.search(index=index_name, body=query)
    metrics = {
        "query_time": results["took"],  # server-side query time in milliseconds
        "num_results": results["hits"]["total"]["value"]
    }
    wandb.log(metrics)

# Finish the run so the logged metrics can be visualized
wandb.finish()

After logging the metrics, we can observe in the Weights & Biases dashboard that the query time drops toward zero as the steps progress, since repeated executions of the same query are quickly served from warm caches.

Conclusion

In-context retrieval is transforming AI by enabling models to dynamically access and integrate relevant information, improving adaptability, coherence, and personalization. These techniques offer significant advantages over traditional retrieval methods, making AI systems more effective across diverse tasks and environments.
As AI continues to evolve, in-context retrieval will play a growing role in natural language processing, decision support systems, and real-time information retrieval. By experimenting with the methods and code presented in this article, you can explore the potential of this powerful approach and build more intelligent, context-aware AI solutions.
Iterate on AI agents and models faster. Try Weights & Biases today.