
The role of in-context retrieval in intelligent systems

In this article, we dig into the role of in-context retrieval in intelligent systems and how it works.
As artificial intelligence (AI) systems become more advanced, the ability to retrieve and integrate relevant information efficiently has become a key capability. Traditional retrieval methods, such as Retrieval-Augmented Generation (RAG), rely on an explicit lookup step against a pre-indexed store, whether by keyword or vector search, before generation begins. These methods can be limiting in dynamic environments.
In-context retrieval (ICR) is an alternative approach that allows AI models to dynamically retrieve and use relevant information during the generation or decision-making process, making outputs more relevant, coherent, and personalized.
In this article, we will:
  • Define in-context retrieval and its differences from RAG and in-context learning.
  • Explore types of retrieval mechanisms in AI systems.
  • Examine the role of prompts in context generation.
  • Provide a practical implementation using a lightweight retrieval system.

What is in-context retrieval?

In-context retrieval (ICR) is a technique in which AI models retrieve and integrate relevant information based on real-time context instead of relying solely on pre-indexed databases. Unlike traditional retrieval methods that retrieve information first and then use it, ICR retrieves and integrates information dynamically within the same process.

How in-context retrieval differs from RAG and in-context learning

Feature | In-Context Retrieval (ICR) | Retrieval-Augmented Generation (RAG) | In-Context Learning (ICL)
Definition | Dynamically retrieves relevant data during model execution | Fetches relevant data from an external source before text generation | Uses provided examples to generate relevant responses without explicit retrieval
Data source | External APIs, databases, embeddings, real-time context | Pre-indexed knowledge bases, vector stores | No retrieval; uses in-prompt examples
Use case | AI-powered search engines, adaptive chatbots | Fact-checking, document Q&A, research assistants | Few-shot learning, text classification


How AI models integrate retrieved data

  • ICR: Adjusts responses based on live information retrieval (e.g., retrieving breaking news).
  • RAG: Merges pre-indexed data with a query before generating text.
  • ICL: Does not retrieve external data but uses in-prompt examples to guide model outputs.
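To make the distinction more concrete, here is a minimal, purely illustrative sketch of the two retrieval flows. The retrieve and generate functions are hypothetical placeholders for a search component and a language model call, not a specific library API:
def rag_answer(query, retrieve, generate):
    # RAG: fetch supporting documents once, up front, then generate.
    documents = retrieve(query)
    return generate(prompt=query, context=documents)

def icr_answer(query, retrieve, generate, max_steps=3):
    # ICR: retrieval is interleaved with generation, so each step can
    # issue a fresh lookup based on the partial answer produced so far.
    context, answer = [], ""
    for _ in range(max_steps):
        context += retrieve(query + " " + answer)  # the query evolves with the draft
        answer = generate(prompt=query, context=context, draft=answer)
    return answer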

Types of Retrieval in AI

AI retrieval processes can take various forms, each optimized for different tasks.

1. Explicit Retrieval

  • Directly queries external databases or APIs.
  • Example: Google Search retrieves and displays web results.
  • Real-world application: Legal AI assistants retrieving relevant case law.

2. Implicit Retrieval

  • Uses internal knowledge stored in pre-trained models.
  • Example: GPT models completing a sentence based on prior context.
  • Real-world application: GPT models generating text using internal knowledge.

3. Contextual Retrieval

  • Dynamically retrieves relevant information based on real-time user context.
  • Example: AI chatbots retrieving past user interactions to personalize responses.
  • Real-world application: Personalized AI search engines adjusting results based on user preferences.
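As a rough illustration of contextual retrieval, a chatbot might fold the last few conversation turns into the search query instead of using only the latest message. The search function referenced below is a hypothetical stand-in for whatever retriever is in use:
def contextual_query(history, latest_message, window=3):
    # Combine the most recent turns with the new message so retrieval
    # reflects the ongoing conversation, not just the latest utterance.
    recent_turns = " ".join(history[-window:])
    return f"{recent_turns} {latest_message}"

# results = search(contextual_query(chat_history, user_message))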

The role of prompts in context generation

Prompts serve as the entry point for AI to extract and interpret context. The quality of prompts directly influences how effectively AI retrieves and integrates information.

Prompt engineering in context-aware AI

  • Guided Retrieval: Using structured prompts to extract only relevant details.
    • Example: "Retrieve only the latest company financial reports.
  • Context Expansion: Using follow-up prompts to refine queries.
    • Example: "Summarize the key takeaways from the retrieved data."

Context in Computer Vision and NLP

  • NLP Models: Context helps AI understand user intent in chatbots.
  • Computer Vision: CNNs use surrounding pixels as context to improve object detection.

Practical Implementation

We will implement a retrieval-based AI system using the 20 Newsgroups dataset, which contains ~20,000 documents across 20 topics, including computer science, politics, and sports. Our goal is to efficiently index and search through this dataset using Elasticsearch and enhance system monitoring with Weights & Biases.

1. Setting Up the Environment

Libraries and Tools Needed
  • Elasticsearch for indexing and querying documents.
  • Natural Language Processing (NLP) libraries (spaCy, NLTK, or Hugging Face Transformers) for text processing.
  • Weights & Biases for tracking and visualizing retrieval performance.
pip install elasticsearch nltk wandb scikit-learn

2. Load the Dataset

We import the sklearn.datasets module from the scikit-learn library, which provides convenient access to various datasets, including the 20 Newsgroups dataset.
We load the 20 Newsgroups dataset using the fetch_20newsgroups function. The subset='all' parameter ensures that we load the entire dataset, and remove=('headers', 'footers', 'quotes') removes any headers, footers, or quoted text from the documents, keeping only the main content.
The fetch_20newsgroups function returns a dataset object newsgroups_data with various attributes, including data, which is a list containing the text content of each document.
import sklearn.datasets
# Load the 20 Newsgroups dataset
newsgroups_data = sklearn.datasets.fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
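A quick, optional sanity check confirms the corpus loaded as expected (subset='all' returns close to 19,000 posts spread over 20 newsgroups):
# Inspect the loaded corpus
print(f"Number of documents: {len(newsgroups_data.data)}")
print(f"Number of topics: {len(newsgroups_data.target_names)}")
print(newsgroups_data.data[0][:200])  # preview the first document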

3. Create a List of Dictionaries for Indexing

To prepare the data for indexing in Elasticsearch, we create a list of dictionaries, where each dictionary represents a document. The dictionary has two keys: title and content. The title value is a simple string identifying the document (e.g., "Document 0", "Document 1", and so on), while the content value is the actual text content of the document, taken from newsgroups_data.data.
# Create a list of dictionaries for indexing
documents = [{'title': f'Document {i}', 'content': newsgroups_data.data[i]} for i in range(len(newsgroups_data.data))]
By creating this list of dictionaries, we structure the data in a format that Elasticsearch can easily index and search through. Each dictionary represents a single document, with the title and content fields mapped to the appropriate fields in the Elasticsearch index.
After this step, we have the 20 Newsgroups dataset loaded and structured in a way that makes it ready for indexing in Elasticsearch. In the next step, we'll create the Elasticsearch index and index the documents.

4. Install and start Elasticsearch

!pip install elasticsearch -q
# download elasticsearch
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.1-linux-x86_64.tar.gz -q
!tar -xzf elasticsearch-7.9.1-linux-x86_64.tar.gz
!chown -R daemon:daemon elasticsearch-7.9.1

5. Start the Elasticsearch Server

First, we import the necessary modules to run external programs and interact with the operating system. Then, we use the Popen function from the subprocess module to start the Elasticsearch server. We provide the path to the Elasticsearch executable as an argument (elasticsearch-7.9.1/bin/elasticsearch). We also specify some additional options:
  • stdout=PIPE and stderr=STDOUT redirect the output and error messages from Elasticsearch to the Python script.
  • preexec_fn=lambda: os.setuid(1) ensures that Elasticsearch runs with a non-root user for security reasons.
After starting the Elasticsearch server, we use the !curl -X GET "localhost:9200/" command to send an HTTP GET request to http://localhost:9200/, the default address and port where the Elasticsearch server listens for requests.
This curl command tests whether the Elasticsearch server is running and responding correctly. The server takes a little while to boot, so we wait briefly before sending the request; if it is up and accessible, you should see a JSON response with information about the Elasticsearch cluster.
# start server
import os
import time
from subprocess import Popen, PIPE, STDOUT

es_server = Popen(['elasticsearch-7.9.1/bin/elasticsearch'],
                  stdout=PIPE, stderr=STDOUT,
                  preexec_fn=lambda: os.setuid(1))

# give Elasticsearch time to start, then check that it responds
time.sleep(30)
!curl -X GET "localhost:9200/"

6. Connect to Elasticsearch

# client-side
!pip install elasticsearch==7.9.1
from elasticsearch import Elasticsearch

# Connect to Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200, 'scheme': 'http'}])
es.ping()

7. Define Index Mapping and Create Index

Define the mapping for the index that will hold the 20 Newsgroups documents; each document has a title and a content text field.

# Define the index mapping
index_name = "newsgroups_index"
mapping = {
    "properties": {
        "title": {"type": "text"},
        "content": {"type": "text"}
    }
}

8. Create the Index and Index the Documents

We create the index using the mapping defined above, then add each document from the documents list built in step 3.
# Create the index
es.indices.create(index=index_name, ignore=400)
es.indices.put_mapping(index=index_name, body=mapping)

# Index the documents
for doc in documents:
    es.index(index=index_name, body=doc)
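Indexing documents one at a time works, but it is slow for close to 19,000 documents. The elasticsearch Python client also ships a bulk helper; a minimal sketch, assuming the same es client and documents list, looks like this:
from elasticsearch import helpers

# Wrap each document in a bulk action targeting our index
actions = ({"_index": index_name, "_source": doc} for doc in documents)
helpers.bulk(es, actions)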

9. Download Required NLTK Resources

To improve search quality and relevance, we also preprocess and clean the documents before indexing them. This typically involves tokenization, stopword removal, and stemming or lemmatization.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download the tokenizer models and stopword lists
nltk.download('punkt')
nltk.download('stopwords')

10. Initialize Stopwords and Stemmer

# Initialize stopwords and stemmer
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

11. Preprocess the Data

# Preprocess the data
preprocessed_documents = []
for document in newsgroups_data.data:
    # Tokenize the document
    tokens = nltk.word_tokenize(document.lower())

12. Remove Stopwords

    # Remove stopwords and stem the tokens (still inside the loop above)
    filtered_tokens = [stemmer.stem(token) for token in tokens if token not in stop_words]

13. Join the Tokens

    # Join the tokens back into a single string (still inside the loop)
    preprocessed_text = ' '.join(filtered_tokens)
    preprocessed_documents.append(preprocessed_text)

14. Create a List of Dictionaries for Indexing

# Create a list of dictionaries for indexing
documents = [{'title': f'Document {i}', 'content': preprocessed_documents[i]} for i in range(len(preprocessed_documents))]
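The index created in step 8 still holds the raw text, so for the queries in the next step to run against the cleaned content, the preprocessed documents need to be indexed as well. One simple approach, sketched below, is to delete and recreate the index with the new documents list (a production system would reindex or use aliases instead):
# Rebuild the index with the preprocessed documents
es.indices.delete(index=index_name, ignore=[400, 404])
es.indices.create(index=index_name, ignore=400)
es.indices.put_mapping(index=index_name, body=mapping)
for doc in documents:
    es.index(index=index_name, body=doc)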

15. Querying the Index

We connect to Elasticsearch and specify the index we want to query (newsgroups_index). We define a basic query that searches for documents containing the word "comput" (note that we use the stemmed version of "computer" since we preprocessed the data).
from elasticsearch import Elasticsearch

# Connect to Elasticsearch
es = Elasticsearch()
index_name = "newsgroups_index"

# Basic query
query = {
    "query": {
        "match": {
            "content": "comput"
        }
    }
}
We execute the search query using es.search(index=index_name, body=query) and print the number of results. We define a more complex query using Elasticsearch's Query DSL. This query searches for documents with the title "Document" and containing the word "scienc" (stemmed version of "science"). We execute the complex query and print the number of results.
results = es.search(index=index_name, body=query)
print(f"Number of results: {results['hits']['total']['value']}")


# More complex query using Elasticsearch's Query DSL
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "Document"}},
                {"match": {"content": "scienc"}}
            ]
        }
    }
}
results = es.search(index=index_name, body=query)
print(f"Number of results: {results['hits']['total']['value']}")

16. Using Weights & Biases for System Monitoring

!pip install wandb
import wandb

# Initialize Weights & Biases
wandb.init(project="newsgroups-retrieval-system")

# Run the query repeatedly and log system metrics for each execution
query = {
    "query": {
        "match": {
            "content": "comput"
        }
    }
}
for i in range(100):
    results = es.search(index=index_name, body=query)
    metrics = {
        "query_time": results["took"],  # server-side query time in milliseconds
        "num_results": results["hits"]["total"]["value"]
    }
    wandb.log(metrics)

# Finish the run so the logged metrics can be visualized
wandb.finish()

After logging the metrics, we can observe in the Weights & Biases dashboard that the query time drops toward zero as the steps progress, since repeated executions of the same query are quickly served from warm caches.

Conclusion

In-context retrieval is transforming AI by enabling models to dynamically access and integrate relevant information, improving adaptability, coherence, and personalization. These techniques offer significant advantages over traditional retrieval methods, making AI systems more effective across diverse tasks and environments.
As AI continues to evolve, in-context retrieval will play a growing role in natural language processing, decision support systems, and real-time information retrieval. By experimenting with the methods and code presented in this article, you can explore the potential of this powerful approach and build more intelligent, context-aware AI solutions.
Iterate on AI agents and models faster. Try Weights & Biases today.