Skip to main content

AI agents in retail and e-commerce

This article explores how AI agents are transforming retail by automating customer interactions, optimizing decision-making, and enhancing product recommendations using LLM-driven vector search.
Created on February 11|Last edited on April 9
AI is transforming retail, automating key processes from customer support to personalized shopping experiences. Businesses are increasingly using AI-driven systems to optimize efficiency, enhance engagement, and provide more intelligent, data-driven interactions. As these technologies continue to evolve, they are redefining how retailers operate and how customers experience shopping, both online and in-store.
We will explore how AI agents are being used to streamline decision-making, automate workflows, and personalize customer interactions. Additionally, we will build an AI-powered email triage system to manage customer inquiries more efficiently and a recommendation engine that delivers personalized product suggestions using vector search and LLM-generated queries. Alongside these implementations, we will examine the broader impact of AI in retail, including the benefits, challenges, and evolving role of intelligent automation.


Table of contents



What are AI agents and why they matter in retail?

AI agents, in retail, are intelligent systems designed to automate and enhance various customer-facing and backend processes. They power chatbots, voice-enabled assistants, and personal shopping experiences, providing seamless interactions that adapt to customer needs in real-time. Retailers can use these agents for personalized product recommendations, inventory management, customer support, and even dynamic pricing strategies.
The evolution of AI in the retail space has moved from basic recommender systems to sophisticated, context-aware agents capable of predicting customer preferences, engaging in meaningful conversations, and automating decision-making. Advances in natural language processing, memory integration, and tool usage allow AI agents to act as intelligent intermediaries, bridging the gap between data-driven insights and real-world actions. Overall, the trajectory of AI agents suggests a future where they seamlessly integrate into business operations and customer interactions, driving both sales and customer satisfaction.
At the core of their functionality are several key components:
  • Tools: AI agents connect with APIs, databases, and software systems to access product catalogs, order management platforms, and logistics networks, ensuring seamless operations.
  • Memory: They retain customer preferences and past interactions, enabling personalized recommendations, better engagement, and a more consistent shopping experience.
  • Continual Learning: By analyzing customer behavior, market trends, and business performance, AI agents refine their strategies over time, improving accuracy and effectiveness.
  • Orchestration: These agents manage complex retail processes, automating order handling, supplier coordination, and real-time decision-making to enhance efficiency.
As advancements in natural language processing, multimodal AI, and adaptive learning continue, AI agents will become even more integral to retail operations - driving sales, improving customer satisfaction, and streamlining business processes.

Key benefits of AI agents for retailers

AI agents simplify retail operations by reducing the complexity of traditional systems. Previously, retailers relied on manual rule-setting, extensive engineering, and frequent adjustments to keep up with market changes. AI-powered systems, driven by large language models, dynamically interpret data, generate insights, and automate decision-making with minimal human intervention. This adaptability allows retailers to respond quickly to shifting trends, customer behavior, and operational challenges without the need for constant reprogramming.
One of the most significant advantages of AI is personalized customer interactions. AI-driven recommendations, chatbots, and virtual assistants tailor product suggestions, promotions, and support responses to individual preferences. Unlike older recommendation engines that relied solely on transactional data, modern AI models analyze browsing history, purchase patterns, and even natural language interactions to deliver context-aware recommendations. This enhanced personalization improves conversion rates, increases customer retention, and strengthens brand loyalty.
AI also transforms customer service by automating responses to common inquiries and prioritizing critical issues. AI-powered email triage systems can categorize messages, assess urgency, and escalate pressing concerns in real time. Chatbots and voice assistants handle routine questions, reducing the workload for human agents while ensuring customers receive fast, accurate, and consistent support. This automation improves response times, lowers operational costs, and allows human employees to focus on high-value interactions that require empathy and problem-solving.
By automating these essential functions, AI reduces operational overhead, improves efficiency, and enhances data-driven decision-making across retail businesses.

Core components and technology behind AI agents

AI agents process data, make decisions, and adapt using large language models (LLMs), memory, external tools, and orchestration systems. At their core, LLMs enable AI agents to understand and generate human-like language, making them essential for customer interactions and decision-making. These agents operate within interconnected systems that ensure adaptability and accuracy.
Key components of AI agents include:
  • Tool Integration: AI agents connect to external systems to access real-time data and perform tasks like answering customer queries. For example, an AI agent integrated with Shopify’s API can instantly check product availability, while one connected to Salesforce can retrieve customer purchase history to provide personalized recommendations.
  • Memory: Enables agents to recall past interactions, ensuring consistent and personalized responses. A virtual shopping assistant, for instance, can remember a customer’s preferences from a previous session, allowing for seamless product recommendations without requiring the customer to start over.
  • Orchestration: Manages complex retail workflows by coordinating multiple AI functions. This includes handling orders, supplier coordination, and real-time decision-making, streamlining operations and reducing manual effort.
  • Observability: Ensures transparency, debugging, and continuous optimization of AI agents. Platforms like Weights & Biases Weave allow retailers to track response accuracy, detect hallucinations, and refine AI behavior in real time, improving both performance and trust.
With these components in place, AI agents deliver more reliable, context-aware, and scalable automation, transforming retail operations and enhancing customer experiences.

The role of LLMs in retail agents

Large language models are the driving force behind modern AI agents, enabling them to process language, generate responses, and adapt to real-world interactions with remarkable flexibility. Unlike traditional rule-based AI, LLMs leverage deep learning and vast datasets to predict, understand, and generate text dynamically. Their ability to handle natural conversations, summarize information, and personalize recommendations has transformed AI agents from simple automation tools into intelligent decision-makers.
Advancements in LLMs have made AI agents more responsive and effective in retail. Techniques like few-shot learning, retrieval-augmented generation, and fine-tuning allow these models to adapt to business needs, provide personalized customer interactions, and automate support without human intervention.
As LLMs continue to evolve, they will push the boundaries of retail technology, driving smarter automation, deeper personalization, and more efficient operations.

Security, compliance, and scalability

In retail use-cases, AI agents often handle sensitive customer data, including purchase histories and payment details, making security and regulatory compliance critical. Retailers must adhere to GDPR, CCPA, and other data protection laws, implementing encryption, secure storage, and user consent mechanisms to maintain trust and avoid legal risks. Strict access controls are essential to prevent unauthorized use of AI-driven systems in areas like personalization, fraud detection, and payment processing.
Beyond compliance, AI systems face cyber threats that can compromise decision-making. Adversarial attacks can manipulate models into making incorrect predictions, while prompt injections can lead to misinformation or unauthorized data access. To mitigate these risks, real-time anomaly detection and automated query filtering are crucial. AI monitoring tools can flag unusual behavior, identify anomalies, and prevent security breaches before they escalate.
To build trust, AI agents must be transparent and auditable. Advanced monitoring systems track decision-making, log interactions, and detect inconsistencies in real time. Retailers can even use AI agents to oversee other AI agents, ensuring responses remain accurate, policy-compliant, and free from hallucinations. Platforms like W&B Weave enhance oversight by tracking model performance, refining responses, and maintaining system reliability. Without proper monitoring, AI agents could generate misleading responses, make biased recommendations, or expose sensitive information, damaging brand reputation.
Scalability is another major challenge, especially during peak sales events like Black Friday. AI agents must handle traffic surges, process transactions, and manage inventory in real time. Without efficient scaling, retailers risk slow systems, failed transactions, and poor customer experiences. Solutions like retrieval-augmented generation optimize efficiency by reducing reliance on full-scale model queries, making responses faster and more cost-effective. Model quantization further enhances performance by compressing AI models without sacrificing accuracy, ensuring they can scale efficiently under heavy demand.
For retailers deploying AI at scale, security, compliance, and performance are not optional - they’re essential. AI-driven systems must be continuously monitored and optimized to ensure they remain secure, compliant, and scalable. Without these safeguards, AI agents could become liabilities rather than assets, undermining the very efficiency and personalization they are designed to provide.

Tutorial: Automating email prioritization with AI

Retailers handle a high volume of customer emails, including technical issues, billing disputes, feature requests, and general inquiries. Manually reviewing and categorizing these emails is inefficient and can delay responses to critical issues. AI-powered automation streamlines this process by analyzing emails, assessing urgency, and ensuring the most pressing matters are addressed first.
In this tutorial, we will build an AI-driven email prioritization system that:
  • Detects urgency by analyzing language cues and business impact, assigning a priority level of HIGH, MEDIUM, or LOW.
  • Categorizes emails into predefined business areas such as Technical Issues, Billing, Integration, or Customer Service.
  • Prioritizes and ranks emails based on urgency and timestamps, ensuring critical issues are addressed first.
  • Generates reports on common customer concerns, helping businesses optimize their products and services.
With real-time monitoring through W&B Weave, this system maintains transparency, refines prioritization logic dynamically, and scales efficiently for retail operations.
The following code demonstrates how an AI-driven system can dynamically analyze, categorize, and prioritize emails. It uses OpenAI’s GPT-4o for inference and Weave for real-time monitoring. While this tutorial uses a mock dataset, in practice, it can be connected to a live inbox via an API.
import json
from datetime import datetime, timedelta
from litellm import completion
import os
import weave; weave.init('retail-email-agent')


os.environ["OPENAI_API_KEY"] = "your api key"


# Mock email database
MOCK_EMAILS = [
{
"id": "1",
"from": "customer@example.com",
"subject": "App keeps crashing",
"body": "The mobile app has been crashing constantly for the past week. This is frustrating as I rely on it for work.",
"timestamp": (datetime.now() - timedelta(hours=2)).isoformat(),
"urgency": None,
"category": None
},
{
"id": "2",
"from": "support@partner.com",
"subject": "Urgent: Service integration issue",
"body": "We're experiencing problems with the API integration. Several customers are affected. Need immediate assistance.",
"timestamp": (datetime.now() - timedelta(hours=1)).isoformat(),
"urgency": None,
"category": None
},
{
"id": "3",
"from": "user123@example.com",
"subject": "Billing overcharge",
"body": "I was charged twice for my monthly subscription. Please refund the extra charge. Your billing system needs better validation.",
"timestamp": (datetime.now() - timedelta(minutes=30)).isoformat(),
"urgency": None,
"category": None
},
{
"id": "4",
"from": "enterprise@bigcorp.com",
"subject": "Missing enterprise features",
"body": "We need better team management features and role-based access control. Currently managing large teams is very manual.",
"timestamp": (datetime.now() - timedelta(minutes=45)).isoformat(),
"urgency": None,
"category": None
}
]

# Urgency levels mapped to descriptors
URGENCY_LEVELS = {
2: "HIGH", # Critical issues, system down, multiple users affected
1: "MEDIUM", # Important but not critical
0: "LOW" # Regular requests, no immediate impact
}

# Categories mapped to integers
CATEGORIES = {
0: "Technical Issues", # App crashes, bugs
1: "Performance", # Speed, reliability
2: "Billing/Pricing", # Payment problems
3: "User Experience", # UI/UX issues
4: "Feature Requests", # New features
5: "Security/Privacy", # Security concerns
6: "Integration", # API issues
7: "Customer Service", # Support general
8: "Documentation", # Help docs
9: "Enterprise" # Large customer needs
}

class EmailSystem:
def __init__(self, model_id="openai/gpt-4o"):
self.emails = MOCK_EMAILS.copy()
self.model_id = model_id
def _run_inference(self, prompt):
"""Run inference using the specified model"""
try:
response = completion(
model=self.model_id,
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
return response["choices"][0]["message"]["content"].strip()
except Exception as e:
print(f"Inference error: {str(e)}")
return "7" # Default to Customer Service
def get_category(self, email):
"""Get category as integer 0-9"""
categories = "\n".join(f"{k}: {v}" for k, v in CATEGORIES.items())
prompt = f"""Categorize this email into exactly ONE category by responding with ONLY its number (0-9):

Categories:
{categories}

Email subject: {email['subject']}
Email body: {email['body']}

Respond with ONLY a single number 0-9:"""
try:
response = self._run_inference(prompt)
category = int(response)
if category not in CATEGORIES:
return 7 # Default to Customer Service
return category
except:
return 7
@weave.op
def get_urgency(self, email):
"""Get urgency as integer 0-2"""
prompt = f"""Rate this email's urgency with ONLY ONE number:
2: HIGH - Emergency, system down, multiple users affected
1: MEDIUM - Important issue but not critical
0: LOW - Regular request, no immediate impact

Email subject: {email['subject']}
Email body: {email['body']}

Respond with ONLY the number (2, 1, or 0):"""
try:
response = self._run_inference(prompt)
urgency = int(response)
if urgency not in [0, 1, 2]:
return 0
return urgency
except:
return 0

def process_emails(self):
"""Process all emails and return summary"""
urgency_map = {2: "HIGH", 1: "MEDIUM", 0: "LOW"}
results = []
for email in self.emails:
# Get category
category_num = self.get_category(email)
category_name = CATEGORIES[category_num]
# Get urgency
urgency_num = self.get_urgency(email)
urgency_name = urgency_map[urgency_num]
results.append({
"subject": email["subject"],
"category_num": category_num,
"category": category_name,
"urgency_num": urgency_num,
"urgency": urgency_name
})
return results

class UrgencyRater:
def __init__(self, model_id="openai/gpt-4o"):
self.model_id = model_id
def _run_inference(self, prompt):
"""Run inference using the specified model"""
try:
response = completion(
model=self.model_id,
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
return response["choices"][0]["message"]["content"].strip()
except Exception as e:
print(f"Inference error: {str(e)}")
return "0" # Default to LOW urgency
@weave.op
def rate_urgency(self, email):
"""Rate email urgency from 0-2"""
prompt = f"""Analyze this email's urgency and respond with ONLY ONE number:

2: HIGH URGENCY - Critical issues:
- System down/major outage
- Multiple users/customers affected
- Security incidents
- Significant revenue impact
- Words like "urgent", "emergency", "immediate"

1: MEDIUM URGENCY - Important issues:
- Single user blocked
- Bug affecting functionality
- Billing problems
- Integration issues
- Performance problems

0: LOW URGENCY - Regular requests:
- Feature requests
- Documentation
- General questions
- Non-blocking issues
- Future planning

Email subject: {email['subject']}
Email body: {email['body']}

Respond with ONLY the number (2, 1, or 0):"""
try:
response = self._run_inference(prompt)
urgency = int(response)
if urgency not in URGENCY_LEVELS:
return 0
return urgency
except:
return 0
def batch_rate_emails(self, emails):
"""Rate urgency for a batch of emails"""
results = []
for email in emails:
urgency_num = self.rate_urgency(email)
results.append({
"subject": email["subject"],
"urgency_num": urgency_num,
"urgency": URGENCY_LEVELS[urgency_num],
"timestamp": email["timestamp"]
})
# Sort by urgency (high to low) and then by timestamp
return sorted(results,
key=lambda x: (-x["urgency_num"], x["timestamp"]))

if __name__ == "__main__":
# Test both systems
print("=== Category System Test ===")
category_system = EmailSystem()
category_results = category_system.process_emails()
for r in category_results:
print(f"\nSubject: {r['subject']}")
print(f"Category: {r['category_num']} ({r['category']})")
print(f"Urgency: {r['urgency_num']} ({r['urgency']})")
print("\n=== Urgency Rater Test ===")
urgency_system = UrgencyRater()
urgency_results = urgency_system.batch_rate_emails(MOCK_EMAILS)
for r in urgency_results:
print(f"\nSubject: {r['subject']}")
print(f"Urgency: {r['urgency_num']} ({r['urgency']})")
print(f"Timestamp: {r['timestamp']}")
This system processes customer emails by categorizing them and assigning urgency levels using an LLM. When an email arrives, the system generates a prompt containing the email's subject and body, which is passed to the LLM. The model responds with a category ID (e.g., Billing, Technical Issues, or Customer Service) and an urgency score (LOW, MEDIUM, or HIGH) based on predefined criteria.
The EmailSystem class handles categorization and urgency scoring by calling the model with structured prompts. The UrgencyRater class refines urgency classification by detecting key phrases and context indicative of high-priority issues. Once categorized, emails are sorted by urgency and timestamp, ensuring that pressing concerns like billing disputes or system outages receive immediate attention.
To improve efficiency, the system integrates Weave for real-time monitoring, allowing businesses to track AI performance, detect anomalies, and refine prioritization logic dynamically. While this tutorial uses a mock dataset, the system can be connected to a live inbox via an API to process real customer emails automatically, reducing manual workload and improving response times.
By leveraging AI for email analysis, retailers can ensure efficient and proactive customer service, ultimately driving higher satisfaction and operational efficiency.

AI-powered product recommendation system for retail

Retailers use personalized recommendations to increase customer engagement and sales. AI-driven recommendation systems analyze product similarities, browsing behavior, and past interactions to suggest relevant items. In this tutorial, we will build a mock retail website featuring Burberry’s product catalog, complete with an AI-powered recommendation system.
Here's a screenshot of what the website will look like:

The website will allow users to browse categories, view individual product pages, and receive dynamic recommendations based on their interactions. To achieve this, we will use Burberry’s product dataset, which includes product titles, images, prices, and category information. The recommendation engine will track user browsing behavior, generate search queries using a large language model, and retrieve similar products through a vector database.
To power these recommendations, we will first create a vector database that stores product embeddings generated using OpenAI’s text-embedding model. This enables us to compare products based on their semantic meaning rather than just keywords. Then, we will integrate this into a Flask-based web application where users can explore products and receive personalized recommendations.
We will also use Weave to track how the AI selects and ranks recommendations, ensuring transparency and continuous optimization. By combining LLM-generated search queries, vector-based similarity search, and real-time tracking, this system will provide a highly adaptive and personalized shopping experience.

Building the vector database

To make effective recommendations, the system first needs a way to compare products beyond simple keyword matching. This is achieved by converting product descriptions into embeddings using OpenAI’s text-embedding model. These embeddings are stored in an in-memory vector database, allowing for efficient similarity-based retrieval.
The process involves loading product data, transforming descriptions into numerical vectors, and saving the metadata for later use. Once stored, these embeddings enable the system to find related products based on their semantic meaning rather than exact text matches.
import os
import pandas as pd
import pickle
from datasets import load_dataset
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.documents import Document

# Set up OpenAI API Key (Replace with your actual API key)
os.environ["OPENAI_API_KEY"] = "your api key"

# Load dataset from Hugging Face
dataset = load_dataset('DBQ/Burberry.Product.prices.United.States')
df = pd.DataFrame(dataset['train'])

# Drop missing values
df = df.dropna(subset=["title", "imageurl", "category2_code", "category3_code", "product_code"])

# Convert product descriptions into embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
documents = [
Document(
page_content=f"{row['title']} - Category: {row['category2_code']} > {row['category3_code']}",
metadata={
"title": row["title"],
"imageurl": row["imageurl"],
"price": row["price"],
"category": row["category2_code"],
"product_code": str(row["product_code"])
}
) for _, row in df.iterrows()
]

# Store only document data (not the full vector store)
with open("documents.pkl", "wb") as f:
pickle.dump(documents, f)

# Save metadata separately
with open("metadata.pkl", "wb") as f:
pickle.dump(df, f)

print("Documents and metadata saved successfully! You can now run the Flask app.")


Building the website and recommendation system

With the vector database in place, the system powers a personalized recommendation engine.
We'll build a simple front-end using Flask that allows users to browse product categories, view individual product pages, and receive AI-generated recommendations. The website dynamically organizes Burberry’s product catalog, displaying images, prices, and details in an intuitive interface. When a user views a product, the system retrieves similar items by generating a search query using a language model. This query is based on either all previously viewed products or just the currently selected item. The system then searches the vector database for the most relevant products and ranks them for display on the product detail page, ensuring personalized and seamless product discovery.
Here’s the code for the website:
from flask import Flask, request, render_template_string
import pickle
import os
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
import random
from langchain.chat_models import ChatOpenAI
import weave; weave.init("retail-agent")

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Set up OpenAI API Key
os.environ["OPENAI_API_KEY"] = "your api key"


app = Flask(__name__)

# Load stored documents (precomputed embeddings)
with open("documents.pkl", "rb") as f:
documents = pickle.load(f)

# Recreate the vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents)

# Load metadata (original product details)
with open("metadata.pkl", "rb") as f:
df = pickle.load(f)

# Track viewed products
viewed_products = []

# Home page template
home_template = """
<!DOCTYPE html>
<html>
<head>
<title>Product Categories</title>
<style> body { font-family: Arial, sans-serif; } </style>
</head>
<body>
<h1>Product Categories</h1>
<ul>
{% for category in categories %}
<li><a href="{{ url_for('subcategory_page', category=category) }}">{{ category }}</a></li>
{% endfor %}
</ul>
</body>
</html>
"""

# Subcategory page template
subcategory_template = """
<!DOCTYPE html>
<html>
<head>
<title>{{ category }} Subcategories</title>
<style> body { font-family: Arial, sans-serif; } </style>
</head>
<body>
<h1>{{ category }} Subcategories</h1>
<ul>
{% for subcategory in subcategories %}
<li><a href="{{ url_for('product_page', category=category, subcategory=subcategory) }}">{{ subcategory }}</a></li>
{% endfor %}
</ul>
</body>
</html>
"""

# Product listing page template
product_template = """
<!DOCTYPE html>
<html>
<head>
<title>{{ subcategory }} Products</title>
<style>
body { font-family: Arial, sans-serif; }
.product { display: flex; align-items: center; margin-bottom: 20px; }
.product img { width: 100px; margin-right: 10px; }
</style>
</head>
<body>
<h1>{{ subcategory }} Products</h1>
<ul>
{% for product in products %}
<li class="product">
<a href="{{ url_for('product_detail', category=category, subcategory=subcategory, product_id=product['product_code']) }}">
<img src="{{ product['imageurl'] }}" alt="{{ product['title'] }}">
</a>
<div>
<p><strong>{{ product['title'] }}</strong></p>
<p>Price: ${{ product['price'] }}</p>
</div>
</li>
{% endfor %}
</ul>
</body>
</html>
"""
product_detail_template = """
<!DOCTYPE html>
<html>
<head>
<title>{{ product['title'] }}</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 0;
padding: 20px;
}
.page-container {
display: flex;
gap: 30px;
max-width: 1400px;
margin: 0 auto;
}
.product-container {
flex: 0 0 600px;
position: sticky;
top: 20px;
align-self: flex-start;
}
.product-container img {
width: 100%;
max-width: 500px;
height: auto;
}
.recommendations {
flex: 1;
min-width: 0;
}
.recommendations-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
gap: 20px;
margin-top: 20px;
}
.recommendation-item {
text-align: center;
}
.recommendation-item img {
width: 100%;
height: 200px;
object-fit: contain;
}
.recommendation-item p {
margin: 5px 0;
font-size: 14px;
}
.back-link {
display: block;
margin-top: 20px;
}
h1 {
font-size: 24px;
margin-bottom: 20px;
}
h2 {
font-size: 20px;
margin-bottom: 15px;
}
</style>
</head>
<body>
<div class="page-container">
<div class="product-container">
<h1>{{ product['title'] }}</h1>
<img src="{{ product['imageurl'] }}" alt="{{ product['title'] }}">
<p><strong>Category:</strong> {{ product['category2_code'] }} - {{ product['category3_code'] }}</p>
<p><strong>Price:</strong> ${{ product['price'] }}</p>
<a href="{{ url_for('product_page', category=category, subcategory=subcategory) }}" class="back-link">Back to Products</a>
</div>

<div class="recommendations">
<h2>Recommended Products</h2>
<div class="recommendations-grid">
{% for rec in recommendations %}
<div class="recommendation-item">
<a href="{{ url_for('product_detail', category=rec['category'], subcategory=rec['subcategory'], product_id=rec['product_code']) }}">
<img src="{{ rec['imageurl'] }}" alt="{{ rec['title'] }}">
</a>
<p><strong>{{ rec['title'] }}</strong></p>
<p>${{ rec['price'] }}</p>
</div>
{% endfor %}
</div>
</div>
</div>
</body>
</html>
"""
@app.route("/")
def home():
"""Display the main categories."""
categories = df["category2_code"].dropna().unique()
return render_template_string(home_template, categories=categories)

@app.route("/category/<category>")
def subcategory_page(category):
"""Display subcategories under a main category."""
subcategories = df[df["category2_code"] == category]["category3_code"].dropna().unique()
return render_template_string(subcategory_template, category=category, subcategories=subcategories)

@app.route("/category/<category>/<subcategory>")
def product_page(category, subcategory):
"""Display products under a specific subcategory."""
products = df[(df["category2_code"] == category) & (df["category3_code"] == subcategory)][["title", "imageurl", "price", "product_code"]].head(20).to_dict(orient="records")
return render_template_string(product_template, category=category, subcategory=subcategory, products=products)



@app.route("/category/<category>/<subcategory>/<product_id>")
def product_detail(category, subcategory, product_id):
"""Display product details with recommendations from two separate LLM queries."""
product = df[df["product_code"].astype(str) == product_id].iloc[0].to_dict()

# Add product to viewed list
if product not in viewed_products:
viewed_products.append(product)

# Generate two separate recommendation queries
search_query_all_products = generate_recommendation_query(all_products=True)
search_query_current_product = generate_recommendation_query(current_product=product)

recommended_products = []

# Fetch general recommendations from all viewed products
if search_query_all_products:
search_results_all = vector_store.similarity_search(search_query_all_products, k=50)
sampled_results_all = random.sample(search_results_all, min(10, len(search_results_all)))

recommended_products.extend([
{
"title": doc.metadata.get("title", "Unknown Title"),
"imageurl": doc.metadata.get("imageurl", ""),
"price": doc.metadata.get("price", "N/A"),
"category": doc.metadata.get("category2_code", "Unknown"),
"subcategory": doc.metadata.get("category3_code", "Unknown"),
"product_code": doc.metadata.get("product_code", "Unknown")
}
for doc in sampled_results_all
])

# Fetch 5 recommendations similar to the currently viewed product
if search_query_current_product:
search_results_current = vector_store.similarity_search(search_query_current_product, k=5)
recommended_products.extend([
{
"title": doc.metadata.get("title", "Unknown Title"),
"imageurl": doc.metadata.get("imageurl", ""),
"price": doc.metadata.get("price", "N/A"),
"category": doc.metadata.get("category2_code", "Unknown"),
"subcategory": doc.metadata.get("category3_code", "Unknown"),
"product_code": doc.metadata.get("product_code", "Unknown")
}
for doc in search_results_current
])

return render_template_string(product_detail_template, category=category, subcategory=subcategory, product=product, recommendations=recommended_products)

@weave.op
def generate_recommendation_query(all_products=False, current_product=None):
"""Generate search queries for product recommendations using an LLM."""
if all_products and viewed_products:
sampled_products = random.sample(viewed_products, min(3, len(viewed_products)))
product_descriptions = [
f"Product: {p.get('title', '')}, Category: {p.get('category2_code', '')}, Subcategory: {p.get('category3_code', '')}"
for p in sampled_products
]
prompt = f"""
The user has viewed the following products:
{', '.join(product_descriptions)}

Based on these products, generate a concise search query to find similar items with related styles, names, or categories.
You can suggest other product categories that the user might be interested in.
RESPOND ONLY with the query.
"""
elif current_product:
prompt = f"""
The user is currently viewing:
Product: {current_product.get('title', '')}, Category: {current_product.get('category2_code', '')}, Subcategory: {current_product.get('category3_code', '')}

Generate a concise search query to find similar products with related styles, names, or categories.
RESPOND ONLY with the query.
"""
else:
return None

response = llm.invoke(prompt)
return response.content.strip()


if __name__ == "__main__":
app.run(debug=True)

The system tracks browsing history to refine recommendations, ensuring that past interactions influence future suggestions. Unlike traditional recommendation systems that rely solely on collaborative filtering or static rules, this system dynamically generates search queries using an LLM. Instead of simply matching users with frequently purchased products, the LLM interprets the semantic meaning of viewed products and formulates a search query tailored to the user’s browsing behavior.
The system tracks previously viewed products in a session and generates two separate recommendation queries using an LLM. The first query is based on up to three randomly selected products from the user's viewed history. The LLM uses these product descriptions to generate a search query that captures common themes, styles, and categories. The second query is generated solely from the currently viewed product, ensuring precise recommendations based on its attributes.
Once these queries are created, they are used to perform vector similarity searches in the product database. For the first query, the system retrieves up to 50 matching products and randomly selects 10 for display to introduce variation. For the second query, it retrieves the 5 most relevant matches based purely on vector similarity. This approach balances generalization (recommendations based on overall browsing behavior) with specificity (recommendations based on the current product), ensuring more relevant and personalized product suggestions.
The bulk of our recommendation system can mostly be seen clearly in the generate_recommendation_query function which essentially generates a query which will be used to search through our vector database:
def generate_recommendation_query(all_products=False, current_product=None):
"""Generate search queries for product recommendations using an LLM."""
if all_products and viewed_products:
sampled_products = random.sample(viewed_products, min(3, len(viewed_products)))
product_descriptions = [
f"Product: {p.get('title', '')}, Category: {p.get('category2_code', '')}, Subcategory: {p.get('category3_code', '')}"
for p in sampled_products
]
prompt = f"""
The user has viewed the following products:
{', '.join(product_descriptions)}


Based on these products, generate a concise search query to find similar items with related styles, names, or categories.
You can suggest other product categories that the user might be interested in.
RESPOND ONLY with the query.
"""
elif current_product:
prompt = f"""
The user is currently viewing:
Product: {current_product.get('title', '')}, Category: {current_product.get('category2_code', '')}, Subcategory: {current_product.get('category3_code', '')}


Generate a concise search query to find similar products with related styles, names, or categories.
RESPOND ONLY with the query.
"""
else:
return None


response = llm.invoke(prompt)
return response.content.strip()
To track these interactions and refine recommendations over time, the system integrates Weave, which logs search queries, user interactions, and recommendation outputs. By adding the @weave.op decorator to our LLM inference function, we can track all inputs and outputs to the function inside Weave. Each time the LLM generates a search query, Weave records the process, allowing for real-time monitoring and adjustments. This tracking ensures that the recommendation system remains transparent, helping detect patterns in user preferences and refining search queries to improve accuracy.
Here’s a screenshot of one of the Weave traces from the website:

Using an LLM to generate search queries offers advantages over traditional recommendation methods. Methods like collaborative filtering requires extensive user data and struggles with new or niche products. In contrast, an LLM-driven approach enables recommendations based on meaning rather than rigid patterns, making it more flexible and responsive. The ability to generate contextually-aware queries ensures that even users with minimal browsing history receive relevant suggestions. By leveraging AI and Weave for real-time tracking, retailers can create a recommendation system that is adaptive, insightful, and effective at driving engagement and sales.

Conclusion

AI agents are revolutionizing retail by automating operations, enhancing customer engagement, and delivering real-time, personalized experiences. From email triage to intelligent recommendation systems, AI-driven solutions streamline processes, improve decision-making, and create seamless customer interactions. The shift from static, rule-based systems to adaptive, AI-powered agents enables retailers to stay agile and optimize workflows with minimal human intervention.
This project’s AI-powered recommendation system demonstrates how vector search, LLM-generated queries, and real-time tracking with Weave can deliver highly personalized shopping experiences. By continuously refining its understanding of user behavior, AI enhances product discovery and customer satisfaction.
As AI technology advances, its impact on retail will only grow. From conversational AI and predictive analytics to autonomous shopping assistants, AI-powered solutions will redefine how customers interact with brands. Retailers that embrace these innovations will not only boost efficiency and sales but also set new standards for personalized and intelligent shopping experiences.

Iterate on AI agents and models faster. Try Weights & Biases today.