
Building an LLM-Powered Data Analyst

Learn how to build an LLM-powered data analyst to streamline data analysis, automate insights, and enhance decision-making using natural language processing and tools like Weights & Biases.
Created on February 10|Last edited on January 22
Large language models (LLMs) are transforming the way we interact with data, enabling natural language queries and generating actionable insights with unprecedented ease. This article explores the concept of an LLM-powered data analyst, its applications across industries, and a technical guide to building one from scratch. From predictive insights to sentiment analysis, LLMs are redefining how businesses make informed decisions.
Source: Author



What is an LLM-powered data analyst?

An LLM-powered data analyst uses large language models to analyze and interpret vast datasets through natural language. It allows users to ask complex questions in plain English and receive meaningful answers.
These systems:
  • Automate report generation.
  • Identify patterns and trends.
  • Predict outcomes based on historical data.
Essentially, they serve as an intelligent assistant capable of providing insights round-the-clock, making them indispensable for businesses seeking to enhance efficiency and decision-making.

Applications of LLM-powered data analysts

LLM-powered data analysts are transformative in various domains:
  • E-commerce: Deliver personalized product recommendations by analyzing user behavior and market trends.
  • Healthcare: Diagnose diseases and suggest treatments by analyzing clinical data and research papers.
  • Real estate: Analyze housing prices, demographics, and economic indicators to understand market trends.
For example, AIDA Cruises uses a social listening tool powered by LLMs to capture real-time customer feedback, uncover trending topics, and enhance customer engagement.

How does an LLM-powered data analyst work?

Pre-trained LLMs serve as a robust foundation for developing LLM-powered data analysts, offering immediate capabilities without the need for extensive upfront training. These models can be evaluated for task-specific suitability, enabling teams to quickly identify the best fit for their requirements.
This approach minimizes time spent on data collection and preprocessing, accelerating deployment and testing. However, when pre-trained models fall short of domain-specific needs, fine-tuning or custom training becomes necessary to align the model with specialized objectives, ensuring the data analyst delivers precise and actionable insights.


Data collection

Data is sourced from APIs, web scraping, or direct feeds, including structured (e.g., databases) and unstructured formats (e.g., social media posts).

Data processing

  • Cleaning: Ensures data quality by handling errors, duplicates, and missing values.
  • Organization: Prepares data for analysis through tokenization and normalization.
  • Feature extraction: Identifies attributes like entities and sentiment using NLP techniques.
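
The cleaning and organization steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used later in this article: the helper name clean_reviews and the sample rows are hypothetical, and the rows are plain dictionaries using the ProductId, Text, and Score column names from the review dataset.

```python
import re

def clean_reviews(rows):
    """Drop rows with missing text, normalize whitespace, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        text = row.get("Text")
        if not text or not text.strip():
            continue  # Missing value: skip the row
        # Normalization: collapse runs of whitespace and trim the ends
        normalized = re.sub(r"\s+", " ", text).strip()
        # Duplicates are detected per product, ignoring letter case
        key = (row.get("ProductId"), normalized.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**row, "Text": normalized})
    return cleaned

# Hypothetical sample rows for illustration
sample_rows = [
    {"ProductId": "A1", "Text": "Great   taste!", "Score": 5},
    {"ProductId": "A1", "Text": "great taste!", "Score": 5},
    {"ProductId": "B2", "Text": None, "Score": 3},
    {"ProductId": "C3", "Text": "  Too salty. ", "Score": 2},
]
cleaned_rows = clean_reviews(sample_rows)
print(len(cleaned_rows))
```

In a pandas workflow, DataFrame.dropna and DataFrame.drop_duplicates cover the same ground.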

Modeling

Pre-trained LLMs are fine-tuned for specific tasks, leveraging deep learning to understand language nuances.

Insight generation

Insights are produced using natural language generation (NLG) to create human-readable summaries and visualizations tailored to decision-makers.

Core technologies behind an LLM-powered data analyst

Given the immense volume of textual data on which LLMs are trained, the computational resources required for both initial training and subsequent tasks like fine-tuning are substantial. LLMs manage this challenge through several key technologies:
  • Tokenization: Breaks text into smaller units for analysis.
  • Embeddings: Encodes semantic meaning into numerical representations.
  • Attention mechanisms: Focuses on relevant text for better comprehension.
  • Layered neural networks: Extracts complex patterns from data.
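
To make these building blocks concrete, here is a toy sketch of tokenization, embeddings, and scaled dot-product attention for a single query vector. Everything in it is an assumption made for illustration: the whitespace tokenizer stands in for the subword tokenizers real LLMs use, and the two-dimensional embeddings are chosen by hand rather than learned.

```python
import math

def tokenize(text):
    # Toy whitespace tokenizer; real LLMs use subword tokenizers
    return text.lower().split()

def softmax(values):
    # Subtract the maximum for numerical stability
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is the attention-weighted average of the value vectors
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical hand-picked embeddings, for illustration only
embeddings = {"tasty": [1.0, 0.0], "snack": [0.8, 0.2], "late": [0.0, 1.0]}
tokens = tokenize("Tasty snack late")
vectors = [embeddings[t] for t in tokens]

# Attend from the first token to every token in the sequence
weights, output = attention(vectors[0], vectors, vectors)
print(tokens, weights)
```

The weights sum to one, and the tokens whose embeddings point in a similar direction to the query receive the largest share.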
Tools like Weights & Biases support model development by logging predictions and results, so that different prompts and hyperparameters can be compared.

Building an LLM-powered data analyst with Python

Let's now build an LLM-powered data analyst from scratch. This data analyst will use customer feedback data on food reviews and perform analyses on the dataset, such as:
  1. finding its distribution,
  2. calculating statistics using prompts, and
  3. generating customer sentiment to help Amazon improve the products listed on its platform and, in turn, customer satisfaction.

Step One: Import the libraries and set up the environment

pip install openai wandb pandas

import pandas as pd
import os
import openai
import wandb
# Initialize a new Weights & Biases project
wandb.init(project='openai_data_analyst', name='LMM')

Step Two: Load the dataset

We will be using the dataset containing reviews of fine foods from Amazon. This data is publicly available on Kaggle, holding the textual reviews from the customers and the corresponding score rating from 1 to 5.
We read the dataset using pandas and set up a new wandb run to log the model’s predictions.
df = pd.read_csv('/content/kaggle/Reviews.csv')

Step Three: Set up the OpenAI API key

from IPython.display import Markdown, display
os.environ["OPENAI_API_KEY"] = 'add your openai API key'
openai.api_key = os.environ["OPENAI_API_KEY"]
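Hard-coding a key in a notebook makes it easy to leak. As a hedged alternative, the key can be read from an environment variable that was set outside the notebook, failing with a clear message when it is missing. The helper name get_api_key is hypothetical, and the key in the demonstration is a placeholder.

```python
import os

def get_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return key

# Demonstration with a stand-in environment dictionary (placeholder key)
demo_key = get_api_key({"OPENAI_API_KEY": " sk-placeholder "})
print(demo_key)
```

With the variable exported in the shell, the assignment above becomes openai.api_key = get_api_key().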
In the sections below, we will first see how a pre-trained LLM can be used as a data analyst to explore the distribution of data, and then use the model to predict customer sentiment.
This model will then be compared against a fine-tuned model to evaluate the benefit of fine-tuning. Lastly, we will explore the use cases of LLM-powered data analysis.

Step Four: LLM-powered exploratory data analysis

Before getting the model to predict the sentiment, we will first explore the distribution of the data to get a better understanding of what we are dealing with. For this, we will prompt a pre-trained model, ‘gpt-4o-mini’, with questions about our data instead of writing lines and lines of code.
Define a function that takes the model name and prints the answer to the current query. The function reads the data and the question from the variables ‘context’ and ‘query’, so both must be defined before it is called (the context is built from the DataFrame as JSON, as shown in Step Ten). The system message instructs the model to answer only from the provided context.
# Retrieve a response from the model
# Note: reads the module-level variables context and query
def retrieve_analyst_response(model):
    response = openai.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a helpful assistant who answers only from the given Context."},
            {"role": "user", "content": "Context: " + context + "\n\n Query: " + query},
        ],
        model=model,  # Choose the model you wish to use
        temperature=0,  # Minimize randomness in the answer
        max_tokens=60  # Keep answers short
    )
    print(response.choices[0].message.content)
Now, let's query the data to see if it has repeated products listed.
query = "Does the data have repeated ProductId?"
retrieve_analyst_response("gpt-4o-mini")

We can also retrieve the average score to see if the data is balanced or not.
query = "What is the average value for all Scores?"
retrieve_analyst_response("gpt-4o-mini")
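
The model only sees the rows passed in as context, and it can make arithmetic mistakes, so it is worth checking its answers against values computed directly. The sketch below does this for the two queries above on plain dictionaries. The helper names and sample rows are hypothetical.

```python
from collections import Counter
from statistics import mean

def repeated_product_ids(rows):
    """Return the ProductId values that appear more than once."""
    counts = Counter(row["ProductId"] for row in rows)
    return sorted(pid for pid, n in counts.items() if n > 1)

def average_score(rows):
    """Return the mean of the Score column."""
    return mean(row["Score"] for row in rows)

# Hypothetical sample rows for illustration
rows = [
    {"ProductId": "A1", "Score": 5},
    {"ProductId": "A1", "Score": 4},
    {"ProductId": "B2", "Score": 3},
]
print(repeated_product_ids(rows))
print(average_score(rows))
```

On the DataFrame itself, df["ProductId"].duplicated().any() and df["Score"].mean() give the same checks.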


Step Five: Evaluate performance using the pre-trained model

Now let's use the same pre-trained model, ‘gpt-4o-mini’, to predict a rating for each review. Later on, this pre-trained model will be compared against a fine-tuned model in terms of the predictions generated. We first define a function, ‘score_review’, whose system message asks the model to rate customer feedback from 1 to 5 as a whole number. Predictions are retrieved for around 50 reviews taken from the end of the dataset.
# Score a review with the pre-trained model
# Note: score (the labeled rating) is not sent to the model
def score_review(review, score):
    response = openai.chat.completions.create(
        messages=[
            {"role": "system", "content": "GPT is great at rating customer feedback from 1 to 5, where 1 is the worst and 5 is the best. Give the rating as a whole number."},
            {"role": "user", "content": review},
        ],
        model="gpt-4o-mini",  # Choose the model you wish to use
        temperature=0,
        max_tokens=60
    )

    return response.choices[0].message.content


# Apply scoring to each review
df['pretrained_score'] = df.iloc[568400:].apply(lambda row: score_review(row['Text'], row['Score']), axis=1)

Step Six: Convert data to JSON for fine-tuning

To fine-tune the model on a given dataset, we first need to format the data so that each row is a JSON object. The format differs slightly depending on the OpenAI model used, which can be validated here.
import json

# Assumption: df_downsampled is not defined in the original; here it is taken to be
# a 1,000-row sample drawn from rows that are not used for evaluation
df_downsampled = df.iloc[:568350].sample(n=1000, random_state=42)

# Specify the output file path
output_file_path = 'output_data.jsonl'


# Open the output file in write mode
with open(output_file_path, 'w') as file:
    for index, row in df_downsampled.iterrows():
        # Construct the JSON object for the current row
        json_object = {
            "messages": [
                {"role": "system", "content": "GPT is great at rating customer feedback from 1 to 5, where 1 is the worst and 5 is the best. Give the rating as a whole number."},
                {"role": "user", "content": row['Text']},
                {"role": "assistant", "content": str(row['Score'])}  # Ensure the score is a string
            ]
        }
        # Convert the dictionary to a JSON string and write it to the file
        file.write(json.dumps(json_object) + '\n')


print(f"Data successfully written to {output_file_path}")
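
Before uploading, it can help to check that every line of the file parses and follows the layout written above. The sketch below checks only the structure used in this article (one system, one user, and one assistant message per line), not the full set of rules OpenAI applies. The helper name validate_chat_jsonl and the sample records are hypothetical.

```python
import json

EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_jsonl(lines):
    """Return (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not all(isinstance(m, dict) for m in messages):
            problems.append((number, "missing or malformed 'messages' list"))
            continue
        roles = [m.get("role") for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((number, f"unexpected roles: {roles}"))
        elif not all(isinstance(m.get("content"), str) and m["content"] for m in messages):
            problems.append((number, "empty or non-string content"))
    return problems

# Hypothetical records for illustration
good = json.dumps({"messages": [
    {"role": "system", "content": "Rate from 1 to 5."},
    {"role": "user", "content": "Loved it"},
    {"role": "assistant", "content": "5"},
]})
bad = json.dumps({"messages": [{"role": "user", "content": "No label"}]})
problems = validate_chat_jsonl([good, bad, "not json"])
print(problems)
```

To check the file written above, pass the open file object, since iterating over it yields one line at a time.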

Step Seven: Fine-tune the model

After formatting the data, we upload it for fine-tuning using the Files API. Once the upload succeeds, a file id is generated for the data, which is passed as ‘training_file’ when creating the fine-tuning job. The fine-tuning process then begins; note, however, that the fine-tuned model is not available immediately. It takes some time for the model to become ready for use.
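from openai import OpenAI
client = OpenAI()

# Upload the training data; the returned object carries the generated file id
training_file = client.files.create(
    file=open("output_data.jsonl", "rb"),
    purpose="fine-tune"
)


# Start the fine-tuning job using the id of the uploaded file
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini"  # Fine-tuning may require a dated snapshot name such as "gpt-4o-mini-2024-07-18"
)
print(job.id)  # Use this id to check the job status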
The job status tracks the progress of fine-tuning, as shown in the screenshots below. Once the job succeeds, its fine_tuned_model field changes from None to the name of the fine-tuned model. This name is copied and used to retrieve predictions from the fine-tuned model.

The status of the model can be verified using the code below:
openai.fine_tuning.jobs.retrieve("ftjob-your job id here")
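
Since the job takes a while, checking the status can be automated with a small polling loop. The sketch below is written against a generic fetch_status callable so that it runs offline. The helper name wait_for_job is hypothetical, and the demonstration feeds it a fake sequence of statuses rather than calling the API.

```python
import time

# Statuses after which the job will not change any further
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call fetch_status() repeatedly until the job reaches a terminal status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("fine-tuning job did not finish within the polling budget")

# Offline demonstration with a fake status source
fake_statuses = iter(["validating_files", "running", "succeeded"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

Against the real API, fetch_status could be lambda: client.fine_tuning.jobs.retrieve(job_id).status, where job_id is the id of your own job.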


Step Eight: Evaluate the performance using our fine-tuned model

Finally, the fine-tuned model is used to retrieve the predicted scores for the customer feedback. The system message is kept essentially the same as for the base model, apart from the trailing word ‘only’, to keep the comparison fair. Modifying this message is part of prompt engineering, and different prompts will lead to different results.
results = []
correct_predictions = 0
# Score a review with the fine-tuned model and record the prediction for later comparison
def score_review(review, original_score, old_score):
    response = openai.chat.completions.create(
        messages=[
            {"role": "system", "content": "GPT is great at rating customer feedback from 1 to 5, where 1 is the worst and 5 is the best. Give the rating as a whole number only."},
            {"role": "user", "content": review},
        ],
        model="ft:gpt-4o-mini:personal::8otpi9E6",  # Replace with the name of your own fine-tuned model
        temperature=0,
        max_tokens=60
    )
    # Keep the review, the labeled score, and both model predictions side by side
    results.append({
        "sentiment": review,
        "original_prediction": original_score,
        "labeled_prediction": response.choices[0].message.content,
        "old_model_prediction": old_score
    })

    return response.choices[0].message.content


# Apply scoring to each review
df['finetuned_score'] = df.iloc[568400:].apply(lambda row: score_review(row['Text'], row['Score'], row['pretrained_score']), axis=1)

Step Nine: Log the output to Weights & Biases and evaluate the models

The resulting predictions and the corresponding reviews are logged to Weights & Biases for analysis. Different prompts and hyperparameters can be tested and the results logged, so that the performance of the model variants can be compared efficiently. This helps identify the best-performing parameters and prompts.
# Convert results list to DataFrame
df_results = pd.DataFrame(results)


# Log the entire DataFrame as a table to W&B
wandb.log({"results_table": wandb.Table(dataframe=df_results)})
The table below shows the logged summary in Weights & Biases. The results for the fine-tuned model are better than those for the base model. Note also that although both prompts asked for the score as a whole number, the base model returned the score along with some text. This makes it difficult to compare the results in terms of accuracy, since the whole numbers must first be extracted from the textual responses. The fine-tuned model did not make this mistake in any case.
Moreover, the model was fine-tuned using only 1000 samples from the entire dataset, and the results are quite satisfactory. This number can be increased for more accurate results.
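
The preprocessing mentioned above, pulling the whole number out of a textual response before computing accuracy, might look like the following sketch. The helper names, the sample labels, and the sample responses are hypothetical. The regular expression simply takes the first standalone digit from 1 to 5, which is an assumption about how the base model phrases its answers.

```python
import re

def extract_rating(text):
    """Return the first standalone digit from 1 to 5 in the text, or None."""
    match = re.search(r"\b([1-5])\b", str(text))
    return int(match.group(1)) if match else None

def accuracy(labels, predictions):
    """Share of predictions whose extracted rating equals the label."""
    pairs = list(zip(labels, predictions))
    if not pairs:
        return 0.0
    correct = sum(1 for label, pred in pairs if extract_rating(pred) == int(label))
    return correct / len(pairs)

# Hypothetical labels and model responses for illustration
labels = [5, 1, 3, 4]
base_outputs = ["I would rate this a 5.", "Rating: 2", "3", "no rating given"]
print(accuracy(labels, base_outputs))
```

Applied to the logged results, the labels would come from the original_prediction column and the responses from labeled_prediction or old_model_prediction.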



Step Ten: Further evaluation of the fine-tuned model in other scenarios

After obtaining the sentiments for our data, we will now see the different use cases of LLM-powered data analysts in action. For this, we will first convert our CSV data to JSON for input to the model. Then we will act as a data analyst and prompt the model for questions about the data. We are using a part of our data for this analysis that is saved to test_df.
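# Extract a part of the DataFrame for analysis
# Note: pretrained_score and finetuned_score were computed only for rows from 568400 onward,
# so they are empty (NaN) in this slice
test_df = df.iloc[568350:568400]


# Convert the first five rows to JSON to pass as context to the model
context = test_df.head().to_json(orient="records")
context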
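
Only a handful of rows are passed as context because the whole prompt has to fit within the model's input limit. One way to make that limit explicit is to add rows until a size budget is reached, as in the sketch below. The helper name build_context is hypothetical, and the character budget is only a rough stand-in for a token count.

```python
import json

def build_context(rows, max_chars=4000):
    """Serialize as many leading rows as fit within max_chars of JSON."""
    kept = []
    for row in rows:
        candidate = json.dumps(kept + [row])
        if len(candidate) > max_chars:
            break  # Adding this row would exceed the budget
        kept.append(row)
    return json.dumps(kept), len(kept)

# Hypothetical rows for illustration
rows = [{"ProductId": f"P{i}", "Text": "x" * 50, "Score": 5} for i in range(10)]
context_json, used = build_context(rows, max_chars=300)
print(used, len(context_json))
```

With the DataFrame from this article, the rows could come from test_df.to_dict(orient="records").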

Now, let's prompt the fine-tuned model with questions a data analyst might ask and see how it responds:

query = "For the ProductId: B003O5Q3KE, what food is the Text about?"
retrieve_analyst_response('ft:gpt-4o-mini:personal::8otpi9E6')

query = "For Corriander Chutney, is the sentiment positive or negative?"
retrieve_analyst_response('ft:gpt-4o-mini:personal::8otpi9E6')

query = "Let's say if the value of finetuned_score is less than 3 the sentiment is negative and if it is greater than 3 the sentiment is positive. Is the average sentiment positive or negative?"
retrieve_analyst_response('ft:gpt-4o-mini:personal::8otpi9E6')

A major part of data analysis is graphical visualization, which helps break down the data in a more comprehensible way. However, the API cannot produce plots directly, as demonstrated by the prompt below. To work around this, libraries such as pandas_gpt can be used, which allow pandas DataFrames to be handled with ChatGPT prompts.
query = "Plot the distribution of Score using bargraph"
retrieve_analyst_response('ft:gpt-4o-mini:personal::8otpi9E6')
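
Because the chat API returns only text, the distribution itself has to be computed and drawn locally. The sketch below counts the scores and renders a simple text bar chart using only the standard library. The helper names and sample scores are hypothetical.

```python
from collections import Counter

def score_distribution(scores):
    """Count how often each rating from 1 to 5 occurs."""
    counts = Counter(scores)
    return {score: counts.get(score, 0) for score in range(1, 6)}

def text_bar_chart(distribution, width=20):
    """Render the counts as horizontal bars scaled to the largest count."""
    peak = max(distribution.values()) or 1
    lines = []
    for score, count in distribution.items():
        bar = "#" * round(width * count / peak)
        lines.append(f"{score} | {bar} {count}")
    return "\n".join(lines)

# Hypothetical scores for illustration
scores = [5, 5, 4, 5, 1, 3, 5, 4]
distribution = score_distribution(scores)
print(text_bar_chart(distribution))
```

For a graphical chart, df["Score"].value_counts().sort_index().plot(kind="bar") is the usual pandas route, provided matplotlib is installed.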


Conclusion

LLM-powered data analysts simplify data analysis, making advanced insights accessible across industries. By combining LLM capabilities with tools like Weights & Biases, organizations can enhance efficiency, improve decision-making, and gain a competitive edge. As these technologies evolve, their applications will expand, transforming how we understand and utilize data.