Building reliable apps with GPT-4o and structured outputs
Learn how to enforce consistency on GPT-4o outputs and build reliable generative AI apps.
The world is awash with unstructured data, from free-form text and images to complex interactions in voice and video. However, to unlock the true potential of this information, it needs to be structured and organized.
Once structured, this data can be harnessed in a multitude of ways, powering a range of use cases from retrieval-augmented generation (RAG) systems to recommendation engines and code generation tools. In this article, we’ll explore the concept of structured outputs, discuss their advantages, and build a few fun projects that showcase their impact on AI-driven solutions.

Table of contents
What do we mean by structured outputs?
Integration of structured outputs in the OpenAI API
Practical examples
Categorization: Creating a research paper index
Use in a retrieval-augmented generation system: Building a structured database
Code generation from voice commands: Converting audio instructions to structured JSON for a form builder
Conclusion
What do we mean by structured outputs?
Structured outputs refer to the organized and formatted data generated by AI models that adhere to predefined schemas. This ensures that the generated information is both consistent and easily consumable by various systems, enhancing its overall readability and usability.
In traditional AI applications, ensuring that outputs conform to a desired structure has often relied on prompt engineering or post-processing layers. This approach can be error-prone and difficult to maintain, especially when working with complex data structures. Structured outputs, as implemented in the OpenAI API, solve these challenges by allowing developers to define schemas directly within the API, using formats such as JSON Schema. This guarantees that the model's response will match the expected format, minimizing errors and reducing the need for validation or reformatting.
To use structured outputs in the OpenAI API, include the schema definition in the response_format parameter when making a request. This forces the model to generate responses that match the given format, eliminating the need for manual validation or prompt-engineering tricks. If the model declines to answer, you can handle the refusal programmatically, such as logging the refusal message or applying conditional logic.
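Here's a minimal sketch of that flow; the sentiment schema and model choice are illustrative, not part of any project in this article:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative schema: the model must return an object with a single "sentiment" field
schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]}
            },
            "required": ["sentiment"],
            "additionalProperties": False
        }
    }
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "I loved this product!"}],
    response_format=schema,
)

message = response.choices[0].message
if message.refusal:  # the model declined; handle it instead of parsing
    print(f"Refusal: {message.refusal}")
else:
    print(json.loads(message.content))  # e.g. {"sentiment": "positive"}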
There are two primary methods for applying structured outputs: using JSON Schema or function calling. JSON Schema is ideal when you want the model’s response to strictly follow a specific structure, such as when formatting data for storage or display. Function calling, on the other hand, is used when you want the model to interact with tools or external systems. We will focus on using the JSON schema for this tutorial.
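For contrast, here is a minimal function-calling sketch of the same idea; the get_weather tool and its parameters are invented purely for illustration:

from openai import OpenAI

client = OpenAI()

# Hypothetical tool: the model returns structured arguments instead of message content
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "strict": True,  # enforce the parameter schema exactly
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
            "additionalProperties": False
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# The structured arguments arrive on the tool call rather than in message.content
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)  # get_weather {"city":"Paris"}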
By leveraging structured outputs, you minimize issues like missing fields or invalid data types, ensuring your application can seamlessly integrate with the model’s output without additional error handling or complex post-processing steps.
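If you prefer defining schemas as Python classes, the openai-python SDK also offers a parse helper that accepts a Pydantic model and converts it into a strict JSON schema for you. A sketch, assuming a recent SDK version; the PaperCategory model is illustrative:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class PaperCategory(BaseModel):
    category: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Categorize: 'Attention Is All You Need'"}],
    response_format=PaperCategory,  # the SDK converts this model into a strict JSON schema
)

parsed = completion.choices[0].message.parsed  # a PaperCategory instance, or None on refusal
print(parsed.category)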
Integration of structured outputs in the OpenAI API
The integration of structured outputs in the OpenAI API is designed to provide robust control over the model's responses. By defining a schema that the model adheres to, developers can ensure that critical elements—such as required fields or valid enumerated values—are always present. This approach differs from earlier methods where developers might have had to write extensive prompts to coax the model into producing the right format or manually verify the output afterwards.
Structured outputs provide several key benefits that enhance the overall efficiency and reliability of AI-driven applications. Improved data handling is achieved through adherence to predefined schemas, ensuring that all generated responses conform to the expected format. This eliminates the need for complex error handling or validation processes, saving developers time and effort. As a result, applications can function more predictably, reducing bugs and inconsistencies.
Additionally, the better system integration offered by structured outputs enables smooth data exchange between systems. With consistent formatting, there is less chance of data mismatches or integration issues, making it easier to pass structured information across various applications, databases, or APIs.
Finally, enhanced user experience is another advantage. Structured outputs improve the readability of responses and allow developers to design intuitive interfaces that can present different parts of the model's output in distinct ways. This leads to clearer communication of information and a more engaging interaction for users.
Practical examples
In this section, we’ll explore three primary use cases that illustrate the power and versatility of structured outputs in real-world applications. Each use case will highlight how structured outputs can be leveraged to enforce data consistency, making it easier to work with structured responses in a variety of scenarios.
Categorization: Creating a research paper index
We’ll demonstrate how to use structured outputs to categorize research papers into a broad set of predefined categories, effectively building an index of research papers. Using a structured JSON Schema format, we ensure that each document is classified consistently, with outputs adhering to categories such as "Supervised Learning," "Reinforcement Learning," or "Natural Language Processing."
This structured categorization process allows for organization and retrieval of research documents, making it easier to search and analyze the indexed content. This index can then serve as the backbone of a larger knowledge management system, providing a solid foundation for further analysis and applications.
Here’s some code that loads AI research papers and then classifies them using OpenAI structured outputs:
import os
import json
import shutil

import arxiv
from PyPDF2 import PdfReader
from openai import OpenAI
import weave

# Initialize Weave and OpenAI
weave.init("paper_classification")
api_key = os.getenv('OPENAI_API_KEY')
model = "gpt-4o-mini"
client = OpenAI(api_key=api_key)

# Directory to download and categorize papers
download_dir = "./arxiv_papers"
if not os.path.exists(download_dir):
    os.makedirs(download_dir)

# List of machine learning categories
categories = [
    "Supervised Learning", "Unsupervised Learning", "Reinforcement Learning", "Deep Learning",
    "Natural Language Processing", "Computer Vision", "Graph Neural Networks", "Transfer Learning",
    "Meta-Learning", "Few-Shot Learning", "Self-Supervised Learning", "Representation Learning",
    "Multi-Modal Learning", "Generative Adversarial Networks (GANs)", "Bayesian Methods",
    "Probabilistic Models", "Federated Learning", "Privacy-Preserving ML", "Fairness and Bias in ML",
    "Explainable AI", "Optimization Algorithms", "Adversarial Robustness", "Causal Inference",
    "Anomaly Detection", "Time Series Analysis", "Graph-Based Learning", "Knowledge Graphs",
    "Ontology Learning", "Recommender Systems", "Information Retrieval", "Domain Adaptation",
    "Semi-Supervised Learning", "Data Augmentation Techniques", "Multi-Agent Systems",
    "Human-in-the-Loop Learning", "Curriculum Learning", "Active Learning", "Imitation Learning",
    "Inverse Reinforcement Learning", "Policy Optimization", "Robustness to Distribution Shifts",
    "Neural Architecture Search (NAS)", "Hyperparameter Optimization", "Neurosymbolic AI",
    "Neural Ordinary Differential Equations", "Memory-Augmented Networks", "Recurrent Neural Networks (RNNs)",
    "Long Short-Term Memory (LSTM)", "Transformer Models", "Attention Mechanisms",
    "Pre-trained Language Models (e.g., BERT, GPT)", "Contrastive Learning", "Energy-Based Models",
    "Neural Style Transfer", "Object Detection", "Segmentation Models", "Image Generation", "3D Vision",
    "Motion Prediction", "Speech Recognition", "Speech Synthesis", "Emotion Recognition",
    "Text Generation", "Summarization", "Machine Translation", "Question Answering", "Dialogue Systems",
    "Conversational AI", "Autonomous Systems", "Robotics and Control", "Game Theory in ML",
    "Synthetic Data Generation", "Biomedical Data Analysis", "Bioinformatics", "Healthcare Applications of ML",
    "Drug Discovery", "Predictive Maintenance", "Financial Modeling", "Climate Modeling",
    "Physics-Informed Learning", "Chemistry Applications", "Material Science Applications",
    "Social Network Analysis", "Sentiment Analysis", "Text Mining", "Data Mining", "Complex Systems",
    "Ensemble Methods", "Evolutionary Algorithms", "Quantum Machine Learning", "ML System Performance Optimization",
    "ML in Edge Computing", "ML for Internet of Things (IoT)", "Multi-Task Learning", "Continual Learning",
    "Neural-Symbolic Learning", "Vision-Language Models", "Zero-Shot Learning", "Learning from Demonstration",
    "Neural Network Pruning"
]

# Read up to the first 1000 characters of a PDF
def read_pdf_first_1000_chars(pdf_path):
    try:
        with open(pdf_path, 'rb') as file:
            reader = PdfReader(file)
            text = ""
            for page in reader.pages:
                text += page.extract_text()
                if len(text) >= 1000:
                    break
            return text[:1000]
    except Exception as e:
        print(f"Failed to read {pdf_path}: {e}")
        return ""

# Categorize a paper based on its content using structured output
@weave.op
def categorize_paper(text):
    # JSON schema for structured output with enum categories
    # (note: "strict" belongs at the json_schema level, alongside "name" and "schema")
    category_schema = {
        "type": "json_schema",
        "json_schema": {
            "name": "paper_category_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "category": {
                        "type": "string",
                        "enum": categories,  # use the list of categories as enum options
                        "description": "The category of the research paper"
                    }
                },
                "required": ["category"],  # ensure that the response contains a category
                "additionalProperties": False
            }
        }
    }

    # Create the prompt for categorizing the text
    prompt = f"""Based on the following text from a research paper, categorize it into one of the following machine learning topics: {', '.join(categories)}.
Please respond with a JSON object in the format: {{"category": "Category Name"}}.

Research Paper Content:
{text}"""

    # Make the API request to categorize the text using structured output
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a categorization assistant."},
            {"role": "user", "content": prompt}
        ],
        response_format=category_schema,  # structured output format with enum
        max_tokens=50,
        temperature=0.3
    )

    # Parse the model's response to extract the category
    result = response.choices[0].message.content.strip()
    try:
        result_json = json.loads(result)
        category = result_json.get("category", "Uncategorized")
    except json.JSONDecodeError:
        category = "Uncategorized"
    return category

# Move the PDF to the appropriate category folder
def move_pdf_to_category(pdf_path, category):
    category_dir = os.path.join(download_dir, category.replace(" ", "_"))
    if not os.path.exists(category_dir):
        os.makedirs(category_dir)
    shutil.move(pdf_path, os.path.join(category_dir, os.path.basename(pdf_path)))
    print(f"Moved {pdf_path} to {category_dir}")

# Download recent papers from arXiv
query = "machine learning"
max_results = 1  # change to a larger number as needed
search = arxiv.Search(
    query=query,
    max_results=max_results,
    sort_by=arxiv.SortCriterion.SubmittedDate
)

# Iterate through each result and categorize the paper
for result in search.results():
    print(f"Downloading: {result.title}")
    paper_id = result.entry_id.split('/')[-1]
    filename = f"{paper_id}.pdf"
    result.download_pdf(dirpath=download_dir, filename=filename)

    # Read the first 1000 characters of the downloaded PDF
    pdf_path = os.path.join(download_dir, filename)
    text_snippet = read_pdf_first_1000_chars(pdf_path)
    if text_snippet:
        print(f"Categorizing paper: {filename}")
        # Use the categorize_paper function to get the category
        category = categorize_paper(text_snippet)
        print(f"Assigned Category: {category}")
        # Move the PDF to the appropriate category folder
        move_pdf_to_category(pdf_path, category)
    else:
        print(f"Failed to extract text from {filename}")
The script uses structured outputs to categorize research papers based on predefined machine learning categories. It leverages W&B Weave to log and track various inputs and outputs of the categorization function, making it easier to monitor and debug the model's predictions.
When a research paper is downloaded, a snippet of its content is extracted using PyPDF2 and passed through the categorize_paper function. This function sets up a JSON schema with enumerated categories, ensuring that the model output adheres to the defined structure. Using OpenAI's API, the model generates a response in JSON format that is parsed and used to determine the category of the paper. Running the script creates a subdirectory for each category and moves each paper into its assigned folder. Weave tracks these results, ensuring that every inference and categorization can be revisited or shared, providing transparency and ease of analysis. This is particularly useful for building large-scale, organized databases of research papers.
Use in a retrieval-augmented generation system: Building a structured database
In a previous project, I built a RAG-based restaurant menu app that allows users to search for menu items using natural language queries. To accomplish this, I first had to create a structured list of items in the form of a JSON object.
That required a very complex prompt to coax an LLM into outputting structured JSON from an unstructured body of text (a menu PDF). Here, we will build that same structured database, which will later be used in a RAG system, without a complex prompt, by leveraging structured outputs. This step involves organizing menu items into a structured JSON format, ensuring that the data is clean, well-organized, and easy to work with.
Once structured, this list of JSON objects will be ready for vectorization, a major step in building a RAG system. By ensuring that the data adheres to a predefined schema before vectorization, we can maintain consistency and improve the overall efficiency of the retrieval and generation processes that RAG systems rely on. The script below uses structured outputs to guarantee the menu data is cleanly formatted before it's saved. This database will later serve as a core part of a RAG system.
import os
import json

import PyPDF2
from openai import OpenAI
import weave

# This script builds the structured database for a menu RAG system that lets users
# search for menu items using natural language.

# Initialize Weave
weave.init("menu_standardization")

# Read the OpenAI API key from the environment
api_key = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)

# Split the text into manageable chunks
def split_text(text, chunk_size=4000, overlap=500):
    chunks = []
    start = 0
    while start < len(text):
        if start + chunk_size > len(text):
            chunks.append(text[start:])
        else:
            end = start + chunk_size
            chunks.append(text[start:end + overlap])
        start += chunk_size
    return chunks

# Read and process the menu PDF
def read_and_process_menu(pdf_path):
    menu_data = []

    # Define the structured output schema expecting an array of objects
    menu_items_schema = {
        "type": "json_schema",
        "json_schema": {
            "name": "menu_items_response",
            "strict": True,  # guarantee the output matches the schema exactly
            "schema": {
                "type": "object",
                "properties": {
                    "items": {  # the root property is an array named "items"
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {
                                    "type": "string",
                                    "description": "The name of the menu item"
                                },
                                "description": {
                                    "type": "string",
                                    "description": "A detailed description of the menu item"
                                },
                                "keywords": {
                                    "type": "string",
                                    "description": "Comma-separated keywords for the menu item (e.g., 'dessert, chicken, side')"
                                }
                            },
                            "required": ["title", "description", "keywords"],  # all three fields are required
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["items"],  # ensure that "items" is included in the response
                "additionalProperties": False
            }
        }
    }

    # Read the PDF and extract text
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        for page_num in range(len(reader.pages)):
            page_text = reader.pages[page_num].extract_text()
            text_chunks = split_text(page_text)
            for chunk in text_chunks:
                # Create the prompt for each chunk
                prompt_text = "Convert the following menu text into a JSON array with each item containing 'title', 'description', and 'keywords'."

                # Make the API request using the structured output format
                response = client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant that structures menu items into JSON array format."},
                        {"role": "user", "content": prompt_text + " Here is the text: " + chunk}
                    ],
                    response_format=menu_items_schema  # use structured output schema
                )

                # Parse the model's response to extract the structured output
                result = response.choices[0].message.content.strip()
                try:
                    parsed_response = json.loads(result)
                    # Expecting an "items" key containing the list of menu items
                    menu_items = parsed_response.get("items", [])
                    # Add the page number to each menu item
                    for item in menu_items:
                        item['page'] = page_num + 1
                    # Append the parsed items to the menu data
                    menu_data.extend(menu_items)
                except json.JSONDecodeError as e:
                    print(f"JSON parsing failed for page {page_num + 1}: {e}")
                    print(f"Response content: {result}")

    # Remove duplicates based on title
    unique_menu_data = {item['title']: item for item in menu_data}
    menu_data = list(unique_menu_data.values())
    return menu_data

# Replace 'menu.pdf' with the path to your PDF file
pdf_path = './menu.pdf'
menu_items = read_and_process_menu(pdf_path)

# Save the results to a JSON file
with open('gpt4_menu_data.json', 'w') as json_file:
    json.dump(menu_items, json_file, indent=4)

print("Menu items successfully processed and saved to gpt4_menu_data.json")
The read_and_process_menu function reads a menu PDF and extracts text using PyPDF2, splitting it into chunks to fit within the model’s token limit. Each chunk is processed with a prompt that instructs the model to return the menu data as a structured JSON array, adhering to a schema defined using OpenAI's structured outputs. By enforcing a schema that includes fields like title, description, and keywords, the data is guaranteed to be consistently structured, which is crucial before vectorizing it for a RAG system.
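As a sketch of that next step, each structured menu item can be turned into a single text string and embedded for retrieval. This assumes the gpt4_menu_data.json file produced above; the choice of OpenAI's text-embedding-3-small model is mine, not the original project's:

import json
from openai import OpenAI

client = OpenAI()

# Load the structured menu items produced by the script above
with open('gpt4_menu_data.json') as f:
    menu_items = json.load(f)

# Build one text string per item from its schema-guaranteed fields
texts = [f"{item['title']}. {item['description']} Keywords: {item['keywords']}"
         for item in menu_items]

# Embed all items in one call; the vectors can then go into any vector store
embeddings = client.embeddings.create(model="text-embedding-3-small", input=texts)
vectors = [record.embedding for record in embeddings.data]
print(f"Embedded {len(vectors)} menu items, dimension {len(vectors[0])}")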
Since we call weave.init() and use the OpenAI client library, all calls to the API are automatically logged to Weave, and we can later view the inputs and outputs of each function.
Here's what it looks like inside weave after running the script:

Code generation from voice commands: Converting audio instructions to structured JSON for a form builder
We will build a system that captures voice commands and turns them into structured form schemas, which can then be rendered as interactive HTML forms. Using a combination of OpenAI's Whisper API for audio transcription and the OpenAI model for generating structured outputs, this system allows users to speak form details aloud and have them automatically converted into a functioning web form.
Once the form schema is created, it's passed to a Flask app that dynamically generates the form fields and renders them in a web interface. The Flask app reads the JSON object and converts it into HTML components—like text fields, dropdowns, and radio buttons—based on the type of each field. The form can then be filled out and submitted through the web interface, making it easy to interact with the generated schema.
For this project, there are two main components: the voice-to-text processing module and the form generator, each working in tandem to transform voice commands into structured outputs effortlessly. This enables a new way of interacting with software using voice-based inputs for rapid prototyping and UI design.
Here's the code:
import os
import json
import tempfile

import scipy.io.wavfile as wav
import sounddevice as sd
from flask import Flask, render_template_string, request
from openai import OpenAI
import weave

# Set your OpenAI API key via the OPENAI_API_KEY environment variable
api_key = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)

# Initialize Weave so the @weave.op-decorated call below is logged
# (the project name is arbitrary)
weave.init("voice_form_builder")

# Parameters for audio recording
SAMPLE_RATE = 24000      # sample rate for recording
RECORD_DURATION = 10     # maximum duration to record in seconds

app = Flask(__name__)

# Store the generated form schema in a global variable
form_schema = None

def generate_form_from_json(json_input):
    """Generates an HTML form based on the given JSON input."""
    form_html = '<form method="POST" action="/submit">\n'
    for field in json_input["fields"]:
        field_type = field.get("type", "text")
        field_label = field.get("label", "")
        field_name = field.get("name", "")
        if field_type == "text":
            form_html += f'<label>{field_label}</label><br>\n'
            form_html += f'<input type="text" name="{field_name}" required><br><br>\n'
        elif field_type == "dropdown":
            form_html += f'<label>{field_label}</label><br>\n'
            form_html += f'<select name="{field_name}">\n'
            for option in field.get("dropdown_options", []):
                form_html += f'<option value="{option}">{option}</option>\n'
            form_html += '</select><br><br>\n'
        elif field_type == "multiple_choice":
            form_html += f'<label>{field_label}</label><br>\n'
            for option in field.get("dropdown_options", []):
                form_html += f'<input type="radio" name="{field_name}" value="{option}" required>{option}<br>\n'
            form_html += '<br>\n'
        elif field_type == "yes_no":
            form_html += f'<label>{field_label}</label><br>\n'
            form_html += f'<input type="radio" name="{field_name}" value="Yes" required> Yes\n'
            form_html += f'<input type="radio" name="{field_name}" value="No" required> No<br><br>\n'
    form_html += '<input type="submit" value="Submit">\n'
    form_html += '</form>'
    return form_html

@app.route('/form', methods=['GET'])
def form():
    """Render the form based on the global JSON schema."""
    global form_schema
    if form_schema is None:
        return "No form schema provided."
    form_html = generate_form_from_json(form_schema)
    return render_template_string(form_html)

@app.route('/submit', methods=['POST'])
def submit():
    form_data = request.form.to_dict()
    return f"Form submitted successfully with data: {form_data}"

def run_form_app(json_input, host='127.0.0.1', port=5000):
    """Runs the Flask app with a provided JSON schema."""
    global form_schema
    form_schema = json_input
    app.run(host=host, port=port)

# Record audio using sounddevice and save it as a temporary .wav file
def record_audio():
    print("Press Enter to begin recording...")
    input()  # wait for the Enter key to start recording
    print("Recording... Speak into the microphone.")
    # Record audio directly in PCM16 format
    audio_data = sd.rec(int(RECORD_DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype='int16')
    sd.wait()  # wait until recording is finished
    # Save audio to a temporary .wav file
    temp_wav_file = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    wav.write(temp_wav_file.name, SAMPLE_RATE, audio_data)
    print(f"Audio recording saved to: {temp_wav_file.name}")
    return temp_wav_file.name  # return the path of the recorded file

# Send the audio file to the Whisper API for transcription
def transcribe_audio(file_path):
    with open(file_path, 'rb') as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    return transcription.text  # return the transcription text

# Set up the system message to guide the model
system_message = {
    "role": "system",
    "content": """You are a helpful assistant that generates form schemas in JSON format. The form schema should include:
1. Use 'type', 'label', and 'name' for all fields.
2. For dropdown fields, include a 'dropdown_options' property with a list of choices.
3. Example structure:
{"form": {"fields": [
    {"type": "text", "label": "Player's Name", "name": "player_name"},
    {"type": "dropdown", "label": "Position", "name": "position", "dropdown_options": ["Forward", "Midfielder", "Defender", "Goalkeeper"]},
    {"type": "yes_no", "label": "Previous Experience", "name": "previous_experience"}
]}}"""
}

# JSON schema for structured outputs with a root object that contains a 'form' property.
# Note: strict mode requires every property to appear in "required", so
# dropdown_options is required but nullable for non-dropdown fields.
form_response_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "form_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "form": {
                    "type": "object",
                    "properties": {
                        "fields": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "type": {
                                        "type": "string",
                                        "enum": ["text", "dropdown", "multiple_choice", "yes_no"]
                                    },
                                    "label": {"type": "string"},
                                    "name": {"type": "string"},
                                    "dropdown_options": {
                                        "type": ["array", "null"],
                                        "items": {"type": "string"},
                                        "description": "Options for dropdown fields, if applicable"
                                    }
                                },
                                "required": ["type", "label", "name", "dropdown_options"],
                                "additionalProperties": False
                            }
                        }
                    },
                    "required": ["fields"],
                    "additionalProperties": False
                }
            },
            "required": ["form"],
            "additionalProperties": False
        }
    }
}

# Handle the model inference and generate the form schema
@weave.op
def generate_form_schema(user_prompt):
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            system_message,
            {"role": "user", "content": user_prompt}  # use transcribed text as user prompt
        ],
        response_format=form_response_schema  # specify the structured output schema
    )
    # Parse the JSON content into a Python dictionary
    response_content = response.choices[0].message.content
    generated_form_schema = json.loads(response_content)
    return generated_form_schema

# Main: record audio, transcribe it, and generate the form schema
if __name__ == "__main__":
    # Record audio and save it to a temporary file
    recorded_audio_path = record_audio()
    # Transcribe the recorded audio to text
    user_prompt = transcribe_audio(recorded_audio_path)
    print(f"Transcribed user prompt: {user_prompt}")
    # Generate the form schema using the transcribed text
    generated_form_schema = generate_form_schema(user_prompt)
    # Display the generated schema to the user
    print("Generated form schema:")
    print(json.dumps(generated_form_schema['form'], indent=2))  # pretty print the dictionary
    # Launch the form builder with the generated schema
    run_form_app(generated_form_schema['form'])
We chose to use the Whisper API for transcribing voice commands, as it provides a straightforward and reliable way to turn spoken language into text. The text (essentially a prompt from the user) is then used to create predefined form schemas that can be easily integrated into existing workflows.
Although OpenAI now offers a fully multimodal option with the Realtime API, which handles text and audio in a single live session, it is still in beta and requires a more complicated configuration than using Whisper in a two-stage system (in my humble opinion). The Realtime API suits applications that need multiple rounds of spoken dialogue between the model and the user; here, since we only need a single audio prompt, I chose to take advantage of Whisper's dedicated focus on audio-to-text transcription.
By leveraging structured outputs, we ensure that the generated form JSON object follows a predefined schema, making it easy to integrate into existing development workflows. This setup showcases how combining voice-to-text transcription with structured outputs can enable developers to generate code through spoken instructions, providing precise control over the format and structure without manual coding.
Weave is also used to log inputs and outputs at various stages of the form generation process, providing an easy way to track, analyze, and visualize the model's performance. By integrating Weave using the @weave.op decorator, we can capture the transcribed audio, the generated form schema, and the model's responses at different stages. This makes it easier to monitor the flow of data, debug issues, and share results with collaborators. Additionally, it offers transparency by allowing developers to revisit the transformation process from voice commands to a rendered form, enhancing the reliability of the application.
Here's a screenshot of a form that was generated purely from audio:

Conclusion
Structured outputs help simplify working with AI. Instead of dealing with messy or unpredictable responses, they let you enforce specific formats, making the data easier to use and reducing the headaches caused by post-processing. Whether you're categorizing papers, organizing menu items, or converting voice commands to code, structured outputs take away a lot of the tedious work. It’s a straightforward way to get reliable results and focus on what matters without wasting time fixing errors. In the end, it's just a practical tool that makes things run smoother—no hype, just less hassle. Thanks for reading.