
Bulk creating high-quality titles and descriptions with GPT-4o

Learn how to automate bulk creation of high-quality titles and meta descriptions for web pages using GPT-4o. This step-by-step guide covers everything from web scraping to Google Sheets integration, ensuring efficient and optimized content updates.
I had a problem. I had hundreds of pages (you may have more) that needed an overhaul of the titles and descriptions. The current ones were old, some were duplicates of copied pages, some were in the wrong case, some were missing, and some were just bad.
As I'm sure you can guess, my go-to solution was to automate the task with high-level prompting and a bit of scraping.
While it took a couple of hours to get the script right and tune the prompt to produce strong results with GPT-4o, that's far less time than doing the job manually would have taken, and the results are at least as good. On top of that, the script is reusable whenever I need it again, and now it's yours too.

What's going on?

If you're wondering what's going on in this script, essentially we're:
  1. Accessing the Google Sheet and referencing the URLs in column A.
  2. Scraping the content of each page using the Beautiful Soup library.
  3. Creating a summary of the content with gpt-4o-mini. Initially I was sending the whole page content to GPT-4o, but when you have hundreds of pages, that gobbles up tokens and dollars fast. Mini is far cheaper, so we're using it to summarize.
  4. Using this summary, along with the URL, current title, current description, and a detailed list of rules and style guides, to steer our system. The prompt below is less elaborate than the one I would use in a live case, because those get very specific. I've written this one to illustrate the direction you can go and give you a strong starting spot. (A minimal sketch of steps 3 and 4 follows this list.)
  5. Writing the new title and description to the appropriate columns of the Google Sheet, along with the reason the system feels they are better.
  6. Writing it all to a W&B Table as well, which I find easier to work with when evaluating results.
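If it helps to see the core of steps 3 and 4 in isolation before the full script, here's a minimal sketch of the two-call pattern: summarize the page cheaply with gpt-4o-mini, then hand that summary to GPT-4o to propose the new title and description. The function name and prompts here are simplified placeholders; the full script below adds the sheet access, batching, rules, and logging.

import openai

openai.api_key = 'sk-YOUR_API_KEY'

def suggest_title_and_description(url: str, page_text: str) -> str:
    # Step 3: cheap summarization pass with gpt-4o-mini
    summary = openai.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{"role": "user", "content": f"Summarize this page in 400 words:\n\n{page_text}"}],
    ).choices[0].message.content

    # Step 4: generation pass with GPT-4o, steered by the summary and the URL
    return openai.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": f"Suggest a new title and meta description for {url} based on this summary:\n\n{summary}"}],
    ).choices[0].message.content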
Alright, let's dive in.

The tutorial for creating high-quality titles and descriptions with GPT-4o

A few things you'll need to have at the ready are:
  • Screaming Frog. I'm an SEO, so you knew this was in the mix. I use Screaming Frog to crawl a website and pull the URLs along with the current titles and descriptions (the titles and descriptions are optional). The free version will do the trick if your site is under 500 pages, but if you prefer a different URL collection method, that's fine.
  • An OpenAI API key. We'll be using the API for GPT access. The costs you'll incur are pretty low (a few bucks), because I've already taken the hit of making the costly mistake for you.
  • Google service account. We need to access Google Sheets via the API, and to do this we'll use a service account. I found it a bit non-intuitive to download the first time, so I added it to a previous tutorial you'll find here. (There's also a quick credentials check sketched just after this list.)
  • Weights & Biases API key. We'll use this for easy filtering and evaluation while editing our prompt.
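Once the service account JSON is downloaded and the sheet you'll be using is shared with the service account's email address, a quick way to confirm the credentials work is a small read-only call like the sketch below. The file path and spreadsheet ID are placeholders to swap for your own.

from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    'C:/Users/USERNAME/YOUR_FILE.json',  # placeholder: path to your service account JSON
    scopes=['https://www.googleapis.com/auth/spreadsheets'],
)
sheets = build('sheets', 'v4', credentials=creds)

# If this prints the spreadsheet's name, credentials and sharing are set up correctly.
meta = sheets.spreadsheets().get(spreadsheetId='YOUR_SPREADSHEET_ID').execute()
print(meta['properties']['title'])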
Once you have these, you're only a few minutes away from new titles and descriptions.
You'll notice that I ran this on the Anthropic website.
This was just for the demo, because I like them. I don't work for Anthropic. :)

Step one: Crawl the site with Screaming Frog

The first step is to simply crawl the site you want to create new titles and descriptions for with Screaming Frog.
After you've installed it (if you haven't already), simply drop the homepage of the site you want to crawl into it, and let it run.

You'll then simply choose "HTML" from the drop-down above the URLs and then click "Export".
This will download the list of pages.

Step two: Import it into Google Sheets

The next step is to simply import the CSV you just downloaded into Google Sheets (File > Import > Upload).

By default the export has a LOT of extra columns in it that we don't need for this task.
We're going to reduce it down to:
  • Column A: Address. The URL of the page.
  • Column B: Title 1. The title of the page.
  • Column C: Title New. This is where the LLM-generated title will go.
  • Column D: Meta Description 1. The description of the page.
  • Column E: Description New. This is where the LLM-generated description will go.
  • Column F: Reason. This is where the LLM will explain to us why it thinks its suggestions are better.

I recommend duplicating the tab once it's complete and deleting all but the first few rows. This gives you a place to test the script that will finish quickly, so if you want to edit your prompts to improve them, you aren't waiting (and paying) for hundreds of URLs. (If you'd rather cap the rows in the script itself, there's a sketch below.)
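If you'd rather not maintain a separate test tab, another option is to cap the number of rows the script processes while you iterate on the prompt. A minimal sketch, assuming the data_rows list from the script below (TEST_ROW_LIMIT is just an illustrative name):

# Limit the run to the first few data rows while testing prompts.
# Set TEST_ROW_LIMIT to None for the full run.
TEST_ROW_LIMIT = 5

if TEST_ROW_LIMIT is not None:
    data_rows = data_rows[:TEST_ROW_LIMIT]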

Step three: The script

There will be a few things you'll want to edit in the script below, but we're going to start with it now, and then get into what you'll need to edit.
If you're playing along at home, I'd suggest you copy it now and then read on with it open in your Jupyter Notebook or other editor.
import os
import re
import requests
from bs4 import BeautifulSoup
import openai
from google.oauth2 import service_account
from googleapiclient.discovery import build
from urllib.parse import urlparse
import itertools
import wandb

# Set up your OpenAI API key
openai.api_key = 'sk-YOUR_API_KEY'

# Set up Google Sheets API credentials
SERVICE_ACCOUNT_FILE = 'C:/Users/USERNAME/YOUR_FILE.json' # Replace with your credentials path and file.
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('sheets', 'v4', credentials=credentials)

# Initialize Weights & Biases
wandb_api_key = 'YOUR_WANDB_KEY'
wandb.login(key=wandb_api_key)

# Replace with your spreadsheet ID and sheet name
SPREADSHEET_ID = 'YOUR_SPREADSHEET_ID'
SHEET_NAME = 'TAB_NAME' # Replace with the actual name of your tab

# Initialize W&B run
wandb.init(project='new_title_descriptions', name='Title and Description Generation')

# Create a W&B Table
table = wandb.Table(columns=['Row Index', 'URL', 'Current Title', 'New Title', 'Current Description', 'New Description', 'Reason'])

# Read data from the Google Sheet
sheet_range = f'{SHEET_NAME}!A:F'
result = service.spreadsheets().values().get(
    spreadsheetId=SPREADSHEET_ID,
    range=sheet_range
).execute()
values = result.get('values', [])

if not values:
    print('No data found.')
    data_rows = []
else:
    headers = values[0]
    data_rows = values[1:]  # Skip the header row

# Define batch size
batch_size = 10

# Helper function to split data into batches
def batch_data(data, batch_size):
"""Yield successive batch_size chunks from data."""
for i in range(0, len(data), batch_size):
yield data[i:i + batch_size]

# Group data_rows into batches
batches = list(batch_data(data_rows, batch_size))

for batch_num, batch in enumerate(batches, start=1):
    print(f"Processing batch {batch_num}/{len(batches)}")
    batch_updates = []
    batch_wandb_data = []
    for idx_in_batch, row in enumerate(batch):
        row_idx = batch_size * (batch_num - 1) + idx_in_batch + 2  # +2: list is 0-indexed, sheet rows are 1-based, and row 1 is the header
        url = row[0] if len(row) > 0 else ''
        if not url:
            continue  # Skip if URL is empty

        parsed_url = urlparse(url)
        current_title = row[1] if len(row) > 1 else ''  # Column B
        current_description = row[3] if len(row) > 3 else ''  # Column D

        print(f"Processing URL at row {row_idx}: {url}")

        # Scrape the content
        try:
            response = requests.get(url)
            soup = BeautifulSoup(response.content, 'html.parser')
            page_text = soup.get_text(separator=' ', strip=True)
        except Exception as e:
            print(f"Error fetching URL at row {row_idx}: {e}")
            continue

        # Summarize the content using gpt-4o-mini
        summary_prompt = f"Please summarize the following content in 400 words:\n\n{page_text}"
        try:
            summary_response = openai.chat.completions.create(
                model='gpt-4o-mini',
                messages=[
                    {"role": "system", "content": "You are a helpful assistant summarizing web page content for further processing."},
                    {"role": "user", "content": summary_prompt}
                ]
            )
            summary = summary_response.choices[0].message.content
        except Exception as e:
            print(f"Error summarizing content at row {row_idx}: {e}")
            continue

        # Generate new title and description using GPT-4o
        title_description_prompt = f"""
Based on the following current title and description, generate a new title and description for the content.
Also, explain why the suggested title and description are better than the current ones.

Current Title: {current_title}
Current Description: {current_description}

Summary:
{summary}

URL: {url}

Obey the following rules:
- You write in sentence case.
- Use the URL itself to understand the content that may exist on a page.
- While we want sentence case, if a title or description has a colon, the word after it should start with a capital letter.
- Do not use the terms: deep dive, delve.
- When referencing people in title tags, include their company and position whenever possible.

Provide the output in the following format:

Title: [Your New Title - maximum 65 characters in sentence case]
Description: [Your New Description - max 165 characters in sentence case]
Reason: [Explain why the suggested title and description are better than the current ones, focusing on improvements in clarity, relevance, or engagement.]
"""
        try:
            td_response = openai.chat.completions.create(
                model='gpt-4o',
                messages=[
                    {"role": "system", "content": "You are a helpful assistant for generating web page titles and descriptions. You write everything in sentence case."},
                    {"role": "user", "content": title_description_prompt}
                ]
            )
            td_content = td_response.choices[0].message.content
        except Exception as e:
            print(f"Error generating title and description at row {row_idx}: {e}")
            continue

        # Extract the title, description, and reason
        title_match = re.search(r'Title:\s*(.*)', td_content)
        description_match = re.search(r'Description:\s*(.*)', td_content)
        reason_match = re.search(r'Reason:\s*(.*)', td_content)

        new_title = title_match.group(1).strip() if title_match else ''
        new_description = description_match.group(1).strip() if description_match else ''
        reason = reason_match.group(1).strip() if reason_match else ''

        # Append updates for Google Sheets
        batch_updates.extend([
            {
                'range': f'{SHEET_NAME}!C{row_idx}',
                'values': [[new_title]]
            },
            {
                'range': f'{SHEET_NAME}!E{row_idx}',
                'values': [[new_description]]
            },
            {
                'range': f'{SHEET_NAME}!F{row_idx}',
                'values': [[reason]]
            }
        ])

        # Append data for W&B Table
        batch_wandb_data.append([
            row_idx,
            url,
            current_title,
            new_title,
            current_description,
            new_description,
            reason
        ])

    # Update Google Sheets with batch updates
    if batch_updates:
        body = {
            'valueInputOption': 'RAW',
            'data': batch_updates
        }

        try:
            service.spreadsheets().values().batchUpdate(
                spreadsheetId=SPREADSHEET_ID,
                body=body
            ).execute()
            print(f"Updated batch {batch_num}")
        except Exception as e:
            print(f"Error updating sheet at batch {batch_num}: {e}")
            continue
    else:
        print(f"No updates to perform for batch {batch_num}")

    # Add data to W&B Table
    for data in batch_wandb_data:
        table.add_data(*data)

# Log the table to W&B
wandb.log({'Processed Data': table})

# Finish the W&B run
wandb.finish()
It's likely pretty obvious, but wherever you see one of the placeholder values below, you fill it in with your own information:

  • sk-YOUR_API_KEY: Paste your OpenAI API key here. Hard-coding a key isn't recommended on a shared or public computer (see the environment-variable sketch after this list).
  • C:/Users/USERNAME/YOUR_FILE.json: You need to put the full path and file name of your service account file here. For example, mine is C:/Users/dave_/Desktop/daves-awesome-articles-effc6fff54dc.json.
  • YOUR_WANDB_KEY: You'll put your Weights & Biases API key here.
  • YOUR_SPREADSHEET_ID: You'll find it in the address bar, it's the long alphanumeric sequence after the /d/. For example, in the URL https://docs.google.com/spreadsheets/d/17B83eojSVdez9HI0BhMzwSrvmXiyBLUBkykIiT_4wAM/edit?gid=754411251#gid=754411251 it is 17B83eojSVdez9HI0BhMzwSrvmXiyBLUBkykIiT_4wAM.
  • TAB_NAME: This is simply the name of the tab. In the case of this example, mine will be "10" as that tab has 10 rows.
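If you'd rather not leave keys sitting in the script itself, especially on a shared machine, a common alternative is to read them from environment variables. A minimal sketch, assuming you've set OPENAI_API_KEY and WANDB_API_KEY in your environment beforehand:

import os
import openai
import wandb

# Read secrets from the environment instead of hard-coding them in the script.
openai.api_key = os.environ['OPENAI_API_KEY']

# wandb.login() will also pick up WANDB_API_KEY from the environment automatically.
wandb.login(key=os.environ.get('WANDB_API_KEY'))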
Once you've done that, it's just a matter of clicking the Run button.
As the system works, you'll see it processing URLs. It does this in batches of 10, so you don't have to wait until the end to start seeing the results.
Once the batch is done you'll see something like:

And over in the Sheet you'll see:

As you can see, we have new and unique titles and descriptions, with the reason provided by GPT-4o.
As noted above, I find it a bit clunky to work in Sheets for proofing, but if you click the link in the output:

You're taken to an interactive table.
And if you want to send it to your boss with a write-up, you can even add it to a report like:



In closing

I hope you take the time to tweak the prompt, and make this script do all that it can do for you.
And if you're interested in learning how to log multiple attempts at the prompt into the same table for comparison, or in having some fun with Weave for more sophisticated generative AI applications, just let me know in the comments below.