Generating content outlines with prompt engineering, entities and GPT-4o
Generate article outlines for SEO with GPT-4o, Google's Programmable Search, Natural Language API and prompting.
Most of us create article outlines, either for ourselves or for copywriters we work with. I can't speak for everyone, but when I create them the process can take 30 or 40 minutes. Between looking up the entities on the top-ranking pages, gathering the keywords and the questions being asked on each topic, and then putting it all together into an outline that is likely to rank well, it can be onerous to say the least.
Let's fix that. Below, we'll be walking through a script you can use yourself as-is (there's a quick sketch of the full flow right after this list) to generate outlines by:
- Collecting the top 10 ranking URLs for a query
- Scraping them
- Pulling the entities from each page, ranked by importance, and then selecting the top 10 entities across all of the pages collectively
- Pulling the top 5 questions answered on each page and then selecting the overall top 5 that best represent them
- Creating an advanced prompt from this information and an article outline from it
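Before we get to setup, here's that flow at a glance. This is an illustrative sketch only; every function name below is a placeholder for code we'll actually write later in the walkthrough:

# Illustrative sketch of the pipeline -- each step is implemented for real below.
def build_outline(query):
    urls = get_top_ranking_urls(query)          # Programmable Search Engine
    pages = [scrape(url) for url in urls]       # requests + BeautifulSoup
    entities = top_entities_by_salience(pages)  # Cloud Natural Language API
    questions = top_questions_overall(pages)    # GPT-4o
    prompt = assemble_prompt(query, entities, questions)
    return generate_article_outline(prompt)     # GPT-4o again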

Here's what we'll be covering:
- Getting set up
- APIs and a Programmable Search Engine
- Google Programmable Search Engine
- Google API Key
- Google service account
- OpenAI API
- Weights & Biases API
- Generating content outlines with GPT-4o
- What do you want to rank for?
- Loading the libraries
- Setting the APIs
- Creating the functions
- Collect the data
- Entities pulled from the content
- Questions pulled from the content
- Generating the article outline with prompt engineering
- Q&A
- Other applications
Let's jump in:
Getting set up
The first thing you'll need to do (if you haven't already) is get your machine set up. If you haven't installed Jupyter Notebook yet to run Python, it takes just a minute or two. I've written a separate tutorial on how to do that here.
So pop over and get it installed. We'll wait.
APIs and a Programmable Search Engine
To work with this script you'll just need the following—and don't worry, it'll take about 5 minutes to set these up:
- A Programmable Search Engine from Google (this is to collect the top 10 results)
- A Google API key
- A Google service account key for the Natural Language API.
- An OpenAI API key
- Weights & Biases API key
For the Google services, you will need to set up billing. If you haven't used Google Cloud before, they'll offer you a sizable free credit. Even without it, you're looking at spending pennies on this type of transaction.
The OpenAI API also requires billing to be set up. Again, you'll be dealing with pennies and there's a good chance they've already got your card info on file from your pro subscription.
The Weights & Biases API key is free.
Google Programmable Search Engine
The purpose of the Google Programmable Search Engine is to access the top 10 search results without violating Google's guidelines. It's not exactly the same as Google Search, but it's more than sufficient for our purposes: to collect attributes from ranking pages.
To set up your search engine, just follow the quick instructions at https://programmablesearchengine.google.com/controlpanel/create.
The settings I use are:

Once you have it set up, however, you can define things like regions, sites to exclude, and so on.
Google API Key
The API key is to get you access to the search engine. Again, it's easily set up and you can do so at https://support.google.com/googleapi/answer/6158862?hl=en.
When you're in the Console you'll simply click to Enable APIs:

And you'll want to enable the "Custom Search API" and "Cloud Natural Language API."
To set up the Cloud Natural Language API, you will be required to add your billing information. You won't need to worry about the cost unless you get crazy with it. I use it almost daily to create article outlines and a few other things and my bill is about a dollar or two each month.
You'll also need to set up your credentials:

Then, select API key from the drop-down:

And copy the key to a notepad doc. You'll want to delete it soon—this is just to have it handy in a few moments.
For good measure, I recommend clicking on the API key you just created, which will have an orange triangle beside it, noting that it's unrestricted. You can set the key to restricted and give it access to just the Custom Search API to help safeguard against misuse.
Google service account
And while you're at this screen, you can set up the service account.
Again, you'll click to Create Credentials, but instead of API key you'll click Service account.

You'll just need to give the service account a name and select its role. As I'm the only one with access to my projects and machine, I just set it as Owner. You may want to choose otherwise. You can find out more about the roles here.
Once you've created the service account, you need to create a key for it. If you're not automatically taken to do so, simply click on the service account you just created:

Click "Keys" in the top tabs, and then "Create new key".

And choose JSON:

The key will automatically download to your machine.
OpenAI API
For OpenAI, you simply need to sign up for an account. You'll be given a few dollars in free credits, which will be more than enough for what we're doing here.
When you're given your API key, copy it into the notepad doc noted above. You won't be saving it there permanently, but once you close the window showing your key you won't be able to see it again. You can easily create a new one, but it's even easier not to have to.
Weights & Biases API
And the easiest API to set up is the one for Weights & Biases.
Just click here to sign up. You'll also find a link to sign up towards the top right, and at the bottom of this post.
Generating content outlines with GPT-4o
Time to dive in. You'll hopefully find that the setup was worth the few minutes spent.
You'll need to launch a Jupyter Notebook to get going. To do this, just open the Anaconda prompt, type jupyter notebook, and press Enter.

Click "New" to start a new notebook, and then "Python 3." This will put you on a screen that looks like:

You're ready to get going!
I like to organize my code into different cells based on their function (and the type of error I'm likely to find or create, to make it a little easier to troubleshoot).
You can then run the cells one-by-one.
When you run a cell, you'll be jumped to the next cell, BUT it's very important that you wait for the previous cell to finish running before you move on!
When you click "Run", if the cell needs time to finish you will see an * to its left. When it's done, the * will be replaced by a number (which will depend on how many cells you have run).
I'm going to put a heading and short description above each cell, to explain what's going on.
What do you want to rank for?
The first cell simply lets you enter what you want to rank for. The query could be entered directly into the code instead, but I find this approach a bit cleaner and easier to use while avoiding errors.
# Define what you're trying to do
query = input("What do you want to rank for: ")
print(query)
When run, it looks like:

Loading the libraries
Next we need to load the libraries we need. They are:
- os - Some operating system functions
- requests - Facilitates HTTP requests
- BeautifulSoup - For scraping web pages
- build - Creates the service object that performs the queries into the search engine
- language_v1 - Entity analysis
- service_account - Manages the service account credentials
- re - Facilitates working with regular expressions
- numpy - For math
- openai - Exactly what you think it is
- weave - For GenAI data logging
- wandb - For logging table-based data (and more, but we're just using it for tables)
!pip install google-api-python-client
import os
from getpass import getpass
!pip install requests
import requests
!pip install beautifulsoup4
from bs4 import BeautifulSoup
!pip install google-cloud-language
from googleapiclient.discovery import build
from google.cloud import language_v1
!pip install google-auth
from google.oauth2 import service_account
import re
!pip install numpy
import numpy as np
!pip install openai
from openai import OpenAI
!pip install weave --upgrade
import weave
!pip install wandb
import wandb
wandb.login()
When run it will look something like:

Setting the APIs
Next, you need to set your APIs and the path to the service account for use by your function.
One of the lines of code you'll see is:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'C:/Users/dave_/Desktop/daves-awesome-articles-effc6fff54dc.json'
The path:
C:/Users/dave_/Desktop/daves-awesome-articles-effc6fff54dc.json
...needs to be replaced with the path to the service account credentials you downloaded above.
Right click the file and pull up the properties:

The Location is the path to your file, and the filename showing is exactly what it seems.
Path: C:/Users/dave_/Desktop/
+
File: daves-awesome-articles-effc6fff54dc.json
=
C:/Users/dave_/Desktop/daves-awesome-articles-effc6fff54dc.json
That's the path to put into the line below. The rest are simply the API keys you copied to your notepad doc above, or are about to get in a moment.
# Instructions on getting keys, etc. at http://wandb.me/smx_advanced

# Google API Key and credentials
google_api = 'YOUR_GOOGLE_API_KEY'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'C:/Users/dave_/Desktop/daves-awesome-articles-effc6fff54dc.json'

# Google Custom Search API
google_search_id = 'CUSTOM_SEARCH_ENGINE_ID'

# Set up OpenAI API Key
%env OPENAI_API_KEY=sk-OPENAI_API
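If you'd rather not paste keys straight into a cell, the getpass import we loaded earlier gives you one alternative. This is a minimal sketch, assuming you run it in place of the hard-coded values above (the prompt strings are mine, and the final check just confirms the service account file is where you pointed):

# Optional alternative: enter keys interactively instead of hard-coding them.
os.environ['OPENAI_API_KEY'] = getpass("Paste your OpenAI API key: ")
google_api = getpass("Paste your Google API key: ")

# Sanity check: make sure the service account file actually exists at that path.
if not os.path.exists(os.environ['GOOGLE_APPLICATION_CREDENTIALS']):
    print("Service account file not found. Double-check the path above.")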
Creating the functions
In the next cell we'll be creating the functions required to make everything work.
- google_search - Defines the custom search engine query
- analyze_entities - Extracts the entities from a block of text
- extract_questions - Extracts questions from a block of text using GPT-4o
- top_questions - Reduces all the questions from all the pages down to the 5 most important, again using GPT-4o
The code:
# Setup Google Search API
def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

def analyze_entities(text_content):
    client = language_v1.LanguageServiceClient()
    type_ = language_v1.Document.Type.PLAIN_TEXT
    document = {"content": text_content, "type_": type_}
    encoding_type = language_v1.EncodingType.UTF8
    response = client.analyze_entities(request={'document': document, 'encoding_type': encoding_type})
    return response

client = OpenAI()

def extract_questions(text):
    # Prompt GPT-4o to generate questions from the text
    prompt = "Extract questions from the following text:\n" + text + "\nRelated to the query: " + query + "\nQuestions:"
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are a highly skilled writer, who wants to maximize the impact of their work by answering common questions on a topic in articles you write."},
            {"role": "user", "content": prompt}
        ],
        model="gpt-4o",
        max_tokens=1000,
        temperature=0.1,
        n=1  # You can adjust this parameter based on the desired number of responses generated
    )
    # Extract questions from GPT-4o's response
    questions = [choice.message.content.strip() for choice in response.choices]
    print(questions)
    return questions

def top_questions(text):
    # Prompt GPT-4o to pick the top questions from the full list
    # (note: the system message below uses the all_questions list built in the next cell)
    prompt = "Across all of these questions, what 5 questions, from this list or newly generated, would answer the questions you think are the most important to the topic and would be most useful for the user who entered the keyword?"
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You're an SEO and content strategist and you're analyzing all the questions answered in the top 10 search pages for the keyword: " + query + ". The questions you come up with are: \n" + str(all_questions)},
            {"role": "user", "content": prompt}
        ],
        model="gpt-4o",
        max_tokens=1000,
        temperature=0.1,
        n=1  # You can adjust this parameter based on the desired number of responses generated
    )
    # Extract questions from GPT-4o's response
    questions = [choice.message.content.strip() for choice in response.choices]
    print(questions)
    return questions
Nothing appears when run.
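Before moving on, you can optionally confirm the search side is wired up correctly. Here's a quick sketch of a smoke test, assuming the query, google_api, and google_search_id variables from the earlier cells:

# Optional smoke test: fetch the top 3 results and print their titles and links.
test_results = google_search(query, google_api, google_search_id, num=3)
for item in test_results:
    print(item['title'], '|', item['link'])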
Collect the data
In this cell, we pull the URLs of the top 10 pages for the query we entered at the top, scrape them, extract the entities and salience scores from that content, and fetch the top questions as well.
The salience score is a score between 0 and 1 that notes the importance of the entity to the document overall: an entity that is central to a page scores close to 1, while a passing mention scores close to 0.
We further reduce all the entities down to the top 10 across all the scraped pages. We do this by summing the salience scores across all the ranking pages.
Entities that appear on more than one page benefit from having their salience scores summed, so entities that appear on more pages (and are therefore more likely to be relevant to the query as a whole) will generally make it into the top 10 used in our article. The short sketch below illustrates the idea.
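To make the summing concrete, here's a toy illustration. The entity names and salience numbers are entirely made up:

# Hypothetical salience scores pulled from three ranking pages.
pages = [
    {"espresso": 0.42, "grinder": 0.15},
    {"espresso": 0.38, "portafilter": 0.09},
    {"espresso": 0.51, "grinder": 0.22},
]

entity_sums = {}
for page in pages:
    for entity, salience in page.items():
        entity_sums[entity] = entity_sums.get(entity, 0) + salience

# "espresso" sums to 1.31, "grinder" to 0.37 and "portafilter" to 0.09,
# so the entity appearing on the most pages rises to the top.
print(sorted(entity_sums.items(), key=lambda item: item[1], reverse=True))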
The code for all of this is:
# Initialize Weights & Biases
wandb.init(project="smx-article-outline-demo")

# Create W&B Tables to store questions and entities
questions_table = wandb.Table(columns=["URL", "Question"])
entities_table = wandb.Table(columns=["URL", "Entity", "Salience"])

# Search and scrape top 10 pages
search_results = google_search(query, google_api, google_search_id, num=10)
entity_sums = {}
all_questions = []

for result in search_results:
    page = requests.get(result['link'])
    if page.status_code == 403:
        print(f"Access to {result['link']} was denied with a 403 Forbidden error.")
        continue
    soup = BeautifulSoup(page.content, 'html.parser')
    page_text = soup.get_text()
    entities_response = analyze_entities(page_text)

    # Processing entities and their salience scores
    for entity in entities_response.entities:
        entity_name = entity.name
        salience = entity.salience
        if entity_name == '403 Forbidden':
            continue  # Skip this entity
        if entity_name in entity_sums:
            entity_sums[entity_name] += salience
        else:
            entity_sums[entity_name] = salience
        entities_table.add_data(result['link'], entity_name, salience)

    # Extract and process questions from the current page
    questions = extract_questions(page_text)
    all_questions.extend(questions)

    # Log questions to W&B Tables
    for question in questions:
        questions_table.add_data(result['link'], question)

# Getting the top 10 entities based on entity_sums
top_entities = sorted(entity_sums.items(), key=lambda item: item[1], reverse=True)[:10]

# Getting the top 5 questions
top_questions_list = top_questions(all_questions)

# Log top questions and entities to W&B
wandb.log({
    "questions_table": questions_table,
    "entities_table": entities_table,
})
The output will look something like:

We begin with links to wandb.ai and then move into the questions drawn from the pages.
You can click the links to view information about the run you just completed:
You'll see something that looks like:
Entities pulled from the content
Questions pulled from the content
Generating the article outline with prompt engineering
And in the "last" cell, we generate a prompt from all the information we've gathered. Because we're going through the API, we'll be defining both what the role of the chat assistant will be and the user prompt.
For this example I've set the assistant prompt to:
You are a highly skilled writer, and you want to produce an outline for an article that will appeal to users and rank well.
All we've done here is define GPT-4o's behavior and persona and give it a high-level goal (produce an article outline that will appeal to users and rank well).
For the user, I've set the prompt to:
The following entities appear to be relevant to ranking in the top 10 and should be worked into the page: \n" + entities_str + ". Try to ensure the outline will make it easy to work these into the article prominently and explain how this might be done in comments. Additionally, the following questions appear to be important to answer in the article: \n" + questions_str + "\n. Try to ensure that it will be easy to answer these questions in the article, and again, explain how you would recommend doing this in a way that will seem useful to the user. The article outline should begin by explaining \n - all of the core concepts required to understand the topic, and \n - also include a tutorial to accomplish the task defined by the keyword " + query + " if possible. \n After you have provided the outline, explain clearly how this article outline could be used to create an article that will rank well using best-practice SEO strategies as well as be helpful to users. You will be judged based on how well the article ranks, as well as how engaging the article is to readers, and provide the metrics you would suggest be used to judge whether you are successful.
It could easily be extended to include a well-formatted example (few-shot prompting), as sketched just below, but this works for the article.
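If you did want to extend it, one approach is to append an example of the output shape you want. A sketch, assuming you first pull the long user message out into a user_prompt variable (the example_outline here is a hypothetical stand-in for a format you actually like):

# Hypothetical few-shot extension: show the model the shape you want back.
example_outline = """
H1: Working title
  H2: What is the topic? (work entity X in here)
  H2: First important question from the list
  H2: Step-by-step tutorial
"""
user_prompt = user_prompt + "\nFormat the outline like this example:\n" + example_outline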
We are telling the assistant to write an article that will rank well and appeal to users. As the user we're then adding:
- The list of top entities, with the instruction that the outline should make it easy to work the entities in. We've also asked the assistant to explain how the entities could be worked in.
- We've done the same with the top questions.
- We've provided a list of how the article should be structured. I've just included two items (cover the core concepts, and include a tutorial if possible) but you could easily include more.
- We've reminded it what the query is that we're writing to rank for.
- We've requested that the assistant explain to us how the outline will accomplish our goal.
- We've told the assistant how it will be judged, and I've further asked the assistant to tell me what metrics I would use to do this, to ensure that the assistant and I are on the same page.
The full code block looks like:
# Generate an article outline
@weave.op()
def generate_outline(top_entities, top_questions, query):
    entities_str = ', '.join([entity[0] for entity in top_entities])
    questions_str = ', '.join(top_questions)
    response = client.chat.completions.create(
        messages=[
            {"role": "assistant", "content": "You are a highly skilled writer, and you want to produce an outline for an article that will appeal to users and rank well."},
            {"role": "user", "content": "The following entities appear to be relevant to ranking in the top 10 and should be worked into the page: \n" + entities_str + ". Try to ensure the outline will make it easy to work these into the article prominently and explain how this might be done in comments. Additionally, the following questions appear to be important to answer in the article: \n" + questions_str + "\n. Try to ensure that it will be easy to answer these questions in the article, and again, explain how you would recommend doing this in a way that will seem useful to the user. The article outline should begin by explaining \n - all of the core concepts required to understand the topic, and \n - also include a tutorial to accomplish the task defined by the keyword " + query + " if possible. \n After you have provided the outline, explain clearly how this article outline could be used to create an article that will rank well using best-practice SEO strategies as well as be helpful to users. You will be judged based on how well the article ranks, as well as how engaging the article is to readers, and provide the metrics you would suggest be used to judge whether you are successful."}
        ],
        model="gpt-4o",
        max_tokens=4000,
        n=1,  # How many results to produce per prompt
        stop=None,
        temperature=0.2  # A number between 0 and 1, where higher numbers add randomness
    )
    return response.choices[0].message.content.strip()

weave.init('smx-article-outline-demo')

# Generate the outline
outline = generate_outline(top_entities, top_questions_list, query)
print(outline)

# Log the outline to Weights & Biases
wandb.log({"article_outline": outline})
Which gives us output that appears as:

You can see that we get an outline, explanations of how to optimize it, as well as the metrics the assistant believes should be used.
That said, this prompt was not 100% successful: I wanted the assistant to explain how the outline would help the article rank and serve users, not to give me instructions on how to use it.
This is a great example of why having the assistant outline its understanding is valuable. Since it made this error, I need to revisit the prompt.
You'll also notice links to the Weave dashboard in the above output.
We can use this to explore the conditions that led to the outlines we've been generating. As we explore different entities, queries, temperatures, APIs and even different models (Gemini perhaps?) this data can be invaluable.
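If you want Weave to capture the earlier GPT-4o calls as traces too, you can decorate the other helpers the same way we decorated generate_outline. A sketch (the function bodies stay exactly as written in the functions cell above):

# Sketch: add the same decorator to the earlier helpers to trace them in Weave.
@weave.op()
def extract_questions(text):
    ...  # same body as in the functions cell above

@weave.op()
def top_questions(text):
    ...  # same body as in the functions cell above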
Q&A
I also like to add one final block:
# Ask a follow-up question about the outline
userquestion = input("Based on the output above, do you have any questions? ")

# Give this helper its own name so it doesn't overwrite generate_outline above.
def answer_question(userquestion, outline):
    response = client.chat.completions.create(
        messages=[
            {"role": "assistant", "content": "You are a helpful agent, responding to a question about the outline you created above: \n" + outline + "\n Be brief and clearly answer the question."},
            {"role": "user", "content": userquestion}
        ],
        model="gpt-4o",
        max_tokens=500,
        n=1,  # How many results to produce per prompt
        stop=None,
        temperature=0.2  # A number between 0 and 1, where higher numbers add randomness
    )
    return response.choices[0].message.content.strip()

userreply = answer_question(userquestion, outline)
print(userreply)
If you add this at the end and run it, you'll have an opportunity to ask questions about the output. Like:

Other applications
My hope is that you won't stop at just using this script as is.
With adjustments and different APIs, you can use this for social or email content, or connect it to your analytics or Google Ads to create assets inspired by what's working for you, for your competitors, or for companies you know are investing heavily and have likely done some great conversion optimization.
Or just those who are ranking well.