
OpenAI Announces GPT-4

Today, OpenAI officially announced GPT-4, the long-awaited fourth generation of GPT natural language processing models. GPT-4 builds on the research that went into creating ChatGPT with GPT-3.5. Something that sets GPT-4 apart from previous generations (other than its improved NLP benchmarks) is a new multimodal capability for image interpretation.
Alongside this, OpenAI has open-sourced OpenAI Evals, a benchmarking framework designed to help guide the development of models like GPT-4.

Also, be sure to check out the GPT-4 Developer Livestream, which aired today and shows off some of GPT-4's capabilities in a fun and informative session. Watch the team go from creating a Discord bot all the way to writing a poem about taxes.


If you want all the raw details on GPT-4, be sure to look at the technical paper.

Using images with GPT-4 & other new capabilities

The standout feature that GPT-4 brings to the table is its new multimodal capability. Unlike previous generations such as GPT-3 and GPT-3.5 (the model behind ChatGPT), GPT-4 can work with user-provided images and interpret them however you ask it to.
Examples from the GPT-4 announcement blog post include handing it memes and asking what's funny about them, summarizing data from graphs, and even working through a complex word problem by reading the text and diagrams in an image.
In the GPT-4 developer livestream, GPT-4 was even able to provide code for a fully functional web page based entirely on a rough hand-drawn mockup image.
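Image input wasn't publicly available through the API at launch, but if it ends up following the same chat-message format as text, a request might look something like the sketch below. To be clear, the payload shape here ("image_url" content parts alongside text) is an assumption, not a documented interface.

```python
# Hypothetical sketch of an image-input request. Image input was not publicly
# available at GPT-4's launch, so this payload shape is an assumption.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumes a vision-enabled variant answers to this name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's funny about this meme?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/meme.png"}},
            ],
        }
    ],
)
print(response["choices"][0]["message"]["content"])
```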


Something else GPT-4 comes with is a huge 32k-token context window in its extended variant (the standard GPT-4 model offers 8k tokens, and GPT-3.5 topped out at roughly 4k), meaning you can copy-paste huge documents into a conversation with GPT-4. In the livestream, this was shown off first by copy-pasting an entire documentation page for a Python Discord bot library, and later by asking a question about lengthy legal tax documents.
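Before pasting a huge document into a conversation, it's worth checking whether it actually fits in the window. Here's a minimal sketch using the tiktoken library, assuming GPT-4 uses the cl100k_base encoding (the same one gpt-3.5-turbo uses); the file name is just a placeholder.

```python
# Count tokens to check whether a document fits in GPT-4's context window.
# Assumes the cl100k_base encoding; the document path is a hypothetical example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

with open("discord_bot_docs.txt") as f:
    document = f.read()

n_tokens = len(enc.encode(document))
print(f"Document is {n_tokens} tokens long")

CONTEXT_32K = 32_768  # extended GPT-4 context window
if n_tokens > CONTEXT_32K:
    print("Too long even for the 32k model; split the document into chunks.")
```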

Steering GPT-4 for personalized outputs

If you've played with ChatGPT, you're probably familiar with its rather stoic and agreeable "personality". With the release of GPT-4, developers will be able to steer their implementation of GPT-4's personality in any way they want.
This is thanks to a "system" message that lets you define how the instance of GPT-4 will act. Whether you want an AI assistant that only responds in formatted JSON or one that talks like a Shakespearean pirate, the system message is a natural language prompt, so almost anything is possible (as long as it follows OpenAI's usage policies, of course).
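As a minimal sketch of what that looks like with the openai Python library's ChatCompletion interface, using an illustrative system message:

```python
# Steer GPT-4's behavior with a "system" message.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        # The system message defines how this instance of GPT-4 should act.
        {
            "role": "system",
            "content": "You are a Shakespearean pirate. Stay in character in every reply.",
        },
        {"role": "user", "content": "How do I reverse a list in Python?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```

Because the system message is plain natural language, swapping the persona is just a matter of editing that one string.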


GPT-4 hits new benchmarks

GPT-4, of course, hits new highs for the GPT model line, showing significant improvements over the previous generation's performance and outperforming many external models on traditional benchmarks and even on exams built for humans.
This image, lifted from the blog post, compares GPT-4 (with and without vision capabilities) to GPT-3.5 on a number of exams built for humans. The most drastic improvement is on the Uniform Bar Exam, where GPT's estimated percentile skyrocketed from around the bottom 10% of test takers to around the top 10%.

Among traditional benchmarks, GPT-4 greatly outpaces GPT-3.5.

Additionally, OpenAI tested the non-English capabilities of GPT-4 using the MMLU benchmark machine-translated into many languages with Azure Translate. GPT-4 exceeds GPT-3.5's English-language accuracy not only on English tasks but in over 20 other languages as well.

As for GPT-4's multimodal capabilities, it was tested on a handful of benchmarks against models built specifically for image interpretation in a natural language context. GPT-4 showed comparable, and often better, performance than these SOTA models.


Where GPT-4 still struggles

A classic problem with machine learning models is hallucination, where a model produces confident-sounding output that is non-factual or incorrect. GPT-4 has not solved hallucination, and in fact, looking at the base models alone, it doesn't perform much better than GPT-3.5.
With post-training RLHF (Reinforcement Learning from Human Feedback), though, its factual accuracy improves significantly, and by a larger margin than RLHF provides for earlier models.
And, like every machine learning model out there, GPT-4 has biases. As mentioned in the blog post, OpenAI is aiming to make the default behaviors of GPT-4 and its other models reflect the expectations of a wide audience, with room for customization and feedback.
OpenAI is also working with domain experts to mitigate risks like harmful advice or buggy code. One measure is an additional safety reward signal in the RLHF process that guides the model away from replying with dangerous information while still letting it engage with sensitive topics rather than blocking them outright.
A popular pastime for ChatGPT users is what the blog post refers to as "jailbreaking": guiding the model so that it bypasses its learned restrictions, opening up a more unpredictable, rules-free (to an extent) conversation. GPT-4 is still susceptible to jailbreaking, but OpenAI is working to make the model harder to misuse in this way.

Get access to GPT-4

API access for GPT-4 is currently waitlisted, so if you're interested, sign up on this page. Be aware, though, that only the natural language capabilities are available right now; multimodal image understanding is still in alpha and will become accessible in the future.
Access to GPT-4 can also be granted earlier to those who contribute high-quality evals to the now open-sourced OpenAI Evals benchmarking framework. Head to the GitHub repository for more information.
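As a rough illustration of what contributing looks like, many evals in the framework are driven by JSONL files of samples, each pairing an "input" conversation with an "ideal" answer. Here's a minimal sketch of generating such a file; the file name and questions are made-up examples, and the exact schema can vary by eval type.

```python
# Sketch: write a JSONL samples file in the format used by many OpenAI Evals,
# pairing an "input" chat conversation with an "ideal" answer per line.
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "What is 13 * 7?"},
        ],
        "ideal": "91",
    },
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "What is 2 to the 10th power?"},
        ],
        "ideal": "1024",
    },
]

# Hypothetical output path; evals reference these files from their configs.
with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```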
Alternatively, if you subscribe to ChatGPT Plus, you have the privilege of chatting with GPT-4 in the ChatGPT interface.

Find out more

Iterate on AI agents and models faster. Try Weights & Biases today.