OpenAI Releases New & Improved GPT-Based Moderation Tools

OpenAI's new free-to-use Moderation API uses a GPT-based model to ensure that harmful content created by humans or generated by AI is blocked and flagged for review.
Created on August 11|Last edited on August 11
To ensure safe and productive use of its API and AI models, OpenAI has developed a new Moderation API that interprets user input and flags whether it is considered harmful. The Moderation API is available to all developers using the OpenAI API. It is currently intended for use with other OpenAI API endpoints, but it is also available in beta, with some restrictions, for non-API traffic.

This release is more involved than a simple policy update. Because this is a brand-new endpoint intended for direct use by developers, there is a guide and documentation to go along with its use, as well as a paper on the creation of the model that drives it and the dataset used to evaluate it. The new API is free to use for developers working within the OpenAI API.

How does the new Moderation API work?

Behind the scenes, the Moderation endpoint has access to a GPT-based model which was trained to assess input text for potentially harmful content. When used with other parts of the OpenAI API, user-provided input text is first passed through a Moderation API call to block misuse of OpenAI's models.
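To make the flow above concrete, here is a minimal sketch of checking a piece of text against the Moderation endpoint from Python using only the standard library. The endpoint URL, the `input` request field, and the `results[].flagged` response field are assumptions taken from OpenAI's public API reference rather than from this article, and the helper names (`build_moderation_request`, `is_flagged`, `check_text`) are hypothetical. Treat it as an illustration, not as OpenAI's own client code.

```python
import json
import urllib.request

# Assumed from OpenAI's public API reference, not from this article.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the moderation endpoint."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def is_flagged(response: dict) -> bool:
    """Return True if any result in a moderation response is marked flagged."""
    # Assumed response shape: {"results": [{"flagged": bool, ...}, ...]}
    return any(result.get("flagged", False) for result in response.get("results", []))


def check_text(text: str, api_key: str) -> bool:
    """Send text to the endpoint and report whether it was flagged."""
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request) as reply:
        return is_flagged(json.load(reply))
```

Only `check_text` touches the network. Keeping request construction and response parsing in separate functions means both can be exercised offline with sample data.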

In many applications of the OpenAI API, use of the Moderation endpoint is required. Use of the Moderation endpoint in applications within the OpenAI API is at no cost to the developer, and use for external applications such as content moderation on another platform is in beta and does come with fees.
The Moderation API is useful for identifying problematic language created by both humans and AI. It can block harmful inputs provided by malicious users and, at the same time, block any harmful text that a model might generate on its own.
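The two-sided use described above, screening both what a user sends and what a model returns, might be sketched as a small wrapper. Everything here is hypothetical: `moderated_completion`, `BLOCKED_MESSAGE`, and the `generate` and `is_harmful` callables are illustrative names, with `is_harmful` standing in for a call to the Moderation endpoint and `generate` standing in for any text-generating model.

```python
from typing import Callable

# Hypothetical placeholder returned whenever a check fails.
BLOCKED_MESSAGE = "[blocked by moderation]"


def moderated_completion(
    prompt: str,
    generate: Callable[[str], str],
    is_harmful: Callable[[str], bool],
) -> str:
    """Run a model only on prompts that pass moderation, then screen its output."""
    # Screen the human-provided input before it reaches the model.
    if is_harmful(prompt):
        return BLOCKED_MESSAGE
    output = generate(prompt)
    # Screen the model-generated text before it reaches the user.
    if is_harmful(output):
        return BLOCKED_MESSAGE
    return output


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without network access.
    def toy_check(text: str) -> bool:
        return "forbidden" in text.lower()

    def toy_model(text: str) -> str:
        return f"Echo: {text}"

    print(moderated_completion("hello there", toy_model, toy_check))
    print(moderated_completion("a forbidden request", toy_model, toy_check))
```

Passing the checker in as a function keeps the wrapper independent of any particular moderation service, so the same pattern applies whether the check is a remote API call or a local classifier.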
A paper describing the creation of the GPT-based model has been released, along with the dataset used to evaluate it, for readers interested in the finer details of the model driving the Moderation API.

Tags: ML News