## Overview
The Moderation API detects harmful or inappropriate content in text, helping you with:

- Content Filtering: Automatically filter inappropriate user submissions
- Safety Review: Detect potential violations before publishing
- Compliance Check: Ensure content meets platform guidelines
- Risk Warning: Identify potentially harmful content types
## Quick Start
### Basic Example
### Batch Moderation
## Detection Categories
| Category | Description |
|---|---|
| `hate` | Hate speech targeting specific groups |
| `hate/threatening` | Threatening hate speech |
| `harassment` | Harassing content |
| `harassment/threatening` | Threatening harassment |
| `self-harm` | Self-harm related content |
| `self-harm/intent` | Intent to self-harm |
| `self-harm/instructions` | Self-harm instructions |
| `sexual` | Sexual content |
| `sexual/minors` | Sexual content involving minors |
| `violence` | Violent content |
| `violence/graphic` | Graphic violence |
## Practical Examples
### 1. User Input Filter
### 2. Chatbot Safety Layer
### 3. Custom Threshold
## Best Practices
### Multi-layer Protection
## Pricing
The Moderation API is currently free to use and does not count towards token consumption.

## FAQ
### How accurate is moderation?

It is based on OpenAI's moderation model and is highly accurate, but not 100% reliable. We recommend:

- Combine with human review for critical scenarios
- Set appropriate thresholds
- Provide appeal channels