Overview
LaoZhang API provides powerful image and video understanding capabilities through multiple advanced AI models. Using the unified OpenAI API format, you can easily implement image recognition, scene description, OCR text extraction, video content analysis and more. 🔍 Intelligent Visual Analysis Support for object recognition, scene understanding, text extraction, sentiment analysis, video content understanding and various other vision tasks - let AI truly “see” and understand images and videos.🌟 Key Features
- 🎯 Multiple Models: GPT-5, Gemini 2.5 Pro/Flash and other top vision models
- 📸 Flexible Input: Support for URL links and Base64-encoded images and videos
- 🎬 Video Understanding: Gemini series supports video content analysis (up to several minutes)
- ⚡ Fast Response: High-performance inference with sub-second results
- 💰 Cost Control: Multiple model options to fit different budgets
- 🔄 OpenAI Compatible: Drop-in replacement for OpenAI vision APIs
📋 Supported Vision Models
| Model Name | Model ID | Image | Video | Features |
|---|---|---|---|---|
| GPT-5 ⭐ | gpt-5 | ✅ | ❌ | Latest model, highly detailed image recognition |
| Gemini 2.5 Pro ⭐ | gemini-2.5-pro | ✅ | ✅ | Ultra-long context, supports video analysis |
| Gemini 2.5 Flash ⭐ | gemini-2.5-flash | ✅ | ✅ | Extremely fast, best value, supports video |
| GPT-4.1 Mini | gpt-4.1-mini | ✅ | ❌ | Lightweight, fast, low cost |
| Claude 3.5 Sonnet | claude-3-5-sonnet | ✅ | ❌ | Deep understanding, accurate descriptions |
🚀 Quick Start
1. Basic Example - Image URL
2. Local Image Example - Base64 Encoding
3. Advanced Example - Multi-Image Comparison
🎬 Video Content Analysis
Supported Video Models
Currently only Gemini series models support video analysis:gemini-2.5-pro- Detailed and accurate, recommended for complex video analysisgemini-2.5-flash- Fast and cost-effective, suitable for batch processing
1. Basic Video Analysis - URL Method
2. Using OpenAI SDK
3. cURL Command Example
4. Video + Image Mixed Analysis
Gemini supports analyzing videos and images in the same request:5. Local Video Analysis - Base64 Encoding
For local video files, you can use Base64 encoding to upload:- MP4:
data:video/mp4;base64,... - WebM:
data:video/webm;base64,... - MOV:
data:video/quicktime;base64,... - AVI:
data:video/x-msvideo;base64,...
Video Analysis Best Practices
- File Size: Recommend max 20 MB per video - larger files may take longer to process
- Video Formats: Supports MP4, WebM, MOV, AVI and other mainstream formats
- Video Duration: Short videos (< 5 minutes) work best - consider splitting longer videos
- Resolution: Higher resolution videos provide better recognition but take longer to process
- Prompt Optimization: Be specific about what to analyze (e.g., “analyze character movements”, “extract dialogue”)
Video Analysis Use Cases
- 📹 Content Moderation: Automatically identify inappropriate content in videos
- 🎓 Educational Video Analysis: Extract key points and subtitles
- 🛡️ Surveillance Video Understanding: Detect abnormal behaviors and identify events
- 🎬 Ad Creative Analysis: Evaluate creative elements and emotional impact
- 📊 Sports Analysis: Recognize athlete movements and key moments in games
- Video processing typically takes longer than images (depending on video length and complexity)
- Use
max_tokensparameter to limit output length and avoid excessive costs - Be mindful of data security when processing privacy-sensitive video content
🎯 Common Use Cases
1. Product Recognition & Analysis
2. Document OCR Recognition
3. Medical Imaging Assistance
4. Security Monitoring Scene Analysis
💡 Best Practices
Image Preprocessing Guidelines
- Format Support: JPEG, PNG, GIF, WebP and other mainstream formats
- Size Limit: Recommend max 20MB per image
- Resolution: Higher resolution images get better recognition results
- Compression: Moderate compression improves transmission speed
Prompt Optimization
Error Handling
🔧 Advanced Features
1. Streaming Output
For long analyses, use streaming for better user experience:2. Multi-turn Dialogue
Maintain context for in-depth analysis:3. Function Calling Integration
📊 Performance Comparison
| Model | Image Support | Video Support | Response Speed | Recognition Accuracy | Price |
|---|---|---|---|---|---|
| GPT-5 | ✅ | ❌ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | $$$ |
| Gemini 2.5 Pro | ✅ | ✅ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | $$ |
| Gemini 2.5 Flash | ✅ | ✅ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | $ |
| GPT-4.1 Mini | ✅ | ❌ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | $ |
🚨 Important Notes
- Privacy Protection: Don’t upload images or videos containing sensitive information
- Compliance: Follow relevant laws and regulations, avoid illegal uses
- Result Verification: AI analysis results are for reference only - verify for critical decisions
- Cost Control: Choose appropriate models to avoid unnecessary expenses
- Video Limitations: Video analysis is only supported by Gemini series, other models don’t support it yet
🔗 Related Resources
- Chat Completions API - Learn more about the Chat API
- Pricing Information - View model pricing details