Handling Translation API Rate Limits
Practical patterns for dealing with translation API rate limits: exponential backoff, token buckets, circuit breakers, request queuing, and fallback chains.
You've built a translation pipeline, it works great in development, and then you deploy it and immediately start getting 429 errors. Every translation API has rate limits, and most are more restrictive than you'd expect. Here's how to handle them properly.
Know your limits
Common rate limits for translation APIs (as of 2026):
| API                      | Requests/second    | Characters/second | Daily limit     |
| ------------------------ | ------------------ | ----------------- | --------------- |
| Google Cloud Translation | 600 req/s          | 600K chars/s      | Varies by quota |
| Amazon Translate         | 40 req/s (default) | 40K chars/s       | Configurable    |
| DeepL API Pro            | 50 req/s           | n/a               | Varies by plan  |
| Microsoft Translator     | 100 req/s          | 50K chars/s       | 2M chars/hour   |
| OpenAI (for translation) | Varies by tier     | Token-based       | Token-based     |
These are default limits. Many services let you request an increase, but approval can take anywhere from days to weeks, so plan for the defaults.
The gotcha: even if your average request rate is well under the limit, bursts can trigger throttling. A deployment that sends 10,000 strings at once, one request per string, exceeds every fixed per-second request limit listed above.
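To make the burst problem concrete, here is a minimal Python sketch of one way to pace a large batch so that it stays under a per-second request limit. The helper name `paced_chunks` is hypothetical, and the sketch assumes one request per string and a simple one-second window, which may not match how a given provider actually measures its limit.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paced_chunks(
    items: Iterable[T],
    per_second: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[List[T]]:
    """Yield chunks of at most `per_second` items, at most one chunk per second.

    Assumes the caller sends one request per item while holding each chunk.
    """
    if per_second < 1:
        raise ValueError("per_second must be at least 1")
    pending = list(items)
    for start in range(0, len(pending), per_second):
        window_start = clock()
        yield pending[start:start + per_second]
        # Execution resumes here once the caller has finished the chunk.
        elapsed = clock() - window_start
        more_to_send = start + per_second < len(pending)
        if more_to_send and elapsed < 1.0:
            # Wait out the rest of the one-second window before the next chunk.
            sleep(1.0 - elapsed)
```

Paced this way, 10,000 single-string requests at Amazon Translate's default of 40 req/s take at least 250 seconds (10,000 / 40), which is the cost of staying under the limit instead of being throttled.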
Exponential backoff: the minimum viable approach
When you get a 429 (Too Many Requests) or 503 (Service Unavailable), retry with increasing delays:
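A minimal Python sketch of that retry loop, under a few assumptions: `ApiError` is a hypothetical exception standing in for whatever your HTTP client raises, and the 1-second base delay, 60-second cap, and five attempts are illustrative values, not provider recommendations. The sketch also adds random jitter, so that many clients throttled at the same moment do not all retry in lockstep.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes worth retrying, per the text above.
RETRYABLE_STATUSES = {429, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay for a zero-based attempt: jittered, doubling each time, capped."""
    return rng() * min(cap, base * 2 ** attempt)


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on 429/503 with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            out_of_attempts = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or out_of_attempts:
                raise  # Not retryable, or retries exhausted: surface the error.
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

With these values the upper bound on the wait grows 1, 2, 4, 8 seconds across successive retries, and any other status code (a 400, for example) is raised immediately without retrying.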