Translation API Checklist: 15 Things to Evaluate Before You Integrate
A practical evaluation checklist for translation APIs: language coverage, pricing models, rate limits, latency, caching, glossary support, format handling, and more.
Picking a translation API is one of those decisions that's easy to make and painful to change. You integrate it, build caching layers around it, tune your pipeline to its quirks, and then six months later discover it doesn't support a language you need or charges 5x more than you budgeted.
Here are 15 things to evaluate before you commit. I've organized them roughly by how often teams overlook them — the most commonly missed items first.
1. Actual language pair quality, not just language count
Every API advertises "100+ languages supported." That number is meaningless. What matters is quality for your specific language pairs.
"Supports Japanese" could mean anything from "produces native-quality output" to "produces grammatically correct but awkward text" to "produces mostly comprehensible output with frequent errors."
How to evaluate: Take 50 representative strings from your product. Translate them into your target languages. Have a native speaker rate them on a 1-5 scale. Do this for every API you're considering. The results will surprise you — the API with the most languages often isn't the best for any specific pair.
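Once the ratings are in, averaging them per API and language pair makes the comparison concrete. Here is a minimal sketch; the `Rating` shape and the API names are hypothetical, so adapt them to however you collect reviewer scores.

```typescript
// Hypothetical record of one native-speaker rating for one translated string.
interface Rating {
  api: string;        // placeholder name, e.g. "api-a"
  targetLang: string; // e.g. "ja"
  score: number;      // 1 (unusable) to 5 (native quality)
}

// Average score for each "api:targetLang" combination.
function meanScores(ratings: Rating[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const r of ratings) {
    const key = `${r.api}:${r.targetLang}`;
    const entry = totals[key] ?? { sum: 0, count: 0 };
    entry.sum += r.score;
    entry.count += 1;
    totals[key] = entry;
  }
  const means: Record<string, number> = {};
  for (const key of Object.keys(totals)) {
    means[key] = totals[key].sum / totals[key].count;
  }
  return means;
}
```

Comparing `meanScores(ratings)["api-a:ja"]` against `["api-b:ja"]` tells you more than any "100+ languages" claim.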
2. Pricing model and hidden costs
Translation APIs use different pricing units:
- Per character (Google, Amazon, Microsoft): straightforward, but script density matters. CJK text packs more meaning into each character than English does, so the cost per "meaning unit" depends on the source language
- Per token (OpenAI, Anthropic): similar to per-character but tokenizer-dependent
- Per word (some specialized APIs): friendlier for European languages, confusing for CJK
- Flat monthly fee (some enterprise plans): predictable but may include volume caps
Hidden costs to check for:
- Language detection (separate charge at some providers)
- Custom model training / glossary management
- Support tiers (basic support is free, priority support is $$$)
- Minimum monthly commitments
- Overage charges beyond plan limits
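To compare pricing models on equal footing, it helps to estimate a monthly bill from your own volume. The sketch below is only an illustration: the plan shapes are my assumption and any rates you plug in should come from the provider's current price list, not from this example.

```typescript
// Hypothetical pricing plans. Field names are assumptions for illustration.
type Plan =
  | { kind: "perCharacter"; usdPerMillionChars: number }
  | { kind: "perWord"; usdPerThousandWords: number }
  | { kind: "flat"; usdPerMonth: number; includedChars: number; usdPerMillionOverage: number };

interface MonthlyVolume {
  chars: number;
  words: number;
}

// Estimated monthly cost in USD for a given plan and volume.
function estimateMonthlyCost(plan: Plan, volume: MonthlyVolume): number {
  switch (plan.kind) {
    case "perCharacter":
      return (volume.chars / 1_000_000) * plan.usdPerMillionChars;
    case "perWord":
      return (volume.words / 1_000) * plan.usdPerThousandWords;
    case "flat": {
      // Characters beyond the included volume are billed as overage.
      const overage = Math.max(0, volume.chars - plan.includedChars);
      return plan.usdPerMonth + (overage / 1_000_000) * plan.usdPerMillionOverage;
    }
  }
}
```

Run it with your real monthly character and word counts, then add the hidden costs from the list above by hand, since those rarely fit a formula.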
3. Rate limits and burst handling
Most APIs publish their rate limits. Few make it clear how they handle bursts.
Questions to ask:
- Is the limit per-second, per-minute, or per-hour?
- What happens when you exceed it — 429 error, queuing, or silent throttling?
- Is there a Retry-After header in rate limit responses?
- Can you get higher limits on request? How long does approval take?
- Are limits per API key or per account?
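If the API does send a Retry-After header, your client should honor it. Below is a minimal sketch of turning that header into a wait time. The HTTP spec allows either a number of seconds or an HTTP date as the value; the fallback backoff numbers (500 ms base, 30 s cap) are arbitrary assumptions of mine.

```typescript
// Convert a Retry-After header value into a wait time in milliseconds.
// The value is either delay-seconds ("120") or an HTTP-date.
// Falls back to capped exponential backoff when the header is missing
// or cannot be parsed.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number, // 0 for the first retry
  nowMs: number = Date.now(),
): number {
  const fallback = Math.min(30_000, 500 * 2 ** attempt); // assumed defaults
  if (retryAfter === null) return fallback;

  const trimmed = retryAfter.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  const dateMs = Date.parse(trimmed);
  if (!isNaN(dateMs)) return Math.max(0, dateMs - nowMs);

  return fallback;
}
```

After a 429 you would call something like `retryDelayMs(response.headers.get("Retry-After"), attempt)` and sleep for that long before retrying.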
4. Batch API support
Sending strings one by one is inefficient. Does the API support batch requests?
// What you want
{
  "texts": ["Hello", "Goodbye", "Save", "Cancel", "Submit"],
  "targetLanguage": "de"
}
// vs. what some APIs force you to do
// 5 separate HTTP requests
Also check: what's the maximum batch size? Some APIs cap at 25 items, others at 100, some at 5,000. If your batches are larger than the cap, you need client-side chunking logic.
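Client-side chunking is not much code. Here is one way it could look; `translateBatch` is a stand-in for whatever batch call your provider exposes, not a real SDK function.

```typescript
// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Translate any number of strings through an API with a batch cap.
// Batches are sent one after another, and results keep the input order.
async function translateAll(
  texts: string[],
  maxBatchSize: number,
  translateBatch: (batch: string[]) => Promise<string[]>,
): Promise<string[]> {
  const results: string[] = [];
  for (const batch of chunk(texts, maxBatchSize)) {
    results.push(...(await translateBatch(batch)));
  }
  return results;
}
```

Sending batches sequentially keeps you under rate limits at the cost of speed. If your limits allow it, you could send a few batches in parallel instead.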
5. Latency (p50 and p99, not just average)
Average latency doesn't tell the full story. A 200ms average with a p99 of 3 seconds means 1 in 100 requests takes at least 15x longer than the average.
Test latency for:
- Short strings (< 20 characters): UI buttons, labels
- Medium strings (20-200 characters): notifications, error messages
- Long strings (200-2000 characters): paragraphs, descriptions
- Batch requests of 50 items
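To get p50 and p99 from your own measurements, collect the response times for each category and compute percentiles. This sketch uses the nearest-rank method, which is one of several common definitions, so expect small differences from what a monitoring tool reports.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point surprises like 0.57 * 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With samples of `[10, 20, 30, 40, 1000]` the mean is 220 ms, but `percentile(samples, 50)` is 30 and `percentile(samples, 99)` is 1000. Collect at least a few hundred samples per category before you trust a p99.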
6. Format preservation
Does the API handle markup in the source text?
Test with:
Click <strong>Save</strong> to apply your changes.
Enter your {email} and {password} to log in.
You have <a href="/notifications">{{count}} notifications</a>.
Bad APIs translate the HTML tag names, mangle the template variables, or drop the markup entirely. Good APIs preserve it. Great APIs understand that {email} is a placeholder and leave it untranslated without needing special instructions.
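You can automate this check instead of eyeballing the output. The sketch below compares the tags and placeholders in the source against the translation. The regular expression is an assumption that covers the three test strings above (HTML tags, `{name}`, and `{{name}}`); extend it if you use other placeholder syntaxes.

```typescript
// Extract HTML tags and {placeholder} / {{placeholder}} tokens, sorted.
function extractMarkup(text: string): string[] {
  const pattern = /<\/?[a-zA-Z][^>]*>|\{\{[^{}]+\}\}|\{[^{}]+\}/g;
  const matches: string[] = text.match(pattern) ?? [];
  return [...matches].sort();
}

// True when the translation contains exactly the same tags and placeholders
// as the source. Order is ignored because word order changes between languages.
function markupPreserved(source: string, translated: string): boolean {
  const a = extractMarkup(source);
  const b = extractMarkup(translated);
  return a.length === b.length && a.every((token, i) => token === b[i]);
}
```

This catches translated placeholder names and dropped tags. It will not catch a tag wrapped around the wrong word, so it complements a human review rather than replacing it.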
7. Glossary / terminology management
Can you enforce consistent terminology? If "workspace" must always be "Arbeitsbereich" in German, can you define that rule?
Some APIs offer:
- Server-side glossaries — upload a terminology list, the API enforces it automatically
- Per-request context — send context/instructions with each request
- No glossary support — you're on your own
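If you end up in the "on your own" case, a post-translation check can at least flag violations. This is a rough sketch under a big assumption: it uses plain substring matching, so it will miss inflected forms and can false-positive on compounds. The `Glossary` type is hypothetical.

```typescript
// Hypothetical glossary for one target language: source term -> required target term.
type Glossary = Record<string, string>;

// Return the source terms whose required translation is missing from the output.
// A term is only checked when it actually appears in the source text.
function glossaryViolations(source: string, translated: string, glossary: Glossary): string[] {
  const src = source.toLowerCase();
  const out = translated.toLowerCase();
  return Object.keys(glossary).filter(
    (term) => src.includes(term.toLowerCase()) && !out.includes(glossary[term].toLowerCase()),
  );
}
```

For example, with `{ workspace: "Arbeitsbereich" }`, a German translation that keeps the English word "Workspace" would be reported as a violation.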
8. Formality control
Can you specify formal vs informal register?
This matters for German (du/Sie), French (tu/vous), Japanese (multiple levels), Korean, and many other languages. If the API doesn't support it, you get an inconsistent mix that reads poorly.
DeepL offers explicit formality settings. LLM-based APIs accept it as an instruction in the prompt. Traditional NMT APIs mostly don't support it at all.
9. Context support
Can you provide context alongside the text to improve disambiguation?
As discussed in detail elsewhere, "bank" translates differently in financial vs geographical contexts. APIs that accept context fields produce significantly better output for ambiguous strings.
{
  "text": "Post",
  "context": "Button label on social media feed to publish a new post",
  "targetLanguage": "de"
}
auto18n and LLM-based services support this. Most traditional NMT APIs don't.
10. Supported file formats
If you're working with .PO, .XLIFF, .ARB, .JSON, .YAML, or .STRINGS files, check whether the API can ingest them directly or if you need to extract strings, translate, and reassemble yourself.
Some APIs accept file uploads and return translated files with structure preserved. Others only accept plain text and leave the file handling to you.
11. Caching behavior
Does the API cache translations server-side? If you send the same string twice, are you charged twice?
- Server-side caching: You pay once, subsequent identical requests are free or discounted
- No caching: Every request is billed at full price, even duplicates
- Translation Memory: The API stores your previous translations and reuses them for similar (not just identical) content
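If the provider bills duplicates at full price, a client-side cache covers the exact-match case. Here is a minimal in-memory sketch; the `TranslateFn` signature is an assumption, and a production version would likely persist entries in Redis or a database rather than a `Map`.

```typescript
// Assumed shape of the function that calls the (billed) API.
type TranslateFn = (text: string, targetLanguage: string) => Promise<string>;

// Wrap a translate function so identical requests hit the API only once.
function withCache(translate: TranslateFn): TranslateFn {
  const cache = new Map<string, Promise<string>>();
  return (text, targetLanguage) => {
    // JSON.stringify gives an unambiguous composite key.
    const key = JSON.stringify([targetLanguage, text]);
    let pending = cache.get(key);
    if (pending === undefined) {
      pending = translate(text, targetLanguage);
      // Failed requests are evicted so a later call can retry.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return pending;
  };
}
```

Caching the promise rather than the result means two concurrent requests for the same string also share one API call.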
12. Quality consistency across updates
When the API provider updates their model, does translation quality change? This seems minor but it causes real problems:
- Your cached translations no longer match new translations of the same content
- A/B tests comparing old and new translations become meaningless
- Regression: some language pairs might get worse after a model update
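One way to notice a model update is to keep a small baseline of reviewed translations and re-run it on a schedule. The sketch below only detects that output changed; deciding whether the change is better or worse still needs a human. The `Snapshot` and `Drift` types are hypothetical.

```typescript
// Baseline: source string -> translation captured at your last quality review.
type Snapshot = Record<string, string>;

interface Drift {
  source: string;
  before: string;
  after: string;
}

// List every baseline string whose current translation differs.
// Strings missing from `current` are skipped rather than reported.
function findDrift(baseline: Snapshot, current: Snapshot): Drift[] {
  const drift: Drift[] = [];
  for (const source of Object.keys(baseline)) {
    const after = current[source];
    if (after !== undefined && after !== baseline[source]) {
      drift.push({ source, before: baseline[source], after });
    }
  }
  return drift;
}
```

A sudden jump in the number of drifted strings for one language pair is a good trigger for a fresh native-speaker review.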
13. Data privacy and retention
Where does your data go?
- Is the translated text used to train models? (Google and Amazon have opt-out options; check the defaults)
- Is data retained after the API call? For how long?
- Is there a data processing agreement (DPA) available?
- Which regions process the data? (Relevant for GDPR, data residency requirements)
14. Error handling and status reporting
When things go wrong, does the API give you useful information?
Test scenarios:
- Send an unsupported language code — do you get a clear error or a generic 400?
- Send extremely long text — is there a documented max length?
- Send malformed markup — does it fail gracefully or produce corrupted output?
- Hit the rate limit — is the Retry-After header present?
15. Migration path
If you need to switch providers later, how painful is it?
Considerations:
- Are you using provider-specific features (custom models, glossary formats) that don't port?
- Is your code tightly coupled to the provider's SDK, or do you have an abstraction layer?
- Can you export your translation memory / glossaries?
An abstraction layer can be as small as one interface that every provider adapter implements:
interface TranslationProvider {
  translate(text: string, options: TranslateOptions): Promise<string>;
  translateBatch(texts: string[], options: TranslateOptions): Promise<string[]>;
}
Then swapping providers is a single implementation change, not a codebase-wide refactor.
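To make that concrete, here is a sketch of an adapter behind the `TranslationProvider` interface. The `TranslateOptions` shape and the `FakeProvider` class are my assumptions for illustration; a real adapter would call the vendor's SDK or HTTP API inside the two methods.

```typescript
// Hypothetical options shape, reduced to the one field this example needs.
interface TranslateOptions {
  targetLanguage: string;
}

interface TranslationProvider {
  translate(text: string, options: TranslateOptions): Promise<string>;
  translateBatch(texts: string[], options: TranslateOptions): Promise<string[]>;
}

// A fake provider for tests and local development.
class FakeProvider implements TranslationProvider {
  async translate(text: string, options: TranslateOptions): Promise<string> {
    return `[${options.targetLanguage}] ${text}`;
  }

  async translateBatch(texts: string[], options: TranslateOptions): Promise<string[]> {
    return Promise.all(texts.map((t) => this.translate(t, options)));
  }
}

// Application code depends only on the interface, never on a vendor SDK.
async function localizeLabels(
  provider: TranslationProvider,
  labels: string[],
): Promise<string[]> {
  return provider.translateBatch(labels, { targetLanguage: "de" });
}
```

A fake like this also lets you run your test suite without spending API quota.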
The evaluation spreadsheet
Score each API on a 1-5 scale for each criterion. Weight by what matters to your use case. A real-time chat app cares more about latency than a batch documentation translator. A healthcare app cares more about data privacy than a game does.
| Criterion             | Weight (your use case) | API A | API B | API C |
| --------------------- | ---------------------- | ----- | ----- | ----- |
| Language pair quality |                        |       |       |       |
| Pricing               |                        |       |       |       |
| Rate limits           |                        |       |       |       |
| Batch support         |                        |       |       |       |
| Latency               |                        |       |       |       |
| Format preservation   |                        |       |       |       |
| Glossary support      |                        |       |       |       |
| Formality control     |                        |       |       |       |
| Context support       |                        |       |       |       |
| File format support   |                        |       |       |       |
| Caching               |                        |       |       |       |
| Quality consistency   |                        |       |       |       |
| Data privacy          |                        |       |       |       |
| Error handling        |                        |       |       |       |
| Migration path        |                        |       |       |       |
Fill this out with real test data, not marketing claims. The 2-3 hours you spend evaluating will save you months of pain after integration.
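If you would rather compute the totals than do it in a spreadsheet, the weighted score is a short function. This is a sketch with one design assumption worth noting: a criterion you have not scored counts as 0, so gaps in your evaluation lower the total instead of being silently ignored.

```typescript
// Weighted average of 1-5 scores across criteria.
// Criteria missing from `scores` count as 0.
function weightedScore(
  weights: Record<string, number>,
  scores: Record<string, number>,
): number {
  let total = 0;
  let weightSum = 0;
  for (const criterion of Object.keys(weights)) {
    total += weights[criterion] * (scores[criterion] ?? 0);
    weightSum += weights[criterion];
  }
  if (weightSum === 0) throw new Error("weights must not sum to zero");
  return total / weightSum;
}
```

For a real-time chat app you might weight latency at 3 and pricing at 1. An API scoring 5 on latency and 1 on pricing then gets 4.0, while one scoring 2 and 5 gets 2.75.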