2026-04-14

Translation API Checklist: 15 Things to Evaluate Before You Integrate

A practical evaluation checklist for translation APIs: language coverage, pricing models, rate limits, latency, caching, glossary support, format handling, and more.

Picking a translation API is one of those decisions that's easy to make and painful to change. You integrate it, build caching layers around it, tune your pipeline to its quirks, and then six months later discover it doesn't support a language you need or charges 5x more than you budgeted.

Here are 15 things to evaluate before you commit. I've organized them roughly by how often teams overlook them — the most commonly missed items first.

1. Actual language pair quality, not just language count

Every API advertises "100+ languages supported." That number is meaningless. What matters is quality for your specific language pairs.

"Supports Japanese" could mean anything from "produces native-quality output" to "produces grammatically correct but awkward text" to "produces mostly comprehensible output with frequent errors."

How to evaluate: Take 50 representative strings from your product. Translate them into your target languages. Have a native speaker rate them on a 1-5 scale. Do this for every API you're considering. The results will surprise you — the API with the most languages often isn't the best for any specific pair.
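Once the ratings come back, a small script keeps the comparison honest. The sketch below averages the 1-5 scores for each API and language pair. The `Rating` shape, the API names, and the sample scores are all made up for illustration.

```typescript
// One native-speaker rating for one translated string (hypothetical shape).
interface Rating {
  api: string;          // e.g. "API A"
  languagePair: string; // e.g. "en-ja"
  score: number;        // 1 (unusable) to 5 (native quality)
}

// Average the scores for each API and language pair.
function meanScores(ratings: Rating[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const r of ratings) {
    const key = `${r.api} | ${r.languagePair}`;
    const entry = sums.get(key) ?? { total: 0, count: 0 };
    entry.total += r.score;
    entry.count += 1;
    sums.set(key, entry);
  }
  const result = new Map<string, number>();
  sums.forEach((entry, key) => result.set(key, entry.total / entry.count));
  return result;
}

// Made-up ratings, for illustration only.
const sampleRatings: Rating[] = [
  { api: "API A", languagePair: "en-ja", score: 3 },
  { api: "API A", languagePair: "en-ja", score: 4 },
  { api: "API B", languagePair: "en-ja", score: 5 },
  { api: "API B", languagePair: "en-ja", score: 4 },
];
const means = meanScores(sampleRatings);
```

With 50 strings per pair, it is probably worth looking at the distribution of scores as well as the mean, since a handful of 1s can matter more than the average suggests.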

2. Pricing model and hidden costs

Translation APIs use different pricing units:

Per character (Google, Amazon, Microsoft): straightforward, but CJK text packs more meaning into each character, so the cost per "meaning unit" differs between CJK and alphabetic source text. Check whether the provider counts characters, code points, or bytes
Per token (OpenAI, Anthropic): similar to per-character, but the count depends on the tokenizer and varies by script
Per word (some specialized APIs): friendlier for European languages, confusing for CJK
Flat monthly fee (some enterprise plans): predictable but may include volume caps

Hidden costs to ask about:

Language detection (separate charge at some providers)
Custom model training / glossary management
Support tiers (basic support is free, priority support is $$$)
Minimum monthly commitments
Overage charges beyond plan limits
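To compare the pricing units above on your own content, it can help to run a sample of real strings through a rough estimator. This is a minimal sketch: the `PricingModel` type and every price passed to it are placeholders, not real provider rates, and per-token pricing is left out because it needs the provider's tokenizer.

```typescript
// Hypothetical pricing descriptions; plug in numbers from the provider's price list.
type PricingModel =
  | { kind: "perCharacter"; usdPerMillionChars: number }
  | { kind: "perWord"; usdPerWord: number }
  | { kind: "flatMonthly"; usdPerMonth: number };

function estimateCost(texts: string[], pricing: PricingModel): number {
  switch (pricing.kind) {
    case "perCharacter": {
      // Counts Unicode code points; a provider may count differently.
      const chars = texts.reduce((n, t) => n + Array.from(t).length, 0);
      return (chars / 1_000_000) * pricing.usdPerMillionChars;
    }
    case "perWord": {
      // Whitespace splitting undercounts CJK text, which has no spaces between words.
      const words = texts.reduce(
        (n, t) => n + t.trim().split(/\s+/).filter(Boolean).length,
        0
      );
      return words * pricing.usdPerWord;
    }
    case "flatMonthly":
      return pricing.usdPerMonth;
  }
}

// Placeholder prices, for illustration only.
const byChar = estimateCost(["Hello world", "Save"], { kind: "perCharacter", usdPerMillionChars: 20 });
const byWord = estimateCost(["Hello world", "Save"], { kind: "perWord", usdPerWord: 0.1 });
```

Running the same sample through each model makes the differences between pricing units visible before you sign anything.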

3. Rate limits and burst handling

Most APIs publish their rate limits. Few make it clear how they handle bursts.

Questions to ask:

Is the limit per-second, per-minute, or per-hour?
What happens when you exceed it — 429 error, queuing, or silent throttling?
Is there a Retry-After header in rate limit responses?
Can you get higher limits on request? How long does approval take?
Are limits per API key or per account?

A 50 req/s limit sounds generous until you need to translate 10,000 strings for a new language launch. At 50 req/s with one string per request, that's 200 seconds (over 3 minutes) of sustained requests, and some APIs will throttle you well before the stated limit during sustained bursts.
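The arithmetic behind estimates like this is easy to script, and so is honoring a Retry-After header when the API sends one. The function names below are my own. The header format (either a number of seconds or an HTTP date) comes from the HTTP specification, but whether a given translation API actually sends it is something you have to test.

```typescript
// Seconds of sustained requests needed at a given rate limit.
function secondsToTranslate(
  strings: number,
  batchSize: number,
  requestsPerSecond: number
): number {
  const requests = Math.ceil(strings / batchSize);
  return requests / requestsPerSecond;
}

// Retry-After is either a delay in seconds or an HTTP date.
// Returns the delay in milliseconds, or null if the header is missing or unparseable.
function retryAfterMs(header: string | null, now: number = Date.now()): number | null {
  if (header === null || header.trim() === "") return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now);
}

const oneByOne = secondsToTranslate(10_000, 1, 50);  // one string per request
const batched = secondsToTranslate(10_000, 100, 50); // 100 strings per request
```

One string per request at 50 req/s works out to 200 seconds for 10,000 strings, while batches of 100 bring that down to 2 seconds, assuming the API allows batches that large and doesn't throttle early.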

4. Batch API support

Sending strings one by one is inefficient. Does the API support batch requests?

// What you want
{ "texts": ["Hello", "Goodbye", "Save", "Cancel", "Submit"], "targetLanguage": "de" }

// vs. what some APIs force you to do
// 5 separate HTTP requests

Also check: what's the maximum batch size? Some APIs cap at 25 items, others at 100, some at 5,000. If your batches are larger than the cap, you need client-side chunking logic.
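Client-side chunking is only a few lines. Here is a minimal sketch. The `chunk` helper is my own name, and the cap of 2 is only there to keep the example small.

```typescript
// Split a list into batches no larger than the API's documented cap.
function chunk<T>(items: T[], maxBatchSize: number): T[][] {
  if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
    throw new RangeError("maxBatchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += maxBatchSize) {
    batches.push(items.slice(i, i + maxBatchSize));
  }
  return batches;
}

const batches = chunk(["Hello", "Goodbye", "Save", "Cancel", "Submit"], 2);
```

Some APIs also cap the total characters per request, so a real implementation may need to chunk on size as well as item count.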

5. Latency (p50 and p99, not just average)

Average latency doesn't tell the full story. A 200ms average with a p99 of 3 seconds means 1 in 100 requests takes at least 15x longer than the average.

Test latency for:

Short strings (< 20 characters): UI buttons, labels
Medium strings (20-200 characters): notifications, error messages
Long strings (200-2000 characters): paragraphs, descriptions
Batch requests of 50 items

Run these tests from different regions. If your servers are in us-east-1 and the translation API's nearest endpoint is in Europe, you're adding 100ms+ of network latency on every call.
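If you collect raw timings during these tests, p50 and p99 are easy to compute yourself. The sketch below uses the nearest-rank definition of a percentile (other definitions interpolate and give slightly different numbers), and the sample timings are invented.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Invented timings: 98 fast requests and 2 slow ones.
const samples = new Array<number>(98).fill(200).concat([3000, 3000]);
const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
const p50 = percentile(samples, 50);
const p99 = percentile(samples, 99);
```

In this made-up sample the mean is 256ms and the p50 is 200ms, which both look fine, while the p99 is 3000ms.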

6. Format preservation

Does the API handle markup in the source text?

Test with:

Click <strong>Save</strong> to apply your changes.
Enter your {email} and {password} to log in.
You have <a href="/notifications">{{count}} notifications</a>.

Bad APIs translate the HTML tag names, mangle the template variables, or drop the markup entirely. Good APIs preserve it. Great APIs recognize {email} as a placeholder and leave it untranslated without needing special instructions.
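You can automate part of this check by comparing the placeholders and tags in the source against the ones in the output. This is a rough sketch: the regular expression only covers the `{name}`, `{{name}}`, and HTML tag styles used in the test strings above, and the German outputs are hand-written examples, not real API responses.

```typescript
// Matches {{count}}, {email}, and HTML tags such as <strong> or </a>.
const TOKEN_PATTERN = /\{\{\s*\w+\s*\}\}|\{\s*\w+\s*\}|<\/?[a-zA-Z][^>]*>/g;

function extractTokens(text: string): string[] {
  return (text.match(TOKEN_PATTERN) ?? []).slice().sort();
}

// Returns source tokens that are missing from (or altered in) the translation.
function missingTokens(source: string, translated: string): string[] {
  const remaining = extractTokens(translated);
  const missing: string[] = [];
  for (const token of extractTokens(source)) {
    const index = remaining.indexOf(token);
    if (index === -1) missing.push(token);
    else remaining.splice(index, 1);
  }
  return missing;
}

// Hand-written outputs: one clean, one with a translated placeholder.
const clean = missingTokens(
  "Click <strong>Save</strong> to apply your changes.",
  "Klicken Sie auf <strong>Speichern</strong>, um Ihre Änderungen zu übernehmen."
);
const mangled = missingTokens(
  "Enter your {email} and {password} to log in.",
  "Geben Sie {E-Mail} und {password} ein, um sich anzumelden."
);
```

Here `clean` comes back empty and `mangled` reports `{email}`. A check like this catches dropped or renamed tokens, but not tokens that survive in the wrong grammatical position.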

7. Glossary / terminology management

Can you enforce consistent terminology? If "workspace" must always be "Arbeitsbereich" in German, can you define that rule?

Some APIs offer:

Server-side glossaries — upload a terminology list, the API enforces it automatically
Per-request context — send context/instructions with each request
No glossary support — you're on your own

Server-side glossaries are the most reliable. Per-request context works for LLM-based APIs but adds prompt tokens to every call. No support means you need post-processing to enforce terminology.
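If you do end up post-processing, the first step is usually detecting violations. Here is a minimal sketch of that check. The `GlossaryEntry` shape is hypothetical, and plain substring matching is an assumption that will miss inflected forms in many languages.

```typescript
// A required translation for a source term (hypothetical shape).
interface GlossaryEntry {
  source: string; // e.g. "workspace"
  target: string; // e.g. "Arbeitsbereich"
}

// Entries whose source term appears in the source text
// but whose required target term is absent from the translation.
function glossaryViolations(
  source: string,
  translated: string,
  glossary: GlossaryEntry[]
): GlossaryEntry[] {
  const src = source.toLowerCase();
  const out = translated.toLowerCase();
  return glossary.filter(
    (e) => src.includes(e.source.toLowerCase()) && !out.includes(e.target.toLowerCase())
  );
}

const glossary: GlossaryEntry[] = [{ source: "workspace", target: "Arbeitsbereich" }];
const ok = glossaryViolations("Open your workspace", "Öffnen Sie Ihren Arbeitsbereich", glossary);
const bad = glossaryViolations("Open your workspace", "Öffnen Sie Ihren Arbeitsplatz", glossary);
```

Detection is the easy half. Automatically fixing a violation means rewriting the sentence around the term, which is why server-side enforcement is preferable when it's available.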

8. Formality control

Can you specify formal vs informal register?

This matters for German (du/Sie), French (tu/vous), Japanese (multiple levels), Korean, and many other languages. If the API doesn't support it, you get an inconsistent mix that reads poorly.

DeepL offers explicit formality settings. LLM-based APIs accept it as an instruction in the prompt. Traditional NMT APIs mostly don't support it at all.

9. Context support

Can you provide context alongside the text to improve disambiguation?

As discussed in detail elsewhere, "bank" translates differently in financial vs geographical contexts. APIs that accept context fields produce significantly better output for ambiguous strings.

{
  "text": "Post",
  "context": "Button label on social media feed to publish a new post",
  "targetLanguage": "de"
}

auto18n and LLM-based services support this. Most traditional NMT APIs don't.

10. Supported file formats

If you're working with .PO, .XLIFF, .ARB, .JSON, .YAML, or .STRINGS files, check whether the API can ingest them directly or if you need to extract strings, translate, and reassemble yourself.

Some APIs accept file uploads and return translated files with structure preserved. Others only accept plain text and leave the file handling to you.
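For the do-it-yourself route with JSON message files, the extract and reassemble steps might look like the sketch below. It assumes a file made of nested objects with string leaves and keys that contain no dots. The `Messages` type and function names are my own.

```typescript
// Nested message file: objects all the way down, strings at the leaves.
type Messages = { [key: string]: string | Messages };

// Extract: turn { nav: { save: "Save" } } into { "nav.save": "Save" }.
function flatten(messages: Messages, prefix = ""): Record<string, string> {
  const flat: Record<string, string> = {};
  for (const key of Object.keys(messages)) {
    const value = messages[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") flat[path] = value;
    else Object.assign(flat, flatten(value, path));
  }
  return flat;
}

// Reassemble: rebuild the nested structure from dotted paths.
function unflatten(flat: Record<string, string>): Messages {
  const root: Messages = {};
  for (const path of Object.keys(flat)) {
    const keys = path.split(".");
    let node: Messages = root;
    for (const key of keys.slice(0, -1)) {
      const existing = node[key];
      if (typeof existing === "object") {
        node = existing;
      } else {
        const child: Messages = {};
        node[key] = child;
        node = child;
      }
    }
    node[keys[keys.length - 1]] = flat[path];
  }
  return root;
}

const original: Messages = { nav: { save: "Save", cancel: "Cancel" }, title: "Home" };
const flat = flatten(original);
const rebuilt = unflatten(flat);
```

The values of `flat` are what you send to the API. After translation, you write the results back under the same keys and call `unflatten`. Formats like .PO or .XLIFF carry more metadata and are better handled with a dedicated parser.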

11. Caching behavior

Does the API cache translations server-side? If you send the same string twice, are you charged twice?

Server-side caching: You pay once, subsequent identical requests are free or discounted
No caching: Every request is billed at full price, even duplicates
Translation Memory: The API stores your previous translations and reuses them for similar (not just identical) content

If the API doesn't cache, you need to build your own cache layer. This isn't hard but it's engineering time you could spend elsewhere.
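The core of a do-it-yourself cache is small. This sketch keeps everything in an in-memory `Map` and splits a batch into strings you already have and strings you still need to pay for. The class and method names are my own, and a production version would presumably sit on top of a database or Redis instead.

```typescript
class TranslationCache {
  private entries = new Map<string, string>();

  // Key on everything that changes the output: language pair and text.
  private key(text: string, source: string, target: string): string {
    return JSON.stringify([source, target, text]);
  }

  get(text: string, source: string, target: string): string | undefined {
    return this.entries.get(this.key(text, source, target));
  }

  set(text: string, source: string, target: string, translation: string): void {
    this.entries.set(this.key(text, source, target), translation);
  }

  // Split a batch into cached results and de-duplicated strings that still need an API call.
  partition(
    texts: string[],
    source: string,
    target: string
  ): { hits: Map<string, string>; misses: string[] } {
    const hits = new Map<string, string>();
    const misses = new Set<string>();
    for (const text of texts) {
      const cached = this.get(text, source, target);
      if (cached !== undefined) hits.set(text, cached);
      else misses.add(text);
    }
    return { hits, misses: Array.from(misses) };
  }
}

const cache = new TranslationCache();
cache.set("Save", "en", "de", "Speichern");
const { hits, misses } = cache.partition(["Save", "Cancel", "Cancel"], "en", "de");
```

Only `misses` goes to the API. If you use glossaries, formality settings, or context, those belong in the cache key too, since they change the output.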

12. Quality consistency across updates

When the API provider updates their model, does translation quality change? This seems minor but it causes real problems:

Your cached translations no longer match new translations of the same content
A/B tests comparing old and new translations become meaningless
Regression: some language pairs might get worse after a model update

Ask: Does the provider offer versioned models? Can you pin to a specific version? Do they notify customers before model changes?
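Whatever the provider answers, you can detect drift yourself by keeping a snapshot of translations for a fixed test set and diffing it after each model update. Here is a minimal sketch, with a hypothetical `Snapshot` shape and hand-written sample data.

```typescript
// Source string -> translation, captured at a known point in time.
type Snapshot = Record<string, string>;

// Source strings whose translation differs between two snapshots.
function changedTranslations(baseline: Snapshot, current: Snapshot): string[] {
  return Object.keys(baseline).filter(
    (source) => current[source] !== undefined && current[source] !== baseline[source]
  );
}

// Hand-written sample data, for illustration only.
const before: Snapshot = { Save: "Speichern", Post: "Posten" };
const after: Snapshot = { Save: "Speichern", Post: "Beitrag" };
const changed = changedTranslations(before, after);
```

A changed translation isn't necessarily a worse one, so the diff is a list of strings for a native speaker to re-review, not a pass/fail result.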

13. Data privacy and retention

Where does your data go?

Is the translated text used to train models? (Google and Amazon have opt-out options; check the defaults)
Is data retained after the API call? For how long?
Is there a data processing agreement (DPA) available?
Which regions process the data? (Relevant for GDPR, data residency requirements)

If you're translating user data, PII, or confidential content, this isn't just a checkbox — it's a compliance requirement.

14. Error handling and status reporting

When things go wrong, does the API give you useful information?

Test scenarios:

Send an unsupported language code — do you get a clear error or a generic 400?
Send extremely long text — is there a documented max length?
Send malformed markup — does it fail gracefully or produce corrupted output?
Hit the rate limit — is the Retry-After header present?

Also check: is there a status page? An API health endpoint? Do they send notifications for planned maintenance?
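On the client side, these scenarios usually end up as a small mapping from HTTP status to what your code should do next. The sketch below shows one plausible mapping. The `Outcome` names are my own, and the exact status codes an API returns for each scenario are something to confirm against its documentation.

```typescript
type Outcome = "ok" | "retry" | "check-credentials" | "fix-request";

// Map an HTTP status code to the client's next step.
function classifyStatus(status: number): Outcome {
  if (status >= 200 && status < 300) return "ok";
  // Rate limits, timeouts, and server errors are usually worth retrying with backoff.
  if (status === 429 || status === 408 || status >= 500) return "retry";
  if (status === 401 || status === 403) return "check-credentials";
  // Everything else (e.g. 400 for an unsupported language code) needs a code or data fix.
  return "fix-request";
}
```

An API that returns a generic 400 for everything forces you to parse error message strings to tell these cases apart, which is exactly the kind of thing this evaluation should surface.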

15. Migration path

If you need to switch providers later, how painful is it?

Considerations:

Are you using provider-specific features (custom models, glossary formats) that don't port?
Is your code tightly coupled to the provider's SDK, or do you have an abstraction layer?
Can you export your translation memory / glossaries?

The smartest approach: wrap the translation API behind your own interface from day one.

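// Illustrative options type (an assumption; extend as needed).
type TranslateOptions = { targetLanguage: string; sourceLanguage?: string };

interface TranslationProvider {
  translate(text: string, options: TranslateOptions): Promise<string>;
  translateBatch(texts: string[], options: TranslateOptions): Promise<string[]>;
}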

Then swapping providers is a single implementation change, not a codebase-wide refactor.

The evaluation spreadsheet

Score each API on a 1-5 scale for each criterion. Weight by what matters to your use case. A real-time chat app cares more about latency than a batch documentation translator does. A healthcare app cares more about data privacy than a game does.

| Criterion | Weight (your use case) | API A | API B | API C |
| --------------------- | ---------------------- | ----- | ----- | ----- |
| Language pair quality | | | | |
| Pricing | | | | |
| Rate limits | | | | |
| Batch support | | | | |
| Latency | | | | |
| Format preservation | | | | |
| Glossary support | | | | |
| Formality control | | | | |
| Context support | | | | |
| File format support | | | | |
| Caching | | | | |
| Quality consistency | | | | |
| Data privacy | | | | |
| Error handling | | | | |
| Migration path | | | | |

Fill this out with real test data, not marketing claims. The 2-3 hours you spend evaluating will save you months of pain after integration.
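If you'd rather compute the totals than eyeball them, the weighting step is a weighted average. A minimal sketch follows. The criteria, weights, and scores are invented, and dividing by the sum of the weights is my choice so the result stays on the same 1-5 scale.

```typescript
// Weighted average of 1-5 scores; the result stays on the 1-5 scale.
function weightedScore(
  scores: Record<string, number>,
  weights: Record<string, number>
): number {
  let total = 0;
  let weightSum = 0;
  for (const criterion of Object.keys(weights)) {
    const score = scores[criterion];
    if (score === undefined) throw new Error(`missing score for ${criterion}`);
    total += weights[criterion] * score;
    weightSum += weights[criterion];
  }
  if (weightSum === 0) throw new Error("weights must not all be zero");
  return total / weightSum;
}

// Invented example: a real-time chat app that weights latency heavily.
const weights = { Latency: 3, Pricing: 1 };
const apiA = weightedScore({ Latency: 5, Pricing: 2 }, weights);
const apiB = weightedScore({ Latency: 2, Pricing: 5 }, weights);
```

With these made-up numbers API A scores 4.25 and API B scores 2.75, even though their unweighted totals are identical.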