Building a Translation Cache That Works
How to design a translation cache with proper key structure, invalidation strategy, and the cost savings math to justify the effort.
Every translation API charges per character. None of them (except auto18n) cache results for you. If you translate "Save" to Spanish on Monday and again on Friday, you pay twice. At scale, this is a massive waste.
Here's how to build a translation cache that actually saves money.
The Cache Key Problem
The obvious cache key is source_text + target_language. But that's not enough.
"Save" + "es" → "Guardar"
This breaks when:
- You change translation providers (Google and DeepL produce different translations)
- You update the formality setting
- You add context that changes the translation
- You want to force re-translation after fixing a bad result
```ts
import { createHash } from "crypto";

// Anything that can change the translation output must be part of the key.
function cacheKey(params: {
  text: string;
  to: string;
  provider?: string;
  context?: string;
  formality?: string;
}): string {
  const input = JSON.stringify({
    t: params.text,
    l: params.to,
    p: params.provider ?? "default",
    c: params.context ?? "",
    f: params.formality ?? "",
  });
  return createHash("sha256").update(input).digest("hex");
}
```
Using a hash keeps the key length fixed regardless of input size. This matters for Redis, where long keys waste memory.
Storage Options
Redis
The standard choice. Fast reads (~1ms), simple key-value model, TTL support.
```ts
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

async function getCachedTranslation(key: string): Promise<string | null> {
  return redis.get(`tr:${key}`);
}

async function setCachedTranslation(
  key: string,
  translation: string,
  ttlSeconds = 60 * 60 * 24 * 30, // 30 days
): Promise<void> {
  await redis.setex(`tr:${key}`, ttlSeconds, translation);
}
```
The `tr:` prefix namespaces translation keys so they don't collide with other cached data.
Memory estimate: A typical translation (50 chars source, 60 chars target) with overhead is about 200 bytes in Redis. 100,000 cached translations = ~20MB. 1 million = ~200MB. Redis on a small VPS can handle this easily.
SQLite
If you're already running a server and don't want to add Redis, SQLite works fine for caching. Reads are fast (especially with WAL mode), and you don't need another service.
```sql
CREATE TABLE translation_cache (
  cache_key TEXT PRIMARY KEY,
  source_text TEXT NOT NULL,
  target_lang TEXT NOT NULL,
  translation TEXT NOT NULL,
  created_at INTEGER DEFAULT (unixepoch()),
  accessed_at INTEGER DEFAULT (unixepoch())
);

CREATE INDEX idx_cache_accessed ON translation_cache(accessed_at);
```
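A minimal read/write layer over that table might look like the following sketch. It assumes the better-sqlite3 package (any SQLite driver works the same way), and the `getCached`/`setCached` names are illustrative:

```ts
import Database from "better-sqlite3";

const db = new Database("translation-cache.db");
db.pragma("journal_mode = WAL"); // the WAL mode mentioned above

function getCached(key: string): string | undefined {
  const row = db
    .prepare("SELECT translation FROM translation_cache WHERE cache_key = ?")
    .get(key) as { translation: string } | undefined;
  if (row) {
    // Touch accessed_at so the idx_cache_accessed index can drive
    // oldest-first eviction when the table grows too large.
    db.prepare(
      "UPDATE translation_cache SET accessed_at = unixepoch() WHERE cache_key = ?",
    ).run(key);
  }
  return row?.translation;
}

function setCached(
  key: string,
  sourceText: string,
  targetLang: string,
  translation: string,
): void {
  db.prepare(
    `INSERT INTO translation_cache (cache_key, source_text, target_lang, translation)
     VALUES (?, ?, ?, ?)
     ON CONFLICT(cache_key) DO UPDATE SET translation = excluded.translation`,
  ).run(key, sourceText, targetLang, translation);
}
```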
Filesystem
For CI/CD pipelines where you're translating i18n JSON files, a simple JSON file works:
```ts
import { readFileSync, writeFileSync, existsSync } from "fs";

const CACHE_FILE = ".translation-cache.json";

function loadCache(): Record<string, string> {
  if (!existsSync(CACHE_FILE)) return {};
  return JSON.parse(readFileSync(CACHE_FILE, "utf-8"));
}

function saveCache(cache: Record<string, string>): void {
  writeFileSync(CACHE_FILE, JSON.stringify(cache, null, 2));
}
```
Commit the cache file to your repo. Every CI run starts with previous translations already available. This alone can cut your translation API costs by 80%+ since most deploys only add a few new strings.
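As a sketch of how the pieces fit together in a CI translate step, assuming a `translateViaApi` helper that wraps your provider's API call (the same placeholder the cache-warming example uses later):

```ts
// Translate one i18n file: hits come from the committed cache,
// and you pay only for misses.
async function translateFile(
  source: Record<string, string>,
  lang: string,
): Promise<Record<string, string>> {
  const cache = loadCache();
  const out: Record<string, string> = {};
  for (const [k, text] of Object.entries(source)) {
    const ck = cacheKey({ text, to: lang });
    if (!(ck in cache)) {
      cache[ck] = await translateViaApi(text, lang); // cache miss: paid API call
    }
    out[k] = cache[ck];
  }
  saveCache(cache); // commit this file so the next CI run reuses it
  return out;
}
```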
Invalidation Strategy
Cache invalidation is famously hard. For translations, it's actually manageable because translations don't change often. Here's what triggers invalidation:
1. Source text changes
If the English string changes from "Save changes" to "Save your changes," the cache key changes automatically (since the source text is part of the key). No explicit invalidation needed.
2. Bad translation reported
A user reports a translation is wrong. You need to evict it and re-translate.
```ts
async function invalidateTranslation(
  text: string,
  targetLang: string,
): Promise<void> {
  // Note: provider, context, and formality must match what was used at
  // write time, since they're part of the key; defaults are assumed here.
  const key = cacheKey({ text, to: targetLang });
  await redis.del(`tr:${key}`);
}
```
3. Provider change
Switching from Google Translate to DeepL? If the provider is part of your cache key (it should be), old translations remain in cache under the old provider's key and new requests use the new provider. No mass invalidation needed.
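Because the provider field feeds into the hash, the two generations of translations simply live under different keys:

```ts
cacheKey({ text: "Save", to: "es", provider: "google" }); // old entries stay here
cacheKey({ text: "Save", to: "es", provider: "deepl" }); // new requests use this key
```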
4. TTL-based expiration
Set a TTL of 30-90 days. Translation quality improves over time as providers update their models. A 90-day TTL means your translations are never more than 3 months stale.
Don't set the TTL too short. A 1-hour TTL defeats the purpose of caching translations, which rarely change.
The Cost Savings Math
Let's work through a real scenario.
Your app: 5,000 translatable strings, 8 target languages, deploying 20 times per month.
Without cache:
- 5,000 strings x 80 chars average x 8 languages = 3.2M chars per deploy
- 20 deploys x 3.2M = 64M chars/month
- At $20/1M chars (Google) = $1,280/month
With cache:
- Assume 50 new or changed strings per deploy (1% of total)
- 50 strings x 80 chars x 8 languages = 32,000 chars per deploy
- 20 deploys x 32K = 640,000 chars/month
- At $20/1M chars = $12.80/month
Even if you're less aggressive — say 10% of strings change per deploy — you're still looking at $128/month vs $1,280/month. A 90% saving.
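The same arithmetic as a small script, if you want to plug in your own figures. All constants are the scenario's assumptions above, not measurements:

```ts
const STRINGS = 5_000;
const AVG_CHARS = 80;
const LANGUAGES = 8;
const DEPLOYS_PER_MONTH = 20;
const USD_PER_MILLION_CHARS = 20; // Google-tier pricing

function monthlyCost(changedFraction: number): number {
  const charsPerDeploy = STRINGS * changedFraction * AVG_CHARS * LANGUAGES;
  return ((charsPerDeploy * DEPLOYS_PER_MONTH) / 1_000_000) * USD_PER_MILLION_CHARS;
}

console.log(monthlyCost(1)); // no cache: 1280
console.log(monthlyCost(0.01)); // 1% churn: 12.8
console.log(monthlyCost(0.1)); // 10% churn: 128
```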
Cache Warming
When you first deploy the cache, you have a cold start problem. Every string needs translation, so your first run is expensive and slow.
For i18n file workflows, warm the cache by running a full translation once:
```ts
async function warmCache(
  sourceStrings: Record<string, string>,
  targetLangs: string[],
): Promise<void> {
  const cache = loadCache();
  let translated = 0;

  for (const lang of targetLangs) {
    for (const [, text] of Object.entries(sourceStrings)) {
      const ck = cacheKey({ text, to: lang });
      if (cache[ck]) continue;

      const translation = await translateViaApi(text, lang);
      cache[ck] = translation;
      translated++;

      // Rate limiting: pause every 100 translations, and checkpoint
      // the cache so a crash doesn't lose paid work
      if (translated % 100 === 0) {
        await new Promise((r) => setTimeout(r, 1000));
        saveCache(cache); // checkpoint
      }
    }
  }

  saveCache(cache);
  console.log(`Warmed cache with ${translated} new translations`);
}
```
Or Just Use a Service That Caches
If building and maintaining a translation cache sounds like more infrastructure than you want, auto18n handles this automatically. Every translation is cached on their side — same string, same target language, same context = instant response at no additional cost.
The architectural principle is sound either way: never pay to translate the same string twice. Whether you implement the cache yourself or let your translation service handle it, this should be a hard requirement in your translation pipeline.
Common Mistakes
Caching the wrong thing. Cache the _translation result_, not the API response. API responses include metadata (detected language, confidence scores) that changes between calls and bloats your cache.
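For example, with a hypothetical response shape (the field names are illustrative, not any specific provider's API):

```ts
// Hypothetical provider response; real providers differ.
interface ProviderResponse {
  translatedText: string;
  detectedSourceLanguage?: string;
  confidence?: number;
}

async function cacheResult(key: string, resp: ProviderResponse): Promise<void> {
  // Store only the string you'll serve; the metadata varies call-to-call.
  await setCachedTranslation(key, resp.translatedText);
}
```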
Forgetting to normalize input. "Save Changes" and "Save changes" have different cache keys unless you normalize. Decide on a normalization strategy (lowercase? trim whitespace?) and apply it consistently.
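A minimal sketch of one possible policy, applied before computing the cache key:

```ts
// Collapse internal whitespace and trim. Deliberately does NOT lowercase,
// since "Save Changes" and "save changes" may warrant different translations;
// that's a product decision.
function normalizeForCache(text: string): string {
  return text.trim().replace(/\s+/g, " ");
}

const key = cacheKey({ text: normalizeForCache("  Save   changes "), to: "es" });
```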
No cache metrics. Track your hit rate. If it's below 50%, your cache isn't helping enough and you should investigate why. If it's above 95%, you could probably reduce your TTL to get fresher translations.
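A counter-based sketch on top of the Redis helpers above; in production you'd likely report these to your metrics system instead:

```ts
let hits = 0;
let misses = 0;

async function getWithMetrics(key: string): Promise<string | null> {
  const cached = await getCachedTranslation(key);
  if (cached !== null) hits++;
  else misses++;
  return cached;
}

function hitRate(): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}
```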
Caching empty translations. If the API returns an error or empty string, don't cache it. You'll serve bad results until the TTL expires.
```ts
if (translation && translation.trim().length > 0) {
  await setCachedTranslation(key, translation);
}
```