Real-Time Translation for Chat Apps
How to build a real-time translation pipeline for chat applications using WebSockets, caching, and translation APIs.
Adding real-time translation to a chat app sounds simple: a user sends a message, you translate it, and you deliver the translated version. In practice, you'll hit latency constraints, cost concerns, and edge cases that make it genuinely tricky.
Here's how to build a translation pipeline that works at chat speed.
The Latency Budget
In a chat app, message delivery needs to feel instant: users expect their messages to appear within 200-500ms. Add translation and the budget breaks down roughly like this:
- WebSocket receive: ~10ms
- Translation API call: 100-800ms (NMT) or 300-2000ms (LLM)
- WebSocket send: ~10ms
A single translation call can consume the entire budget on its own. The solution: don't block message delivery on translation.
Architecture: Translate Asynchronously
The most practical pattern is to deliver the original message immediately and translate in the background:
User A sends message (Spanish)
→ Server receives via WebSocket
→ Server immediately forwards original to User B
→ Server kicks off async translation to English
→ Translation completes (~200ms)
→ Server sends translated version as an update to User B
User B sees the Spanish message first, then sees the English translation appear underneath it. This feels fast and is honest — the user sees real content immediately.
// server.ts
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

interface ChatMessage {
  id: string;
  sender: string;
  text: string;
  lang: string;
  room: string;
}

wss.on("connection", (ws) => {
  ws.on("message", async (raw) => {
    const msg: ChatMessage = JSON.parse(raw.toString());

    // 1. Broadcast original message immediately
    broadcast(msg.room, {
      type: "message",
      ...msg,
    });

    // 2. Translate asynchronously for users who need it
    const roomMembers = getRoomMembers(msg.room);
    const targetLangs = new Set(
      roomMembers.filter((m) => m.lang !== msg.lang).map((m) => m.lang),
    );
    for (const targetLang of targetLangs) {
      translateAsync(msg, targetLang);
    }
  });
});
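The server code above assumes broadcast and getRoomMembers helpers. A minimal sketch of what they might look like; the room registry, the Member shape, and joinRoom are assumptions, not part of the original code:

```typescript
// Minimal room registry assumed by the server code above (sketch, not production).
interface Member {
  id: string;
  lang: string;
  send: (data: string) => void; // e.g. ws.send bound to a connection
}

const rooms = new Map<string, Map<string, Member>>();

function joinRoom(room: string, member: Member): void {
  if (!rooms.has(room)) rooms.set(room, new Map());
  rooms.get(room)!.set(member.id, member);
}

function getRoomMembers(room: string): Member[] {
  return [...(rooms.get(room)?.values() ?? [])];
}

function broadcast(room: string, payload: object): void {
  const data = JSON.stringify(payload);
  for (const member of getRoomMembers(room)) {
    member.send(data);
  }
}
```

In production you'd also remove members on disconnect and skip sockets that aren't open, but the shape is the same.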
async function translateAsync(msg: ChatMessage, targetLang: string) {
  try {
    const translated = await translateText(msg.text, targetLang);
    broadcast(msg.room, {
      type: "translation",
      messageId: msg.id,
      targetLang,
      text: translated,
    });
  } catch (err) {
    console.error("Translation failed:", err);
    // Don't crash — the user still has the original message
  }
}
Caching Chat Messages
Chat messages are repetitive. "OK", "Thanks", "See you tomorrow", and emoji reactions get sent thousands of times. Cache them aggressively.
import { createHash } from "crypto";

const translationCache = new Map<string, string>();

function cacheKey(text: string, targetLang: string): string {
  return createHash("md5").update(`${text}|${targetLang}`).digest("hex");
}

async function translateText(
  text: string,
  targetLang: string,
): Promise<string> {
  const key = cacheKey(text, targetLang);
  const cached = translationCache.get(key);
  if (cached) return cached;

  const response = await fetch("https://api.auto18n.com/translate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUTO18N_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, to: targetLang }),
  });
  if (!response.ok) {
    throw new Error(`Translation API returned ${response.status}`);
  }

  const data = await response.json();
  translationCache.set(key, data.translation);
  return data.translation;
}
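One caveat: a plain Map grows without bound as unique phrases accumulate. A small LRU cap keeps memory predictable; this class and the 10,000-entry limit are assumptions layered on the code above, not part of it:

```typescript
// Simple LRU cache: a Map preserves insertion order, so the first key is
// always the least recently used once we refresh entries on every get.
class LruCache {
  private map = new Map<string, string>();
  constructor(private maxEntries: number) {}

  get(key: string): string | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency: move the entry to the end of insertion order
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: string): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxEntries) {
      // Evict the least-recently-used (first) entry
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

// Drop-in replacement for the plain Map above (cap is an arbitrary choice):
const translationCache = new LruCache(10_000);
```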
An in-memory cache works for a single server. For a distributed setup, use Redis:
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

async function translateText(
  text: string,
  targetLang: string,
): Promise<string> {
  const key = `chat:tr:${cacheKey(text, targetLang)}`;
  const cached = await redis.get(key);
  if (cached) return cached;

  const translation = await callTranslationApi(text, targetLang);
  await redis.setex(key, 86400, translation); // 24h TTL
  return translation;
}
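A related win: when the same phrase misses the cache in several rooms at once, each miss fires its own API call. Coalescing in-flight requests makes concurrent callers share one promise. This helper is a sketch layered over any translateText-style function; the names are assumptions:

```typescript
// Coalesce concurrent translations of the same (text, lang) pair:
// all callers awaiting the same key share one promise instead of
// firing duplicate API calls.
const inFlight = new Map<string, Promise<string>>();

function dedupe(
  key: string,
  translate: () => Promise<string>,
): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  // Remove the entry once settled so failures can be retried
  const promise = translate().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Usage sketch:
// dedupe(`${text}|${targetLang}`, () => callTranslationApi(text, targetLang));
```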
With auto18n, caching is handled server-side automatically, so cached lookups return in under 50ms without you maintaining a cache. But for chat apps, a local cache layer still helps because it avoids the network round-trip entirely for common phrases.
Handling Typing Indicators
A question that comes up: should you translate typing indicators? No. The "User is typing..." indicator should be language-agnostic. Translating it would be confusing (the other user isn't typing in your language — they're typing in theirs).
Language Detection
For chat messages, you have two options for knowing the source language:
Option A: User sets their language. Each user has a language preference in their profile. All their messages are assumed to be in that language. Simple, but breaks when users code-switch (mixing languages in one message).
Option B: Auto-detect per message. Run language detection on each message and translate based on the detected language. More accurate, but adds latency and can misdetect very short messages.
I recommend Option A with Option B as a fallback for ambiguous cases:
function getSourceLang(msg: ChatMessage, detectedLang: string | null): string {
  // Trust the user's profile language for short messages
  // (auto-detection is unreliable under ~20 characters)
  if (msg.text.length < 20) return msg.lang;

  // Use detected language if available and different
  if (detectedLang && detectedLang !== msg.lang) {
    return detectedLang;
  }
  return msg.lang;
}
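Putting the two options together: only run the detector when the message is long enough for detection to be reliable, then let getSourceLang arbitrate. In this sketch, detectLanguage and resolveSourceLang are placeholder names for whatever detector you plug in (CLD3, fastText, an API call):

```typescript
interface ChatMessage {
  id: string;
  sender: string;
  text: string;
  lang: string;
  room: string;
}

function getSourceLang(msg: ChatMessage, detectedLang: string | null): string {
  if (msg.text.length < 20) return msg.lang; // detection unreliable when short
  if (detectedLang && detectedLang !== msg.lang) return detectedLang;
  return msg.lang;
}

// Sketch: skip the detector entirely for short messages — getSourceLang
// would ignore its result anyway, so don't pay the detection latency.
async function resolveSourceLang(
  msg: ChatMessage,
  detectLanguage: (text: string) => Promise<string | null>,
): Promise<string> {
  const detected =
    msg.text.length >= 20 ? await detectLanguage(msg.text) : null;
  return getSourceLang(msg, detected);
}
```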
Rate Limiting and Cost Control
Chat generates a lot of translation requests. A busy room with 50 active users across 5 languages can generate hundreds of translation requests per minute.
Debounce edits. If your chat supports message editing, don't re-translate on every keystroke. Wait until the edit is finalized.
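A debounce sketch for edit re-translation. The delay is an arbitrary assumption, and the flush method exists so the "edit saved" event can fire the pending call immediately:

```typescript
// Debounce re-translation of edited messages: wait until the user stops
// editing before firing, instead of translating every keystroke.
function debounceEdit<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  let lastArgs: A | null = null;

  return {
    // Schedule (or reschedule) the call for delayMs after the latest edit
    call(...args: A): void {
      lastArgs = args;
      if (timer) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = null;
        if (lastArgs) fn(...lastArgs);
        lastArgs = null;
      }, delayMs);
    },
    // Fire immediately with the latest args (e.g. when the edit is saved)
    flush(): void {
      if (timer) clearTimeout(timer);
      timer = null;
      if (lastArgs) fn(...lastArgs);
      lastArgs = null;
    },
  };
}

// Usage sketch: re-translate at most once per second of idle editing
// const retranslate = debounceEdit(
//   (msg: ChatMessage) => translateAsync(msg, "en"),
//   1000,
// );
```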
Skip short messages. "OK", "lol", ":)", "👍" — don't translate these. Set a minimum character threshold.
function shouldTranslate(text: string): boolean {
  // Skip very short messages
  if (text.length < 3) return false;

  // Skip emoji-only messages. Note: \p{Emoji} also matches plain digits
  // like "123", so match pictographic characters plus modifiers instead.
  const emojiPattern = /^[\p{Extended_Pictographic}\p{Emoji_Modifier}\u200d\ufe0f\s]+$/u;
  if (emojiPattern.test(text)) return false;

  // Skip URLs
  if (/^https?:\/\/\S+$/.test(text.trim())) return false;

  return true;
}
Budget caps. Set a per-room or per-user translation budget. If a room is generating $10/day in translation costs, something is wrong (spam, bot activity) and you should throttle.
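A minimal in-memory daily budget tracker per room. The cost-per-character figure and the cap here are placeholder numbers (use your provider's actual pricing), and a multi-server deployment would keep these counters in Redis instead:

```typescript
// Per-room daily translation budget (sketch; in-memory, keyed by day).
const COST_PER_CHAR = 0.00002; // assumed $/character — use your provider's rate
const DAILY_CAP_USD = 10;

const spend = new Map<string, number>(); // "room|YYYY-MM-DD" -> dollars

// Returns false once the room is over budget for the day:
// callers should throttle or drop translations (not message delivery).
function recordAndCheckBudget(
  room: string,
  text: string,
  now = new Date(),
): boolean {
  const day = now.toISOString().slice(0, 10);
  const key = `${room}|${day}`;
  const next = (spend.get(key) ?? 0) + text.length * COST_PER_CHAR;
  spend.set(key, next);
  return next <= DAILY_CAP_USD;
}
```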
The Full Client-Side Flow
On the client, you need to handle both the original message and the translation update:
// client.ts
interface Message {
  id: string;
  sender: string;
  text: string;
  lang: string;
  translation?: string;
}

const messages = new Map<string, Message>();

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === "message") {
    messages.set(data.id, {
      id: data.id,
      sender: data.sender,
      text: data.text,
      lang: data.lang,
    });
    renderMessage(data.id);
  }

  if (data.type === "translation") {
    const msg = messages.get(data.messageId);
    if (msg && data.targetLang === myLang) {
      msg.translation = data.text;
      renderMessage(data.messageId); // re-render with translation
    }
  }
};
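renderMessage is left undefined above. A sketch of the output it might produce: original text first, translation underneath once it arrives. The markup shape and formatMessage name are assumptions; it returns a string for clarity, where a real client would update DOM nodes in place:

```typescript
interface Message {
  id: string;
  sender: string;
  text: string;
  lang: string;
  translation?: string;
}

// Sketch of the markup renderMessage might produce. NB: a real client must
// escape msg.text and msg.translation before interpolating (XSS risk).
function formatMessage(msg: Message): string {
  const original = `<div class="original" lang="${msg.lang}">${msg.text}</div>`;
  const translated = msg.translation
    ? `<div class="translation">${msg.translation}</div>`
    : "";
  return `<div class="message" data-id="${msg.id}">${original}${translated}</div>`;
}
```

Because the translation arrives as a second render of the same message id, the original stays visible the whole time and the translation simply appears beneath it.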
Latency Numbers From Production
From a real implementation using auto18n with Redis caching:
- Cache hit (local Redis): 0.5ms
- Cache hit (auto18n server-side): 30-50ms
- Cache miss (LLM translation): 400-700ms
- Cache miss (NMT translation via Google): 100-200ms
Final Advice
Don't try to translate everything synchronously. Deliver first, translate second. Cache aggressively. Skip messages that don't need translation. And test with real chat patterns — the distribution of message lengths and repetition rates in real chat is very different from what you'd expect.