Real-Time Translation for Chat Apps
How to build a real-time translation pipeline for chat applications using WebSockets, caching, and translation APIs.
Adding real-time translation to a chat app sounds simple: a user sends a message, you translate it, and you deliver the translated version. In practice, you'll hit latency constraints, cost concerns, and edge cases that make it genuinely tricky.
Here's how to build a translation pipeline that works at chat speed.
The Latency Budget
In a chat app, message delivery needs to feel instant: users expect their messages to appear within 200-500ms. Add translation and the budget breaks down roughly like this:
- WebSocket receive: ~10ms
- Translation API call: 100-800ms (NMT) or 300-2000ms (LLM)
- WebSocket send: ~10ms
A single translation call can consume the entire budget on its own. The solution: don't block message delivery on translation.
Architecture: Translate Asynchronously
The most practical pattern is to deliver the original message immediately and translate in the background:
User A sends message (Spanish)
→ Server receives via WebSocket
→ Server immediately forwards original to User B
→ Server kicks off async translation to English
→ Translation completes (~200ms)
→ Server sends translated version as an update to User B
User B sees the Spanish message first, then sees the English translation appear underneath it. This feels fast and is honest — the user sees real content immediately.
// server.ts
import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8080 });

interface ChatMessage {
  id: string;
  sender: string;
  text: string;
  lang: string;
  room: string;
}

wss.on("connection", (ws) => {
  ws.on("message", async (raw) => {
    const msg: ChatMessage = JSON.parse(raw.toString());

    // 1. Broadcast original message immediately
    broadcast(msg.room, {
      type: "message",
      ...msg,
    });

    // 2. Translate asynchronously for users who need it
    const roomMembers = getRoomMembers(msg.room);
    const targetLangs = new Set(
      roomMembers.filter((m) => m.lang !== msg.lang).map((m) => m.lang),
    );
    for (const targetLang of targetLangs) {
      translateAsync(msg, targetLang);
    }
  });
});
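The server code above assumes broadcast and getRoomMembers helpers. A minimal sketch of what they might look like; the room registry, the Member shape, and joinRoom are assumptions, not part of the original code:

```typescript
// Minimal room registry assumed by the server code above (sketch, not production).
interface Member {
  id: string;
  lang: string;
  send: (data: string) => void; // e.g. ws.send bound to a connection
}

const rooms = new Map<string, Map<string, Member>>();

function joinRoom(room: string, member: Member): void {
  if (!rooms.has(room)) rooms.set(room, new Map());
  rooms.get(room)!.set(member.id, member);
}

function getRoomMembers(room: string): Member[] {
  return [...(rooms.get(room)?.values() ?? [])];
}

function broadcast(room: string, payload: object): void {
  const data = JSON.stringify(payload);
  for (const member of getRoomMembers(room)) {
    member.send(data);
  }
}
```

In production you'd also remove members on disconnect and skip sockets that aren't open, but the shape is the same.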
async function translateAsync(msg: ChatMessage, targetLang: string) {
  try {
    const translated = await translateText(msg.text, targetLang);
    broadcast(msg.room, {
      type: "translation",
      messageId: msg.id,
      targetLang,
      text: translated,
    });
  } catch (err) {
    console.error("Translation failed:", err);
    // Don't crash — the user still has the original message
  }
}
Caching Chat Messages
Chat messages are repetitive. "OK", "Thanks", "See you tomorrow", and emoji reactions get sent thousands of times. Cache them aggressively.
import { createHash } from "crypto";

const translationCache = new Map<string, string>();

function cacheKey(text: string, targetLang: string): string {
  return createHash("md5").update(`${text}|${targetLang}`).digest("hex");
}

async function translateText(
  text: string,
  targetLang: string,
): Promise<string> {
  const key = cacheKey(text, targetLang);
  const cached = translationCache.get(key);
  if (cached) return cached;

  const response = await fetch("https://api.auto18n.com/translate", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.AUTO18N_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, to: targetLang }),
  });
  if (!response.ok) {
    throw new Error(`Translation API returned ${response.status}`);
  }

  const data = await response.json();
  translationCache.set(key, data.translation);
  return data.translation;
}
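One caveat: a plain Map grows without bound as unique phrases accumulate. A small LRU cap keeps memory predictable; this class and the 10,000-entry limit are assumptions layered on the code above, not part of it:

```typescript
// Simple LRU cache: a Map preserves insertion order, so the first key is
// always the least recently used once we refresh entries on every get.
class LruCache {
  private map = new Map<string, string>();
  constructor(private maxEntries: number) {}

  get(key: string): string | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency: move the entry to the end of insertion order
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: string): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.maxEntries) {
      // Evict the least-recently-used (first) entry
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(key, value);
  }
}

// Drop-in replacement for the plain Map above (cap is an arbitrary choice):
const translationCache = new LruCache(10_000);
```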
An in-memory cache works for a single server. For a distributed setup, use Redis:
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL);

async function translateText(
  text: string,
  targetLang: string,
): Promise<string> {
  const key = `chat:tr:${cacheKey(text, targetLang)}`;
  const cached = await redis.get(key);
  if (cached) return cached;

  const translation = await callTranslationApi(text, targetLang);
  await redis.setex(key, 86400, translation); // 24h TTL
  return translation;
}
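A related win: when the same phrase misses the cache in several rooms at once, each miss fires its own API call. Coalescing in-flight requests makes concurrent callers share one promise. This helper is a sketch layered over any translateText-style function; the names are assumptions:

```typescript
// Coalesce concurrent translations of the same (text, lang) pair:
// all callers awaiting the same key share one promise instead of
// firing duplicate API calls.
const inFlight = new Map<string, Promise<string>>();

function dedupe(
  key: string,
  translate: () => Promise<string>,
): Promise<string> {
  const existing = inFlight.get(key);
  if (existing) return existing;

  // Remove the entry once settled so failures can be retried
  const promise = translate().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Usage sketch:
// dedupe(`${text}|${targetLang}`, () => callTranslationApi(text, targetLang));
```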
With auto18n, caching is handled server-side automatically, so cached lookups return in under 50ms without you maintaining a cache. But for chat apps, a local cache layer still helps because it avoids the network round-trip entirely for common phrases.
Handling Typing Indicators
A question that comes up: should you translate typing indicators? No. The "User is typing..." indicator should be language-agnostic. Translating it would be confusing (the other user isn't typing in your language — they're typing in theirs).
Language Detection
For chat messages, you have two options for knowing the source language:
Option A: User sets their language. Each user has a language preference in their profile. All their messages are assumed to be in that language. Simple, but breaks when users code-switch (mixing languages in one message).
Option B: Auto-detect per message. Run language detection on each message and translate based on the detected language. More accurate, but adds latency and can misdetect very short messages.
I recommend Option A with Option B as a fallback for ambiguous cases:
function getSourceLang(msg: ChatMessage, detectedLang: string | null): string {
  // Trust the user's profile language for short messages
  // (auto-detection is unreliable under ~20 characters)
  if (msg.text.length < 20) return msg.lang;

  // Use detected language if available and different
  if (detectedLang && detectedLang !== msg.lang) {
    return detectedLang;
  }
  return msg.lang;
}
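Putting the two options together: only run the detector when the message is long enough for detection to be reliable, then let getSourceLang arbitrate. In this sketch, detectLanguage and resolveSourceLang are placeholder names for whatever detector you plug in (CLD3, fastText, an API call):

```typescript
interface ChatMessage {
  id: string;
  sender: string;
  text: string;
  lang: string;
  room: string;
}

function getSourceLang(msg: ChatMessage, detectedLang: string | null): string {
  if (msg.text.length < 20) return msg.lang; // detection unreliable when short
  if (detectedLang && detectedLang !== msg.lang) return detectedLang;
  return msg.lang;
}

// Sketch: skip the detector entirely for short messages — getSourceLang
// would ignore its result anyway, so don't pay the detection latency.
async function resolveSourceLang(
  msg: ChatMessage,
  detectLanguage: (text: string) => Promise<string | null>,
): Promise<string> {
  const detected =
    msg.text.length >= 20 ? await detectLanguage(msg.text) : null;
  return getSourceLang(msg, detected);
}
```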
Rate Limiting and Cost Control
Chat generates a lot of translation requests. A busy room with 50 active users across 5 languages can generate hundreds of translation requests per minute.
Debounce edits. If your chat supports message editing, don't re-translate on every keystroke. Wait until the edit is finalized.
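A debounce sketch for edit re-translation. The delay is an arbitrary assumption, and the flush method exists so the "edit saved" event can fire the pending call immediately:

```typescript
// Debounce re-translation of edited messages: wait until the user stops
// editing before firing, instead of translating every keystroke.
function debounceEdit<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
) {
  let timer: ReturnType<typeof setTimeout> | null = null;
  let lastArgs: A | null = null;

  return {
    // Schedule (or reschedule) the call for delayMs after the latest edit
    call(...args: A): void {
      lastArgs = args;
      if (timer) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = null;
        if (lastArgs) fn(...lastArgs);
        lastArgs = null;
      }, delayMs);
    },
    // Fire immediately with the latest args (e.g. when the edit is saved)
    flush(): void {
      if (timer) clearTimeout(timer);
      timer = null;
      if (lastArgs) fn(...lastArgs);
      lastArgs = null;
    },
  };
}

// Usage sketch: re-translate at most once per second of idle editing
// const retranslate = debounceEdit(
//   (msg: ChatMessage) => translateAsync(msg, "en"),
//   1000,
// );
```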
Skip short messages. "OK", "lol", ":)", "👍" — don't translate these. Set a minimum character threshold.
function shouldTranslate(text: string): boolean {
  // Skip very short messages
  if (text.length < 3) return false;

  // Skip emoji-only messages. Note: \p{Emoji} also matches plain digits
  // like "123", so match pictographic characters plus modifiers instead.
  const emojiPattern = /^[\p{Extended_Pictographic}\p{Emoji_Modifier}\u200d\ufe0f\s]+$/u;
  if (emojiPattern.test(text)) return false;

  // Skip URLs
  if (/^https?:\/\/\S+$/.test(text.trim())) return false;

  return true;
}
Budget caps. Set a per-room or per-user translation budget. If a room is generating $10/day in translation costs, something is wrong (spam, bot activity) and you should throttle.
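A minimal in-memory daily budget tracker per room. The cost-per-character figure and the cap here are placeholder numbers (use your provider's actual pricing), and a multi-server deployment would keep these counters in Redis instead:

```typescript
// Per-room daily translation budget (sketch; in-memory, keyed by day).
const COST_PER_CHAR = 0.00002; // assumed $/character — use your provider's rate
const DAILY_CAP_USD = 10;

const spend = new Map<string, number>(); // "room|YYYY-MM-DD" -> dollars

// Returns false once the room is over budget for the day:
// callers should throttle or drop translations (not message delivery).
function recordAndCheckBudget(
  room: string,
  text: string,
  now = new Date(),
): boolean {
  const day = now.toISOString().slice(0, 10);
  const key = `${room}|${day}`;
  const next = (spend.get(key) ?? 0) + text.length * COST_PER_CHAR;
  spend.set(key, next);
  return next <= DAILY_CAP_USD;
}
```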
The Full Client-Side Flow
On the client, you need to handle both the original message and the translation update:
// client.ts
interface Message {
  id: string;
  sender: string;
  text: string;
  lang: string;
  translation?: string;
}

const messages = new Map<string, Message>();

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === "message") {
    messages.set(data.id, {
      id: data.id,
      sender: data.sender,
      text: data.text,
      lang: data.lang,
    });
    renderMessage(data.id);
  }

  if (data.type === "translation") {
    const msg = messages.get(data.messageId);
    if (msg && data.targetLang === myLang) {
      msg.translation = data.text;
      renderMessage(data.messageId); // re-render with translation
    }
  }
};
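renderMessage is left undefined above. A sketch of the output it might produce: original text first, translation underneath once it arrives. The markup shape and formatMessage name are assumptions; it returns a string for clarity, where a real client would update DOM nodes in place:

```typescript
interface Message {
  id: string;
  sender: string;
  text: string;
  lang: string;
  translation?: string;
}

// Sketch of the markup renderMessage might produce. NB: a real client must
// escape msg.text and msg.translation before interpolating (XSS risk).
function formatMessage(msg: Message): string {
  const original = `<div class="original" lang="${msg.lang}">${msg.text}</div>`;
  const translated = msg.translation
    ? `<div class="translation">${msg.translation}</div>`
    : "";
  return `<div class="message" data-id="${msg.id}">${original}${translated}</div>`;
}
```

Because the translation arrives as a second render of the same message id, the original stays visible the whole time and the translation simply appears beneath it.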
Latency Numbers From Production
From a real implementation using auto18n with Redis caching:
- Cache hit (local Redis): 0.5ms
- Cache hit (auto18n server-side): 30-50ms
- Cache miss (LLM translation): 400-700ms
- Cache miss (NMT translation via Google): 100-200ms
Final Advice
Don't try to translate everything synchronously. Deliver first, translate second. Cache aggressively. Skip messages that don't need translation. And test with real chat patterns — the distribution of message lengths and repetition rates in real chat is very different from what you'd expect.