Building a Multilingual Documentation Site

Practical guide to multilingual docs with Docusaurus, GitBook, and MkDocs: directory structure, automated translation pipelines, versioning, and URL strategies.

Your docs are in English. Your users aren't. Here's how to add language support to documentation sites built with Docusaurus, GitBook, and MkDocs — including the directory structure, translation pipeline, and versioning strategy that actually work at scale.

The URL structure decision

Before anything else, pick a URL strategy. This affects SEO, routing, and how you organize files.

Subdirectory (recommended): docs.example.com/ja/getting-started

  • Single domain, good for SEO consolidation
  • Easy to implement in most frameworks
  • Clear language segmentation in analytics

Subdomain: ja.docs.example.com/getting-started

  • Treated as separate sites by search engines
  • More complex DNS and hosting setup
  • Useful if different language versions are maintained by different teams

Separate domain: docs.example.jp/getting-started

  • Maximum separation
  • Most expensive to maintain
  • Only makes sense for very large, independently managed locales

For 95% of documentation sites, subdirectories are the right choice. Let's go with that.
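In code, the subdirectory scheme amounts to peeling a locale prefix off the request path. A minimal sketch of that routing step (split_locale and the locale set are illustrative names and values, not a framework API):

```python
# Locales mirror the set configured for the site; both names are assumptions.
SUPPORTED_LOCALES = {"en", "ja", "de", "fr", "ko", "zh-Hans"}
DEFAULT_LOCALE = "en"

def split_locale(path):
    """Split '/ja/getting-started' into ('ja', '/getting-started').

    Paths without a recognized locale prefix fall back to the default locale.
    """
    segments = path.lstrip("/").split("/", 1)
    if segments and segments[0] in SUPPORTED_LOCALES:
        rest = "/" + segments[1] if len(segments) > 1 else "/"
        return segments[0], rest
    return DEFAULT_LOCALE, "/" + path.lstrip("/")
```

Keeping this logic in one place also makes the analytics segmentation mentioned above trivial: the first path segment is the locale.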

Docusaurus: built-in i18n

Docusaurus has first-class i18n support. Configuration in docusaurus.config.js:

module.exports = {
  i18n: {
    defaultLocale: "en",
    locales: ["en", "ja", "de", "fr", "ko", "zh-Hans"],
    localeConfigs: {
      en: { label: "English" },
      ja: { label: "日本語" },
      de: { label: "Deutsch" },
      fr: { label: "Français" },
      ko: { label: "한국어" },
      "zh-Hans": { label: "简体中文" },
    },
  },
};

Docusaurus uses a directory structure like:

docs/
  getting-started.md
  api-reference.md
i18n/
  ja/
    docusaurus-plugin-content-docs/
      current/
        getting-started.md
        api-reference.md
  de/
    docusaurus-plugin-content-docs/
      current/
        getting-started.md
        api-reference.md

The i18n/ directory mirrors the docs/ structure per locale. To initialize translation files:

npx docusaurus write-translations --locale ja

This generates JSON files for UI strings (navbar, footer, etc.) and copies the Markdown files as stubs.

The annoying part: Docusaurus doesn't auto-translate anything. It gives you the file structure and expects you to fill in translations. For a 200-page docs site across 5 languages, that's 1,000 files to manage manually — unless you automate it.
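Before automating, it helps to know how far behind each locale is. A small script can walk docs/ and report which pages have no counterpart under i18n/. A minimal sketch, assuming the Docusaurus layout shown above (missing_translations is a hypothetical helper, not part of Docusaurus):

```python
import os

def missing_translations(docs_dir, i18n_dir, locales):
    """Return {locale: [relative .md paths with no translated copy]}."""
    missing = {locale: [] for locale in locales}
    for root, _dirs, files in os.walk(docs_dir):
        for name in files:
            if not name.endswith(".md"):
                continue
            rel = os.path.relpath(os.path.join(root, name), docs_dir)
            for locale in locales:
                # Mirror of docs/<rel> in the Docusaurus i18n tree.
                target = os.path.join(
                    i18n_dir, locale,
                    "docusaurus-plugin-content-docs", "current", rel,
                )
                if not os.path.exists(target):
                    missing[locale].append(rel)
    return missing
```

Run in CI, this gives you a per-locale backlog instead of a vague sense that "the German docs are behind."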

MkDocs: plugin-based i18n

MkDocs doesn't have built-in i18n, but the mkdocs-static-i18n plugin works well:

# mkdocs.yml
plugins:
  - i18n:
      default_language: en
      languages:
        en: English
        ja: 日本語
        de: Deutsch

File structure with the plugin:

docs/
  getting-started.en.md
  getting-started.ja.md
  getting-started.de.md
  api-reference.en.md
  api-reference.ja.md
  api-reference.de.md

Alternatively, with directory-based separation:

docs/
  en/
    getting-started.md
  ja/
    getting-started.md
  de/
    getting-started.md

The suffix-based approach keeps translations next to their source, making it easier to spot missing translations. The directory-based approach is cleaner for larger sites.
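With the suffix layout, spotting gaps is a one-directory-walk job. A sketch under that assumption (missing_suffix_translations is a hypothetical helper, not part of the plugin):

```python
import os

def missing_suffix_translations(docs_dir, source_lang="en", langs=("ja", "de")):
    """For suffix-style files (page.en.md), list translations that don't exist yet."""
    missing = []
    suffix = f".{source_lang}.md"
    for root, _dirs, files in os.walk(docs_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            base = name[: -len(suffix)]
            for lang in langs:
                candidate = os.path.join(root, f"{base}.{lang}.md")
                if not os.path.exists(candidate):
                    missing.append(os.path.relpath(candidate, docs_dir))
    return missing
```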

GitBook: a different model

GitBook handles i18n through "spaces" — each language version is a separate space. There's no built-in mechanism to keep them in sync. You create a space per language, duplicate the content, and translate it.

This works fine for 2-3 languages but becomes a maintenance nightmare at 5+. The main risk is content drift — the English docs get updated, the translations don't, and users in other languages see outdated information.

GitBook's API can be used to automate syncing, but it requires custom tooling.

Automating the translation pipeline

The manual workflow (export strings, send to translators, wait, import) doesn't scale for documentation that changes frequently. Here's an automated pipeline:

import os
import hashlib
import json

DOCS_DIR = "docs"
I18N_DIR = "i18n"
MANIFEST_FILE = ".translation-manifest.json"
TARGET_LOCALES = ["ja", "de", "fr"]

def get_file_hash(filepath):
    with open(filepath, 'r') as f:
        return hashlib.sha256(f.read().encode()).hexdigest()

def load_manifest():
    if os.path.exists(MANIFEST_FILE):
        with open(MANIFEST_FILE, 'r') as f:
            return json.load(f)
    return {}

def translate_docs():
    manifest = load_manifest()
    updated = []

    for root, dirs, files in os.walk(DOCS_DIR):
        for filename in files:
            if not filename.endswith('.md'):
                continue

            filepath = os.path.join(root, filename)
            current_hash = get_file_hash(filepath)

            if manifest.get(filepath) == current_hash:
                continue  # File unchanged

            # File is new or modified: translate it.
            with open(filepath, 'r') as f:
                content = f.read()

            for locale in TARGET_LOCALES:
                # translate_markdown() is your translation backend
                # (an MT API call, a CLI wrapper, etc.)
                translated = translate_markdown(content, target_locale=locale)
                # Replace only the leading "docs" segment, not later matches.
                target_path = filepath.replace(DOCS_DIR, f"{I18N_DIR}/{locale}", 1)
                os.makedirs(os.path.dirname(target_path), exist_ok=True)
                with open(target_path, 'w') as f:
                    f.write(translated)

            manifest[filepath] = current_hash
            updated.append(filepath)

    with open(MANIFEST_FILE, 'w') as f:
        json.dump(manifest, f, indent=2)

    return updated

Run this in CI on every merge to main. Only changed files get re-translated, keeping costs and build times reasonable.
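The incremental behavior is easy to reason about in isolation: a page is re-queued only when its content hash no longer matches the manifest. A sketch of just that step (plan_retranslation is a hypothetical helper; the hashes are illustrative placeholders):

```python
def plan_retranslation(file_hashes, manifest):
    """Given {path: sha256} for the current tree and the manifest stored
    by the previous run, return the paths that need re-translation."""
    return [
        path
        for path, digest in file_hashes.items()
        if manifest.get(path) != digest
    ]

# Previous run translated a.md and b.md; since then b.md was edited
# and new.md was added:
manifest = {"docs/a.md": "aaa", "docs/b.md": "bbb"}
current = {"docs/a.md": "aaa", "docs/b.md": "ccc", "docs/new.md": "ddd"}
```

With this shape, a full re-translation is just an empty manifest, which is a useful escape hatch when you switch translation backends.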

Versioned docs and translations

Documentation versioning adds complexity. If you maintain docs for v1, v2, and v3, each version needs translations. The naive approach (translate all versions of all pages into all languages) creates a combinatorial explosion.

A more practical strategy:

  • Only translate the latest version. Older versions stay in English. Most users are on the current version anyway.
  • Carry translations forward. When you release a new version, copy the previous version's translations as a starting point and only re-translate changed pages.
  • Track translation freshness. Show a banner on translated pages that are outdated:

<!-- i18n-outdated: source-hash=abc123 current-hash=def456 -->
{
  isOutdated && (
    <Banner type="warning">
      This translation may be outdated.
      <a href={englishUrl}>View the English version</a>
    </Banner>
  )
}
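One way to implement that freshness check: stamp each translated file with the hash of the English source it was generated from, then compare at build time. A sketch, assuming a hypothetical i18n-source-hash comment (a variant of the marker above):

```python
import hashlib
import re

# Assumed marker format, written into each file at translation time.
HASH_COMMENT = re.compile(r"<!--\s*i18n-source-hash:\s*([0-9a-f]+)\s*-->")

def source_hash(content):
    """Short content hash of the English source."""
    return hashlib.sha256(content.encode()).hexdigest()[:12]

def is_outdated(translated_content, english_content):
    """A translation is outdated when the hash recorded at translation
    time no longer matches the current English source."""
    m = HASH_COMMENT.search(translated_content)
    if not m:
        return True  # no record at all: treat as outdated
    return m.group(1) != source_hash(english_content)
```

The build step then sets the isOutdated flag for the banner component from this check.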

Handling partial translations

You won't translate every page into every language immediately. You need a fallback strategy:

Option 1: Fall back to English. If a page doesn't exist in the user's language, show the English version. This is what Docusaurus does by default.

Option 2: Show what you have. Translate high-traffic pages first (getting started, API reference, common guides). Show a "not yet translated" notice on other pages with a link to the English version.

Option 3: Machine-translate everything, human-review priority pages. Use auto18n or similar to get a baseline translation of all pages, then have native speakers review the top 20% of pages by traffic. This gives users something in their language immediately while you improve quality incrementally.

Option 3 is what I recommend for most teams. A machine-translated page that's 90% accurate is better than no translation at all.
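Option 1's fallback logic is simple enough to sketch directly. A simplified version (the flat i18n/<locale>/ layout here is an assumption for brevity; Docusaurus's real tree nests a plugin directory, as shown earlier):

```python
import os

def resolve_page(slug, locale, i18n_dir="i18n", docs_dir="docs",
                 default_locale="en"):
    """Return (path, is_fallback): the translated page if it exists,
    otherwise the English source."""
    if locale != default_locale:
        translated = os.path.join(i18n_dir, locale, f"{slug}.md")
        if os.path.exists(translated):
            return translated, False
    return os.path.join(docs_dir, f"{slug}.md"), locale != default_locale
```

The is_fallback flag is what drives the "not yet translated" notice in Option 2.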

SEO for multilingual docs

For translated docs to rank in local search results:

hreflang tags. Every page needs hreflang annotations pointing to all its language variants:

<link
  rel="alternate"
  hreflang="en"
  href="https://docs.example.com/en/getting-started"
/>
<link
  rel="alternate"
  hreflang="ja"
  href="https://docs.example.com/ja/getting-started"
/>
<link
  rel="alternate"
  hreflang="de"
  href="https://docs.example.com/de/getting-started"
/>
<link
  rel="alternate"
  hreflang="x-default"
  href="https://docs.example.com/en/getting-started"
/>
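Hand-writing these per page doesn't scale; most frameworks emit them for you, but the logic is simple to sketch (hreflang_tags is a hypothetical helper; the base URL is illustrative):

```python
def hreflang_tags(slug, locales, base_url="https://docs.example.com",
                  default_locale="en"):
    """Emit one alternate link per locale plus the x-default fallback."""
    tags = [
        f'<link rel="alternate" hreflang="{loc}" '
        f'href="{base_url}/{loc}/{slug}" />'
        for loc in locales
    ]
    tags.append(
        f'<link rel="alternate" hreflang="x-default" '
        f'href="{base_url}/{default_locale}/{slug}" />'
    )
    return "\n".join(tags)
```

Note that every language variant must carry the full set of links, including a self-reference, for the annotations to be honored.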

Translated metadata. Page titles, meta descriptions, and OpenGraph tags should be translated, not just the body content.

Sitemap per locale. Generate a sitemap index that includes sitemaps for each language:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://docs.example.com/en/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://docs.example.com/ja/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://docs.example.com/de/sitemap.xml</loc></sitemap>
</sitemapindex>

Don't auto-redirect based on IP. Let Google and users access all language versions from any location. Use hreflang to signal the right version, but let users choose.

The realistic timeline

For a 100-page docs site adding 5 languages:

  • Week 1: Set up the i18n framework (directory structure, build config, locale switcher)
  • Week 2: Machine-translate all content, set up the automated pipeline
  • Weeks 3-4: Human review of the top 20 pages per language
  • Ongoing: Automated translation of new/changed content, periodic human review

The initial setup is a one-time cost. The ongoing work is proportional to how fast your docs change, not how many languages you support — and that's the whole point of automating the pipeline.