Building a multilingual website in Jekyll

Jekyll is a great tool for creating (mostly) static websites; in fact, this very site is built with it. But it doesn’t come with built-in support for multiple languages. I needed that for the website of Mystery Game No. 1, which will be released in German and English. I had to work out my own way of doing it, because existing approaches didn’t quite fit the bill.

Requirements

Here’s how I want it to work:

  • Language-agnostic assets (JavaScript, CSS, some images) do not need to be cloned per language.
  • There are multiple domains (in this case, .com and .de), and each domain serves up a different language.
  • URLs are clean: the language does not appear in the path (no /en/, /de/).
  • The same page might have a different (translated) URL path on a different domain, like /contact and /kontakt. Paths may also collide across languages (for example, / is the home page in every language version).
  • We can use absolute paths to refer to other files.
  • We can use jekyll serve to preview the result.

Note how this differs from Sylvain Durand’s approach, which uses a single domain and assumes non-colliding URLs between languages, and from healthcare.gov’s, Patrice Brend’amour’s and liaohuqiu’s, which all use a URL subpath (rather than a top-level domain) to distinguish languages.

It looks like my approach might be a DIY version of Anthony Gaudino’s plugin, but I only found that out after the fact.

Alternatives

The simplest approach would be to keep two entirely separate repositories (or Git branches), but keeping changes to shared assets in sync would be a hassle. So that option is out.

One could imagine a single repo with one subdirectory for each language:

/de
   /index.html
   /kontakt.html
/en
   /index.html
   /contact.html
/static
       /style.css
       /header.png

Upon uploading, we’d take the contents of one of the language subdirs and promote it to the top level:

/index.html
/kontakt.html
/static
       /style.css
       /header.png

Sadly this won’t work either, because I want internal links to be able to use absolute paths. (Relative paths are harder to write; moreover, they break when moving the source, whereas absolute paths break when moving the target, which is more natural.) But in this scenario, in order to link to contact.html, I’d have to write either /contact.html (breaking preview) or /en/contact.html (breaking the production site).

Configuration

The fact that we really need to generate /index.html once for each language led me to the conclusion that we need multiple Jekyll runs to generate this thing. I would like to be able to specify on the command line which language we’re building/previewing, with something like this fictional syntax:

$ jekyll serve --vars lang=de
$ jekyll serve --vars lang=en

Unfortunately, it’s hard to pass any custom data to Jekyll like that. The only way is to use JEKYLL_ENV, but I’d like to keep that for its common usage of distinguishing development from production builds. Then, while looking through every bit of documentation for possible hacks, I found this:

--config FILE1[,FILE2,...]

Specify config files instead of using _config.yml automatically. Settings in later files override settings in earlier files.

Perfect! So I set it up like this:

_config.yml

# Jekyll configuration that is language-agnostic.
#
# To run Jekyll, two configuration files need to be specified, in this order:
#
#     jekyll serve --config _config.yml,_config_en.yml
#     jekyll serve --config _config.yml,_config_de.yml
...
# Settings overridden by language-specific configs
destination: "/ See _config.yml on how to build this thing"
exclude: ["*.??.html"]

_config_de.yml

lang: de
destination: ./_site/de
include: ["*.de.html"]

_config_en.yml

lang: en
destination: ./_site/en
include: ["*.en.html"]

This puts each site, including static assets, under its own subdirectory in _site. Any HTML file with a language suffix is only included in the matching version (I’m not using Markdown here; adjust to taste). Finally, the lang setting is available in templates as site.lang, for instance:

<html lang="{{ site.lang }}">

When running Jekyll, two configuration files must be specified, in the right order:

$ jekyll serve --config _config.yml,_config_en.yml

If you forget, for example by just typing jekyll serve without the right arguments, you get a useful error message (assuming you’re not running Jekyll as root):

Error:  Permission denied @ dir_s_mkdir - / See _config.yml on how to build this thing
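Since the order of the config files matters, it may be worth wrapping the invocation in a few shell functions. The following is only a sketch: the function names are made up for illustration, and the language list en/de is simply the pair used on this site. Only the --config ordering itself comes from the setup above.

```shell
# Hypothetical helpers; only the "shared config first, language config
# second" ordering is taken from the setup described above.

# Print the --config value for language $1. The language-specific file
# comes last so that its settings override the shared ones.
config_for() {
  printf '_config.yml,_config_%s.yml\n' "$1"
}

# Build one language version. The destination directory is set by the
# language-specific config file, not here.
build_lang() {
  jekyll build --config "$(config_for "$1")"
}

# Build every language version, stopping at the first failure.
build_all() {
  for lang in en de; do
    build_lang "$lang" || return 1
  done
}
```

With these loaded, build_all run from the project root should leave _site/en and _site/de side by side, and jekyll serve --config "$(config_for de)" previews a single language.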

One minor drawback is that you have to specify the final URL in the YAML front matter of each file, in order to strip out the .xx.html extension:

contact.en.html

---
title: Contact
...
permalink: /contact/
---

...

Note the trailing slash, which is needed for “pretty” permalinks without .html file extension. The same approach should work if you want to put each language’s source files into a subdirectory, rather than using file extensions like I did here.
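Because a forgotten permalink is easy to miss, a small check might help. This is a rough sketch, not part of my actual build: the function names are invented, and it assumes the front matter is the block between the first two --- lines, with permalink at the start of a line.

```shell
# Succeed if the front matter of file $1 contains a permalink key.
has_permalink() {
  awk '
    NR == 1 && $0 != "---" { exit }            # file has no front matter
    NR > 1 && $0 == "---"  { exit }            # reached end of front matter
    /^permalink:/          { found = 1; exit }
    END                    { exit !found }     # status 0 only if found
  ' "$1"
}

# List language-suffixed pages (such as contact.en.html) in directory $1
# (default: current directory) that lack a permalink.
missing_permalinks() {
  for f in "${1:-.}"/*.??.html; do
    [ -e "$f" ] || continue                    # glob matched nothing
    has_permalink "$f" || printf '%s\n' "$f"
  done
}
```

Running missing_permalinks before a build prints nothing when every page is fine.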

Translating common elements

The above takes care of translating entire files only. In order to translate common site elements that appear in otherwise shareable layout files (e.g. navigation), we can use the _data directory:

_data/de.yml

contact: Kontakt
...

_data/en.yml

contact: Contact
...

Each filename in that directory becomes a key in site.data, so we can access translated strings in this (slightly verbose) way:

<a href="...">{{ site.data[site.lang].contact }}</a>
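A lookup of a key that is missing from one language’s data file typically just renders as nothing, so it can be useful to compare the files now and then. Here is a minimal sketch, assuming flat key: value files like the ones above; the function names are hypothetical.

```shell
# Print the top-level keys of a flat "key: value" YAML file, sorted.
yaml_keys() {
  sed -n 's/^\([A-Za-z0-9_-][A-Za-z0-9_-]*\):.*/\1/p' "$1" | sort
}

# Print keys that are present in file $1 but missing from file $2.
missing_keys() {
  yaml_keys "$1" | while IFS= read -r key; do
    grep -q "^$key:" "$2" || printf '%s\n' "$key"
  done
}
```

For example, missing_keys _data/en.yml _data/de.yml would list the strings that still need a German translation. Nested YAML structures are not handled.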

You may wonder what goes in place of the dots in the href of that navigation link, since URL paths can differ by language. How do we say “the path to the contact page, but in the current language”? First, we need a way to identify that page to begin with, since each language has different URLs. So we add a unique identifier to the YAML front matter:

contact.de.html

---
title: Kontakt
key: contact
permalink: /kontakt/
---

...

We can use that from our layout file as follows:

<a href="{{ site.pages | where:'key','contact' | map:'permalink' | first }}">...</a>

Again it’s slightly verbose, but we don’t usually need many of these.

To link from one language version to another, we need a bit more configuration in _config.yml:

languages: ["en", "de"]

And in _data/en.yml and _data/de.yml:

baseurl: http://example.com  # or .de, respectively

Then we can generate links to the same page in different languages as follows:

{% for lang in site.languages %}
  <a href="{{ site.data[lang].baseurl }}">
    {{ lang | upcase }}
  </a>
{% endfor %}

Admittedly, I have been a bit lazy here: this links back to the root page, not to the page we’re currently on. In the current setup, that is hard to do, because we excluded all files in alternative languages, which means Jekyll doesn’t load them and we don’t have access to those pages or their permalinks. We could get around this by registering all permalinks in _data as well, but since this is a small site with most content on the home page, I’m not going to bother for now.
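If I did want to register the permalinks in _data, I would probably generate the file rather than maintain it by hand. The following is an untested-in-production sketch: the function names and the output location _data/permalinks/de.yml are assumptions, and it relies on each page carrying the key and permalink front matter entries shown earlier.

```shell
# Print the value of front matter key $2 in file $1 (empty if absent).
front_matter_value() {
  awk -v key="$2" '
    NR == 1 && $0 != "---" { exit }            # file has no front matter
    NR > 1 && $0 == "---"  { exit }            # reached end of front matter
    index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  ' "$1"
}

# Emit "key: permalink" YAML lines for all pages of language $1 found in
# directory $2 (default: current directory).
permalink_registry() {
  for f in "${2:-.}"/*."$1".html; do
    [ -e "$f" ] || continue                    # glob matched nothing
    key=$(front_matter_value "$f" key)
    link=$(front_matter_value "$f" permalink)
    if [ -n "$key" ] && [ -n "$link" ]; then
      printf '%s: %s\n' "$key" "$link"
    fi
  done
}
```

Running permalink_registry de > _data/permalinks/de.yml (and likewise for en) before each build should make the other language’s paths available as site.data.permalinks[lang][page.key], since data files are loaded regardless of which pages are excluded.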