AppSuite:I18n

Revision as of 11:39, 27 September 2013 by Tierlieb (talk | contribs)

API status: In Development

Internationalization (i18n)

Providing software for users in the whole world means providing it in multiple languages. This consists of two steps:

  • Internationalization (i18n) - Making the software international, i.e. preparing it to be adapted to different languages and locations.
  • Localization (L10n) - Adapting the software to a particular language and/or location.

The Open-Xchange platform offers several facilities to simplify both parts. The L10n part is mostly a concern for translators. Open-Xchange facilities for that consist mainly of using a well-established format for translations: GNU Gettext Portable Object (PO) files. This allows translators to use existing dedicated translation tools or a simple UTF-8-capable text editor.

The i18n part is what software developers will be mostly dealing with and is what the rest of this document describes.

Translation

The main part of i18n is the translation of various text strings which are presented to the user. For this purpose, the Open-Xchange platform provides the RequireJS plugin 'gettext'. Individual translation files are specified as a module dependency of the form 'gettext!module_name'. The resulting module API is a function which can be called to translate a string using the specified translation files:

define('com.example/example', ['gettext!com.example/example'],
  function (gt) {
    'use strict';
    alert(gt('Hello, world!'));
  });

After building the module, the file ox.pot in the project's root directory will contain an entry for every call to one of the gettext functions:

#: apps/com.example/example.js:4
msgid "Hello, world!"
msgstr ""

After translation, the PO files in the directory i18n should contain the translated entry:

#: apps/com.example/example.js:4
msgid "Hello, world!"
msgstr "Hallo, Welt!"

During the next build, the entries are copied from the central PO files into individual translation files. In our example, this would be apps/com.example/example.de_DE.js. Because of the added language ID, translation files can usually have the same name as the corresponding main module. Multiple related modules should use the same translation file to avoid the overhead of too many small translation files.
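
As a rough illustration of the naming scheme described above, the following sketch derives the per-language file name from a module name and a language ID. The helper translationFile is hypothetical and only mirrors the pattern of the example; it is not part of the platform API.

```javascript
// Hypothetical helper: reproduces the naming pattern of the example above,
// i.e. <module path>.<language ID>.js below the apps directory.
function translationFile(moduleName, languageId) {
  return 'apps/' + moduleName + '.' + languageId + '.js';
}

console.log(translationFile('com.example/example', 'de_DE'));
// prints "apps/com.example/example.de_DE.js"
```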

Most modules will require more complex translations than can be provided by a simple string lookup. To handle some of these cases, the gettext module provides traditional methods in addition to being a callable function. Other cases are handled by the build system.

Composite Phrases

In most cases, the translated texts will not be static, but contain dynamic values as parts of a sentence. The straightforward approach of translating parts of the sentence individually and then using string concatenation to compose the final text is a BAD idea. Different languages have different sentence structures, and if there are multiple dynamic values, their order might need to be reversed in some languages but not in others.

The solution is to translate entire sentences and then to use the gettext function to insert dynamic values:

alert(
  //#. %1$s is the given name
  //#. %2$s is the family name
  //#, c-format
  gt('Welcome, %1$s %2$s!', firstName, lastName));

Results in:

#. %1$s is the given name
#. %2$s is the family name
#, c-format
msgid "Welcome, %1$s %2$s!"
msgstr ""

As shown in the example, it is possible to add comments for translators by starting a comment with "#.". Such comments must be placed immediately before the name of the variable which refers to the gettext module (in this case gt). They can be separated by arbitrary whitespace and newlines, but not other tokens. All such gettext calls should have comments explaining every format specifier.

Comments starting with "#," are meant for gettext tools, which in the case of "#, c-format", can ensure that the translator did not leave out or mistype any of the format specifiers.

For the cases when the format string must be translated by one of the functions described below, there is a dedicated format function gettext.format which, except for debugging, is an alias for _.printf.
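
To make the benefit of numbered format specifiers concrete, here is a minimal sketch of positional substitution. The function format below is a simplified stand-in written for this example (it only understands %N$s and %N$d); it is not the actual implementation of gettext.format or _.printf. The reordered string is a made-up translation that puts the family name first.

```javascript
// Simplified stand-in for a printf-style formatter (assumption: only
// positional %N$s and %N$d specifiers are supported).
function format(template) {
  var args = Array.prototype.slice.call(arguments, 1);
  return template.replace(/%(\d+)\$[sd]/g, function (match, index) {
    var value = args[Number(index) - 1];
    // Leave specifiers without a matching argument untouched,
    // so that mistakes stay visible in the output.
    return value === undefined ? match : String(value);
  });
}

var original = 'Welcome, %1$s %2$s!';
// Hypothetical translation that needs the family name first.
var familyNameFirst = '%2$s %1$s, welcome!';

console.log(format(original, 'Ada', 'Lovelace'));
// prints "Welcome, Ada Lovelace!"
console.log(format(familyNameFirst, 'Ada', 'Lovelace'));
// prints "Lovelace Ada, welcome!"
```

Because the arguments are addressed by position, the calling code stays the same no matter how a translator orders the specifiers.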

Debugging

One of the most common i18n errors is forgetting to use a gettext function. To catch this kind of error, the UI can be started with the hash parameter "#debug-i18n=1". (Reloading the browser tab is usually required for the setting to take effect.)

In this mode, every translated string is marked with invisible Unicode characters, and any DOM text without those characters gets reported on the console. The gettext.format function then also checks that every parameter is translated. This is the reason why _.printf should not be used for user-visible strings directly.

Unfortunately, this method will also report any string which does not actually require translation. Examples of such strings include user data, numbers, strings already translated by the server, etc. To avoid filling the console with such false positives, every such string must be marked by passing it through the function gettext.noI18n:

alert(
  //#. %1$s is the given name
  //#. %2$s is the family name
  //#, c-format
  gt('Welcome, %1$s %2$s!', gt.noI18n(firstName), gt.noI18n(lastName)));

This results in the strings being marked as 'translated' without actually changing their visible contents. When not debugging, gettext.noI18n simply returns its first parameter.
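
The marking idea can be sketched in a few lines. Everything in this example is an assumption made for illustration: the marker character (a zero-width space), the function names, and the tiny dictionary. The actual marker characters and checks used by the platform are not specified here.

```javascript
// Assumption: a zero-width space on both ends marks text as "handled".
var MARK = '\u200b';
var debug = true; // stands in for the "#debug-i18n=1" hash parameter

function mark(text) {
  return debug ? MARK + text + MARK : text;
}

// Hypothetical equivalents of gt() and gt.noI18n().
function translate(text, dictionary) {
  var known = Object.prototype.hasOwnProperty.call(dictionary, text);
  return mark(known ? dictionary[text] : text);
}

function noI18n(text) {
  return mark(String(text));
}

function isMarked(text) {
  return text.length >= 2 &&
    text.charAt(0) === MARK &&
    text.charAt(text.length - 1) === MARK;
}

// Returns every text that was neither translated nor explicitly exempted.
function findUnmarked(texts) {
  return texts.filter(function (text) { return !isMarked(text); });
}

var dictionary = { 'Inbox': 'Posteingang' };
var rendered = [
  translate('Inbox', dictionary), // translated, therefore marked
  noI18n('Ada Lovelace'),         // user data, marked without translation
  'Sent items'                    // forgotten gettext call
];

console.log(findUnmarked(rendered)); // only 'Sent items' is reported
```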

Advanced gettext Functions

Besides gettext.format and gettext.noI18n there are several other functions which are required to cover all typical translation scenarios.

Contexts

Sometimes, the same English word or phrase has multiple meanings and must be translated differently depending on context. To be able to tell the individual translations apart, the method gettext.pgettext ('p' stands for 'particular') should be used instead of calling gettext directly. It takes the context as the first parameter and the text to translate as the second parameter:

alert(gt.pgettext('description', 'Title'));
alert(gt.pgettext('salutation', 'Title'));

Results in:

msgctxt "description"
msgid "Title"
msgstr "Beschreibung"

msgctxt "salutation"
msgid "Title"
msgstr "Anrede"
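
A context-aware lookup can be pictured as a dictionary whose keys combine context and text. The sketch below assumes the "context\x00text" key form that this page describes for custom translations; the dictionary object and the function pgettext are simplified stand-ins, not the platform code.

```javascript
// Hypothetical in-memory dictionary. Keys with a context use the
// "context\x00text" form.
var dictionary = {
  'description\x00Title': 'Beschreibung',
  'salutation\x00Title': 'Anrede'
};

// Simplified stand-in for gt.pgettext: falls back to the original text
// when no translation exists for the given context.
function pgettext(context, text) {
  var key = context + '\x00' + text;
  return Object.prototype.hasOwnProperty.call(dictionary, key) ? dictionary[key] : text;
}

console.log(pgettext('description', 'Title')); // prints "Beschreibung"
console.log(pgettext('salutation', 'Title'));  // prints "Anrede"
console.log(pgettext('book', 'Title'));        // prints "Title" (no entry)
```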

Plural forms

In the case of numbers, the rules to select the proper plural form can be very complex. With the exception of languages with no separate plural forms, English is the second simplest language in this respect, having only two plural forms: singular and plural. Other languages can have up to four forms, and theoretically even more. The functions gettext.ngettext and gettext.npgettext (for a combination of plural forms with contexts) can select the proper plural form by using a piece of executable code embedded in the header of a PO file:

alert(gt.format(
  //#. %1$d is the number of mails
  //#, c-format
  gt.ngettext('You have %1$d mail', 'You have %1$d mails', n),
  gt.noI18n(n)));

The function gettext.ngettext accepts three parameters: the English singular and plural forms and the number which determines the chosen plural form. The function gettext.npgettext adds a context parameter before the others, similar to gettext.pgettext. They are usually used in combination with gettext.format to insert the actual number into the translated string.

The above example results in the following entry:

#. %1$d is the number of mails
#, c-format
msgid "You have %1$d mail"
msgid_plural "You have %1$d mails"
msgstr[0] ""
msgstr[1] ""

The number of msgstr[N] lines is determined by the number of plural forms in each language. This number is specified in the header of each PO file, together with the code to compute the index of the correct plural form from the supplied number.
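
The following sketch shows how such a plural rule selects an index. The two rules are the commonly published gettext Plural-Forms expressions for English (nplurals=2) and Polish (nplurals=3), rewritten by hand as JavaScript functions; how the platform actually evaluates the header is not shown here, and the names pluralRules and ngettext are stand-ins for this example.

```javascript
// Plural rules as they typically appear in PO headers, written out as
// plain functions (assumption: the real code derives them from the header).
var pluralRules = {
  en: {
    nplurals: 2,
    plural: function (n) { return n !== 1 ? 1 : 0; }
  },
  pl: {
    nplurals: 3,
    plural: function (n) {
      if (n === 1) return 0;
      if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
      return 2;
    }
  }
};

// Simplified stand-in for gt.ngettext: forms holds the msgstr[N] entries.
function ngettext(language, forms, n) {
  return forms[pluralRules[language].plural(n)];
}

var englishForms = ['You have %1$d mail', 'You have %1$d mails'];
console.log(ngettext('en', englishForms, 1)); // prints "You have %1$d mail"
console.log(ngettext('en', englishForms, 5)); // prints "You have %1$d mails"

// Indices chosen by the Polish rule for a few numbers.
console.log([1, 3, 5, 12, 22].map(function (n) {
  return pluralRules.pl.plural(n);
})); // prints [ 0, 1, 2, 2, 1 ]
```

The selected string still contains the format specifier, which is why the result is usually passed on to gettext.format.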

Custom Translations

App Suite has support for custom translations. gettext provides a function addTranslation(dictionary, key, value) that allows adding custom translations to a global dictionary ("*") as well as replacing translations in existing dictionaries. In order to use this, you have to create a plugin for the "core" namespace.

 
var gt = require('io.ox/core/gettext');
gt.addTranslation('*', 'Existing translation', 'My custom translation');

To keep it simple, you can use the internal key encoding to build counterparts for ngettext, pgettext, and npgettext. A context is separated from the text by \x00. Singular and plural are separated by \x01:

 
// load gettext
var gt = require('io.ox/core/gettext');

// replace translation in dictionary 'io.ox/core'. Apps use context "app".
gt.addTranslation('io.ox/core', 'app\x00Address Book', 'My Contacts');

// replace translation - counterpart to ngettext
gt.addTranslation('io.ox/mail', 'Copy\x01Copies', { 0: 'Kopie', 1: 'Kopien' });

// replace translation - counterpart to npgettext
gt.addTranslation('io.ox/mail', 'test\x00One\x01Two', { 0: 'Eins', 1: 'Zwei' });

// some checks
require(['gettext!io.ox/mail'], function (gt) {

    console.log('Singular:', gt.ngettext('Copy', 'Copies', 1));
    console.log('Plural:', gt.ngettext('Copy', 'Copies', 2));

    console.log('Without context', gt.gettext('Files')); // 'My File Box' only if a custom translation for 'Files' was added (not shown above)
    console.log('With context', gt.pgettext('test', 'Files')); // should be 'Files'

    console.log('Plural with context', gt.npgettext('test', 'One', 'Two', 2)); // should be 'Two'
});
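
Writing the \x00 and \x01 separators by hand is error-prone, so a small helper can assemble the keys. The function buildKey is hypothetical and not part of the gettext module; it only encodes the separator rules stated above.

```javascript
// Hypothetical helper that builds dictionary keys for addTranslation:
// context and text are joined by \x00, singular and plural by \x01.
function buildKey(singular, plural, context) {
  var key = singular;
  if (plural !== undefined) {
    key += '\x01' + plural;
  }
  if (context !== undefined) {
    key = context + '\x00' + key;
  }
  return key;
}

// The three keys used in the example above:
var keys = [
  buildKey('Address Book', undefined, 'app'), // counterpart to pgettext
  buildKey('Copy', 'Copies'),                 // counterpart to ngettext
  buildKey('One', 'Two', 'test')              // counterpart to npgettext
];

// JSON.stringify makes the control characters visible as escapes.
keys.forEach(function (key) {
  console.log(JSON.stringify(key));
});
```

A key built this way would then be passed as the second argument of addTranslation, for example gt.addTranslation('io.ox/mail', buildKey('Copy', 'Copies'), { 0: 'Kopie', 1: 'Kopien' }).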

addTranslation() always operates on the current language. If you want to benefit from automatic POT file generation, use the following approach:

 
define('plugins/example/register', ['io.ox/core/gettext', 'gettext!custom'], function (gettext, gt) {

    'use strict';

    /* Just list all custom translations in your plugin by standalone gt() calls.
       The build system recognizes these strings and collects them in a POT file, 
       so that they can be subject to translation processes. At runtime, gt('...')
       returns translations for current language.
    */
    gt('Tree');
    gt('House');
    gt('Dog');

    // get internal dictionary (for current language of course)
    var dictionary = gt.getDictionary();

    // add all translations to global dictionary. Done!
    gettext.addTranslation('*', dictionary);
});