Tutorial: I18N of XML documents


I'm about to decide how to handle internationalisation of an XML-based format for UI description.

The format typically looks something like this:

  ...
  <devif>
    <screen id="scr1" title="Settings for this and that">
      <header text="Climate readings"/>
      <rd setp="123" text="Air temperature" unit="°C"/>
      <rd setp="234" text="Humidity" unit="%RH"/>
      <rd setp="345" text="CO2" unit="ppm"/>

      <header text="Settings"/>
      <wr setp="567" text="Air temperature demand" unit="°C"/>
    </screen>
    ...
  </devif>

Each file contains many screens and can run to some 10,000 lines, and we have a dozen of these files in our application.

I can still change the format to best suit our needs. So how would you go about translating this?

I've been thinking about some possible ways to handle this:

  • A separate file for each language, containing the English text and the translated text
  • An ID for each tag, with a separate file holding the translated text for each ID
  • All translations placed in the same file

The first solution has the problem that the same English text might need different translations depending on context.

The second solution makes the source file less readable (although not by much), and it does not handle translation of attributes easily.

The third solution would make the file very large and cumbersome to work with once the file has been translated into some 5-6 languages.


I would create a template file, e.g. named "somename.xml.template":

  <devif>
    <screen id="scr1" title="[SettingsForThisCard]">
      <header text="[ClimateReadings]"/>
      <rd setp="123" text="[AirTemperature]" unit="°C"/>
  ...

Then you can create an ini-like file for each language, containing lines such as:

SettingsForThisCard=your message in given language

Then you can replace the placeholders with the messages read from the ini files. The advantage is that a placeholder with no translation is easy to detect, so no translation effort is wasted. It is also very simple, so it may not be the best fit for your specific requirements.
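
The substitution step could be sketched roughly as follows in Python. This is only an illustration, not the poster's actual tooling: the function names `parse_messages` and `render`, the `#` comment convention, and the German sample text are assumptions made up for the example. Messages are XML-escaped because they end up inside attribute values.

```python
import re
from xml.sax.saxutils import escape

# Matches placeholders such as [AirTemperature]; the allowed characters
# are an assumption, adjust them to whatever key syntax you settle on.
PLACEHOLDER = re.compile(r"\[([A-Za-z0-9_]+)\]")


def parse_messages(text):
    """Parse 'Key=message' lines into a dict, skipping blanks and '#' comments."""
    messages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        messages[key.strip()] = value.strip()
    return messages


def render(template, messages):
    """Replace each [Key] with its message.

    Returns the rendered text and a sorted list of keys that had no
    translation; untranslated placeholders are left in place.
    """
    missing = set()

    def substitute(match):
        key = match.group(1)
        if key in messages:
            # Escape &, <, > and double quotes for use in an attribute value.
            return escape(messages[key], {'"': "&quot;"})
        missing.add(key)
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template), sorted(missing)


template = (
    '<screen id="scr1" title="[SettingsForThisCard]">\n'
    '  <header text="[ClimateReadings]"/>\n'
    "</screen>"
)
messages = parse_messages("# hypothetical German file\nSettingsForThisCard=Einstellungen\n")
output, missing = render(template, messages)
print(output)
print("untranslated:", missing)
```

Returning the list of missing keys is what makes untranslated placeholders easy to spot: a build step can fail or warn whenever that list is not empty.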


We use TMX files, a standard XML format for holding internationalized literals. Each entry is identified by a label, which is referenced throughout the code. Every entry holds all the available translations, and most importantly, TMX is a standard used by translation programs and professionals.

If you already have XML files to hold your literals, you can convert them by means of an XSLT stylesheet.

Here is an example of the format:

<?xml version="1.0" encoding="UTF-8"?>
<tmx>
    <body>
        <tu tuid="$ALARM_BARCODE_READER_COMMS">
            <tuv lang="ES">
                <seg>Lector códigos de barras: No Operativo</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Barcode reader: Not operative</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:通讯错误</seg>
            </tuv>
        </tu>
        <tu tuid="$ALARM_BARCODE_READER_FAIL">
            <tuv lang="ES">
                <seg>Lector códigos de barras: Fallo</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Barcode Reader: Fail</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:故障</seg>
            </tuv>
        </tu>
        <tu tuid="$NO_PAYMENT_MODE_AVAILABLE">
            <tuv lang="ES">
                <seg>No hay sistemas de pago disponibles</seg>
            </tuv>
            <tuv lang="EN">
                <seg>No payment systems available</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>读卡器:故障</seg>
            </tuv>
        </tu>
        <tu tuid="$ALARM_BLACKLIST_CARD">
            <tuv lang="ES">
                <seg>Tarjeta de pago en lista negra</seg>
            </tuv>
            <tuv lang="EN">
                <seg>Payment card in blacklist</seg>
            </tuv>
            <tuv lang="ZH">
                <seg>付费的IC卡是黑名单卡</seg>
            </tuv>
        </tu>
    </body>
</tmx>
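
Looking a label up at run time could work roughly like this. It is a minimal Python sketch, assuming the simplified layout shown above (`tu`/`tuv`/`seg` with a plain `lang` attribute); the names `load_catalog` and `translate` and the fall-back-to-English rule are assumptions for illustration. Files exported by translation tools usually carry the language in `xml:lang` instead, so the sketch checks both.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened sample in the same layout as the example above.
TMX_SAMPLE = """<tmx>
  <body>
    <tu tuid="$ALARM_BARCODE_READER_FAIL">
      <tuv lang="ES"><seg>Lector códigos de barras: Fallo</seg></tuv>
      <tuv lang="EN"><seg>Barcode Reader: Fail</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_catalog(tmx_text):
    """Build {label: {language: text}} from the translation units."""
    catalog = {}
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        entries = catalog.setdefault(tu.get("tuid"), {})
        for tuv in tu.findall("tuv"):
            lang = tuv.get("lang") or tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also collects text nested in inline markup.
                entries[lang] = "".join(seg.itertext())
    return catalog


def translate(catalog, label, lang, fallback="EN"):
    """Return the text for label in lang, else the fallback language, else the label."""
    entries = catalog.get(label, {})
    return entries.get(lang) or entries.get(fallback) or label


catalog = load_catalog(TMX_SAMPLE)
print(translate(catalog, "$ALARM_BARCODE_READER_FAIL", "ES"))
print(translate(catalog, "$ALARM_BARCODE_READER_FAIL", "ZH"))  # falls back to EN
```

Returning the label itself when nothing is found keeps a missing translation visible on screen instead of hiding it behind an empty string.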


Why not use XML entities to define the terms that have to be localized, and then have a separate DTD for each language? This is essentially the approach used by Firefox XUL, for instance: Localization on MDN.
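
To get a feel for the entity idea, here is a small Python sketch. It swaps in a different mechanism from the one described: instead of one external DTD file per language, it generates the entity declarations into an internal DTD subset and prepends them to the document, because Python's `xml.etree.ElementTree` does not load external DTDs. The entity names (`settings.title`, `air.temperature`), the helper names, and the German texts are all made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# UI description that refers to localizable terms through entities.
# It must not start with an XML declaration, since a DOCTYPE is prepended.
UI_BODY = """<devif>
  <screen id="scr1" title="&settings.title;">
    <rd setp="123" text="&air.temperature;" unit="°C"/>
  </screen>
</devif>"""


def entity_declarations(messages):
    """Turn {entity name: text} into <!ENTITY ...> declarations."""
    lines = []
    for name, text in messages.items():
        # Escape markup characters, the quote that delimits the value, and
        # '%', which would otherwise start a parameter-entity reference.
        value = escape(text, {'"': "&quot;", "%": "&#37;"})
        lines.append(f'<!ENTITY {name} "{value}">')
    return "\n".join(lines)


def localize(body, messages, root_name="devif"):
    """Parse body with the given translations declared as internal entities."""
    doctype = f"<!DOCTYPE {root_name} [\n{entity_declarations(messages)}\n]>\n"
    return ET.fromstring(doctype + body)


german = {"settings.title": "Einstellungen", "air.temperature": "Lufttemperatur"}
root = localize(UI_BODY, german)
print(root.find("screen").get("title"))
print(root.find("screen/rd").get("text"))
```

One property of this approach is that an entity with no declaration makes the parse fail outright, so a missing translation is caught when the file is loaded rather than showing up on a screen.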
