Pseudo-translations: Part 1
by Ian Henderson, CEO, Rubric
Creating a successful software product for domestic markets does not guarantee success in adapting the same product for international markets. Programmers who cut code but have not been exposed to internationalization (I18n) issues risk creating problems that will surface during localization (L10n) efforts.
Waiting to address problems until after translation is the wrong approach to internationalization, for two primary reasons:
First, the cost of retrofitting a product for internationalization is huge, particularly if the product is being localized into multiple languages.
Second, your time-line is elongated, slowing your time-to-market.
Generally speaking, your engineering team is busy fixing bugs in a core product right up till the release date. Adding effort to fix localization bugs for multiple languages puts an unnecessary burden on your engineering team when they are already stressed.
Internationalization testing was developed for just this reason. In broad terms, there are two general approaches to such testing:
- Full internationalization, where your software code is inspected line by line for potential problems.
- Pseudo-translation, where a mock translation of the software is performed, and the software tested for problems.
Neither approach can be said to be better than the other. One is thorough but slow, and the other is fast but useful mainly for broad validation. In fact, a combination of the two makes sense for many software companies – using a quick and inexpensive pseudo-translation test to determine whether a more extensive – and expensive – full internationalization review is required. Regardless of the approach, these exercises should be executed well in advance of the release date to provide ample time for re-coding where necessary.
The purpose of pseudo-translation is to alter the source code semi-automatically in order to identify internationalization problems (this is especially useful in User Interface, or “UI” components, but has uses in other parts of a software product as well). The automation aspect of pseudo-translation can usually be done quickly and inexpensively. Pseudo-translation testing costs significantly less than discovering and correcting mistakes during localization, or worse yet, during field acceptance testing.
There are really two steps in pseudo-translating source code:
- Identify what is to be translated (i.e. parsing)
- Modify the text which has been identified as translatable
Of course, merely pseudo-translating source files is insufficient to uncover internationalization issues. To complete the process you need to add two additional steps:
- Build the pseudo-translated product
- Test the pseudo-translated product
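The four steps above can be sketched end to end for a simple key=value strings file. This is a minimal illustration, not any particular tool's behavior: the file format, the bracket markers, and the ~30% padding scheme are all illustrative assumptions.

```python
def parse(lines):
    """Step 1: identify translatable text in key=value lines."""
    for line in lines:
        if "=" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition("=")
            yield key, value.strip()

def pseudo_translate(value):
    """Step 2: modify the identified text -- wrap it in brackets and
    pad it, so truncated or hard-coded strings stand out in the UI."""
    padding = "~" * max(1, len(value) // 3)
    return f"[{value}{padding}]"

source = [
    "# UI strings",
    "menu.file=File",
    "dialog.ok=OK",
]

translated = {k: pseudo_translate(v) for k, v in parse(source)}
print(translated)  # {'menu.file': '[File~]', 'dialog.ok': '[OK~]'}
```

Steps 3 and 4 – building and testing the product with these strings – then expose any string that appears un-bracketed (hard-coded) or with its padding clipped (buffer too small).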
Parsing is the most difficult step in the procedure, though the actual pseudo-translation itself is fairly easy. The practicality of pseudo-translation is thus limited by the number of file formats that available tools can parse. For example, standard UI file formats such as RC or DLG are supported by many commercial tools. However, if your software is composed using non-standard file formats, you will have few – if any – commercially available tools to aid in pseudo-translation parsing.
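To give a rough sense of what a parser must do, the sketch below pulls the quoted strings out of an RC-style fragment and pseudo-translates only those, leaving resource identifiers and keywords untouched. This is a deliberate simplification – a real RC parser must also handle escape sequences, accelerators, and format specifiers:

```python
import re

rc_fragment = '''
STRINGTABLE
BEGIN
    IDS_OPEN   "Open"
    IDS_SAVE   "Save As..."
END
'''

def pseudo(match):
    """Wrap each quoted string in bracket markers."""
    return '"[' + match.group(1) + ']"'

# Only text between double quotes is translatable; identifiers
# such as IDS_OPEN must pass through unchanged.
result = re.sub(r'"([^"]*)"', pseudo, rc_fragment)
print(result)
```

Writing such a parser for a genuinely non-standard format – with its own quoting, escaping, and comment rules – is where most of the effort goes.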
Rubric believes that pseudo-translation is an indispensable part of localization, and a very profitable process for our clients. Because of this, we have made non-standard file format parsing and pseudo-localizing part of our practice. Rubric deals with many different non-standard file formats and we write our own file parsers when no commercial tools are available for a particular project.
Once your source files are properly parsed, there are several approaches to the pseudo-translation itself, some of which are more useful than others, as we will illustrate.
Pig Latin: This approach to pseudo-translation involves moving one or more letters from the beginning of a word to the end and then adding a few letters. Even though the modification is mechanical, the resulting text is quite hard to read. Take a look at http://www.google.com/intl/xx-piglatin/ for a very visual example of this type of pseudo-translation before you decide to adopt this approach. Pig-Latin pseudo-translations also suffer from not exercising local character sets, which, in the long run, is essential. Consider this approach as a first-pass test to isolate fundamental coding mistakes such as buffer sizing.
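A Pig-Latin pseudo-translator takes only a few lines. The rules below – move the leading consonant cluster to the end and append “ay”, or append “way” after a leading vowel – are one common variant, chosen purely for illustration:

```python
def pig_latin(word):
    """Move leading consonants to the end and append 'ay';
    words starting with a vowel just get 'way' appended."""
    vowels = "aeiouAEIOU"
    if not word or word[0] in vowels:
        return word + "way"
    # Find the first vowel; everything before it moves to the end.
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowels at all (e.g. "TTY")

print(pig_latin("string"))  # ingstray
print(pig_latin("File"))    # ileFay
```

Note that the output stays within ASCII – which is exactly why this approach cannot exercise local character sets.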
Machine translation: Conceptually, this ought to be the best approach to pseudo-translation because you get a “close” translation using native character sets, but without the extra expense of native/human translation services. However, there are several problems with this approach:
- Cost and accuracy
- Completeness
- UI navigation with sorted options
Let us examine the cost/accuracy point first. Free machine translations usually ignore context (e.g., The English word “File” in a UI menu is automatically translated as “Akte” in German. This may be an accurate translation when dealing with legal materials, but in a Windows UI the word “File” should be translated as “Datei” in German.) But, since the purpose of pseudo-translation is to identify coding mistakes, these types of exceptions are often insignificant.
Completeness is also a problem for machine translation. Let us take the common UI word “OK”. This translates into “OK” in German. The translation is correct, but any associated coding errors around internationalization will go undetected. This may be acceptable if you only localize to German. But, if you add Spanish to your list of target markets, the translation of “OK” is “Aceptar”, and this significantly changes your test results.
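One inexpensive guard against this completeness gap – assuming you have a table of source/translated string pairs – is to flag every string the machine translator left unchanged, since those strings exercise none of your I18n code paths. The data below is illustrative:

```python
# Hypothetical machine-translation output for a German target.
mt_pairs = {
    "File":   "Datei",
    "OK":     "OK",        # unchanged -- bugs around it go undetected
    "Cancel": "Abbrechen",
}

# Flag strings that came back identical to the source.
unchanged = [src for src, dst in mt_pairs.items() if src == dst]
print(unchanged)  # ['OK']
```

A pseudo-translation scheme that decorates every string (brackets, padding) has no such blind spot, because nothing survives unchanged.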
The third problem with machine translation for testing purposes is that UI navigation can be difficult, particularly when dealing with options that are sorted. You may wonder why your French photo editing software makes grass red instead of green, until you realize that sorted colors blue/green/red are in a different order in French – bleu (blue) / rouge (red) / vert (green). Machine translation has its uses, but pseudo-translation is not the best one.
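The sort-order pitfall is easy to demonstrate: sorting the same three colors alphabetically in English and in French yields different relative orders, so any code that assumes a fixed index into a sorted list breaks. Plain lexicographic sorting is used here for simplicity; real software would use locale-aware collation:

```python
english = sorted(["red", "green", "blue"])
french  = sorted(["rouge", "vert", "bleu"])

print(english)  # ['blue', 'green', 'red']
print(french)   # ['bleu', 'rouge', 'vert']

# "green" sits at index 1 in the English list, but its translation
# "vert" lands at index 2 in French -- an index-based lookup now
# picks "rouge" (red) where the English build picked green.
print(english[1], french[1])  # green rouge
```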
In the next edition of the Rubric Newsletter, we will examine simple text modifications, character mapping, and the pseudo-testing environment.