Poetic licence

When coding strays into pidgin territory, hard-coded language rules will corrupt the text.

By Ian Henderson, chairman, CTO and co-founder of Rubric.
Johannesburg, 10 Apr 2015

Software engineers take pride in their problem-solving capabilities. This often serves them well, for example when they find ways to perform certain tasks more efficiently.

However, sometimes coders go too far and stray outside of their area of specialisation. One transgression, often encountered by language service providers (LSPs), is an attempt by developers to 'codify' language.

Software and Web site translation projects have suffered greatly because of developers who encode a very limited understanding of a grammar rule, in the interests of automating what appears to them to be a repetitive task.

Developers are neither linguists nor localisation experts. As non-experts, they are 'tone deaf' to the myriad subtle word variations and exceptions in language (eg, plural, possessive, adjectival and many other forms).

But the real problem with reducing language to a set of basic rules is that those rules can only ever be language-specific. If the content that is being converted into code is destined for translation into one or more languages, the hard-coded rule will corrupt the text.

If, for example, a developer were to hard-code the rule that all plural nouns take the 's' suffix, it could make life exceedingly difficult for the translators who must correct the resulting mistakes, and could harm the entire software project and the business as a whole. The consequences can be ruinous.

Have a look at some examples:

Plurals
Hard-coding the above-mentioned 's' rule for pluralising nouns is not even 100% fail-safe in English (eg, we say 'houses', but 'mice'). In other languages, like Arabic, which has singular, dual and plural forms, adding an 's' in all cases is laughably simplistic.
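By way of illustration, here is a minimal Python sketch of the difference between the hard-coded 's' rule and letting each language choose from complete, translator-supplied strings. All the names here (naive_plural, arabic_category, MESSAGES and so on) are hypothetical, and the Arabic categories are modelled on the Unicode CLDR plural rules rather than taken from any particular product, so treat it as a sketch under those assumptions.

```python
# Illustrative sketch only: the function names and the message catalogue
# below are hypothetical, not part of any real library.

def naive_plural(noun: str, count: int) -> str:
    """The hard-coded rule: add 's' unless there is exactly one."""
    return noun if count == 1 else noun + "s"


def english_category(n: int) -> str:
    """English needs only two categories."""
    return "one" if n == 1 else "other"


def arabic_category(n: int) -> str:
    """Plural categories for Arabic, modelled on the Unicode CLDR rules.

    Assumes n is a non-negative integer.
    """
    if n == 0:
        return "zero"
    if n == 1:
        return "one"
    if n == 2:
        return "two"  # the dual form
    if 3 <= n % 100 <= 10:
        return "few"
    if 11 <= n % 100 <= 99:
        return "many"
    return "other"


CATEGORY_RULES = {"en": english_category, "ar": arabic_category}

# Translators supply one complete string per category; the code never
# manipulates the words themselves.
MESSAGES = {
    "en": {"one": "{n} mouse", "other": "{n} mice"},
}


def format_count(lang: str, n: int) -> str:
    """Pick the category for this language, then use the whole string."""
    category = CATEGORY_RULES[lang](n)
    return MESSAGES[lang][category].format(n=n)


print(naive_plural("mouse", 2))  # mouses
print(format_count("en", 2))     # 2 mice
print([arabic_category(n) for n in (1, 2, 5, 11, 100)])
# ['one', 'two', 'few', 'many', 'other']
```

The point of the sketch is that the code only decides which category applies; the wording for each category stays in the hands of the translator.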

Qualifiers
If a Web site offers custom online ordering (eg, 'choose any of 100 items in any of 10 colours'), there are 1 000 possible variations. To write code that displays all the possible choices, the developer must treat each colour and item as a variable in its own right and write out all the combinations.

But developers are proud - they will always try to save the company time and money, and shield laypersons (like translators) from software complexity. They might choose to write just 100 software strings - one for each item variable (eg, 'car'), which can be paired with any one of the 10 colours, which are treated as constants. After all, red is red, right?

When an online buyer makes menu-based choices in English, and pairs 'car' with 'red', he or she is presented with an order confirmation screen stating 'red car'.

In German, the screen will read 'rot Auto'. And this is a problem - in fact, to any German-speaking customer, it is a painfully obvious error that reflects poorly on the Web site owner.

By way of explanation, German nouns have genders, and this influences the form of the adjective too, which should in this case be 'rotes'. Preceding a masculine or feminine noun, 'rot' would again take a different form and spelling ('roter' or 'rote').
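The red car problem can be sketched in a few lines of Python. The dictionaries and function names below are hypothetical, invented purely for illustration, and the 'bag' item is an added example; in a real project the complete phrases would come from translators.

```python
# Hypothetical data for illustration only.
COLOURS = {
    "en": {"red": "red"},
    "de": {"red": "rot"},  # the 'constant' colour, translated once
}
ITEMS = {
    "en": {"car": "car", "bag": "bag"},
    "de": {"car": "Auto", "bag": "Tasche"},
}


def concatenated_label(lang: str, colour: str, item: str) -> str:
    """The shortcut: glue a fixed colour word onto an item word."""
    return f"{COLOURS[lang][colour]} {ITEMS[lang][item]}"


# The alternative: one complete phrase per combination, written out in
# full so the translator can inflect the adjective for each noun.
PHRASES = {
    "en": {("red", "car"): "red car", ("red", "bag"): "red bag"},
    "de": {("red", "car"): "rotes Auto", ("red", "bag"): "rote Tasche"},
}


def full_label(lang: str, colour: str, item: str) -> str:
    """Look up the whole phrase; no word manipulation in code."""
    return PHRASES[lang][(colour, item)]


print(concatenated_label("en", "red", "car"))  # red car
print(concatenated_label("de", "red", "car"))  # rot Auto
print(full_label("de", "red", "car"))          # rotes Auto
print(full_label("de", "red", "bag"))          # rote Tasche
```

The concatenating version looks correct in English and only fails once a second language is plugged in, which is why the error tends to surface late.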

Consequences, exceptions and fixes

These are just two very broad examples; many others exist. The only rule that can be crystallised from all of it is that, in all cases, it is a bad idea to assemble text in code from word variables.

How does one correct code that has erred in this way? The client will have to fix it, but if the software is too far gone, a software project reset may be on the cards.

Exceptions are possible if clients fully appreciate what it is they're attempting, and are willing to spend time and resources on incorporating advanced grammatical rules for various languages into the code.

If the LSP has software engineering experience (many make use of computer-aided translation), this can help. However, it tends to compound the complexity of projects to the point where individual words can have many variations across the source and target languages, catering for different language modalities, declensions, conjugations, etc.

As a consequence, even companies that knowingly cater for numerous linguistic variations to attain nuanced computer-aided translations eventually call it quits, acknowledging what all translators know - languages are poetic and abide by different rules from one another; they don't strictly conform to standard, universal rules.

Sophisticated speech

It is generally easier and more cost-effective to write out all the variations in full, and to leave the word manipulation to translators.
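Writing everything out need not mean typing it by hand. As a rough Python sketch, the English source strings can be generated once, one complete string per combination, and then handed to translators as ordinary text. The key format ('order.red.car') and the function name are assumptions made for this example only.

```python
from itertools import product

# A small sample; 100 items and 10 colours would yield 1 000 entries.
ITEMS = ["car", "bag"]
COLOURS = ["red", "blue", "green"]


def build_source_strings(colours, items):
    """Return one complete English string per (colour, item) pair.

    Each entry gets its own key, so every combination can be
    translated, and corrected, independently of the others.
    """
    return {
        f"order.{colour}.{item}": f"{colour} {item}"
        for colour, item in product(colours, items)
    }


strings = build_source_strings(COLOURS, ITEMS)
for key, text in strings.items():
    print(key, "=", text)
print(len(strings))  # 6
```

Only the source language is generated here; each target-language string is still written by a translator, who sees the whole phrase rather than its parts.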

The truth is there are tech-savvy translators out there. For example, there is quality assurance technology that synthesises software translations and emulates how they display on target devices (eg, a PC browser or mobile app). This frequently shows up problems in the code.

The irony is that it was developed to fix the coding glitches of experienced programmers; these errors were committed in the first place because the programmers did not realise the level of software engineering sophistication resident within some translation firms.