May 31, 2004

i18n good practices: resource bundles

one of the dreariest bits of i18n work is dealing with strings, especially for retro-fitting existing apps. you'll have to comb thru the existing code substituting resource bundle (rb) keys for existing strings. while regex filters, etc. help, nothing beats a pair of "mark IV eyeballs". in order to keep this task within the bounds of tolerable cruelty, there are a few simple things you might keep in mind when developing cf applications:
  • case: not ever language has case, Thai for instance doesn't, so PERMISSIONS, Permissions and permissions would be represented by the same string. in languages that do have case, those kinds of case permutations are plainly cosmetic (i was going to say cosmetic nonsense but thought better). if there's a application real need for this sort of thing, say to accent some heading, it should be handled via CSS and not hardcoded. hardcoded case strings make the difficult i18n process even more so. think twice before you get carried away with case, especially if you find yourself writing complex <cfif> blocks to handle it.
  • pluralization: not every language deals with plurals the same as English, simply adding a letter ("s" for instance) hardly ever cuts it and in some instances the language structure is completely different (the English phrase "five wood blocks" becomes something like "block of wood five units" in Thai). while you can blow off quite a few CPU cycles with complicated logic to handle plurals, i contend that item(s) is just as understandable as <cfif someQ.recordCount GT 1>items<cfelse>item</cfif> and has the added benefit of i18n simplicity. otherwise you'll have to add another set of rb keys (plural forms vs singular forms) and logic to handle pluralization.
  • compound strings: compound strings are, besides being my pet peeve, strings that contain substituted values. for example, "You owe me #dollarFormat(amountDue)#. Please pay by #dateFormat(normalDueDate)# or I will be forced to shoot you with #numberFormat(budgetQ.bulletsPerDeadbeat)# bullets. Thank you." if you do much i18n research you'll often see folks recommending you avoid compound strings like the plague (for instance, the API for the messageFormat java class comes right and says this). why? because they're hard to handle. first you have to figure out the logic and in some cases its not going to be trivial. then you have to rework the rb string to use place holders for localization ("You owe me {1}. Please pay by {2} or I will be forced to shoot you with {3} bullets. Thank you.") . finally you have to substitute the intended values at runtime--newer versions of my javaRB and RBjava CFC have methods for this. its often much easier to simply rewrite the compound string.
  • floating prepositions: these are perhaps a form of compound string but often can't be handled like them. i sometimes encounter extremely complicated output logic/displays or HTML form elements separated by a preposition (usually "at", "by" or "in"). in its simplest form it might be "dateValue at timeValue" (which actually can be handled as a compound string) but more often then not it's much more complicated. if i can get my way, we normally send floating prepositions to the garbage dump, i mean most folks would have no problem understanding "dateValue timeValue".
i suppose many folks might find this trivial but it adds time and complexity to an already time-consuming and complicated process.

0 Comments:

Post a Comment

<< Home