February 21, 2006

good i18n practices really are good

an i18n-related issue popped up on the cfeclipse list yesterday that reinforced (at least to me) that good i18n practices really are good. a user had their eclipse encoding setup as UTF-8 yet was getting their unicode coldfusion pages garbaged. my first look at this used code from our existing codebase and of course it worked. for the life of me, well for 2-3 hours anyway, i couldn't see how this was going wrong. it wasn't until i whipped up a simple dummy page that just had unicode text and nothing else that i was able to see the problem. the issue is simple but clearly illustrates a good i18n practice.

eclipse (not cfeclipse) doesn't add a BOM to UTF-8 encoded files. why? well
  • the BOM isn't actually required as part of the definition of UTF-8 (and i know of plenty of s/w that either doesn't write one out or in fact strips them from files)
  • in the past (i think) the java compiler wouldn't compile a file w/a BOM & since that's what eclipse was originally meant for, NOT having a BOM makes perfect sense (from a very a quick test i just ran it seems this is no longer true, at least from within eclipse)

so why was our cfeclipse-edited UTF-8 encoded code working? because we follow our own good i18n practices and liberally use encoding hinting starting with the cfprocessingdirective. each of our coldfusion pages starts with:

<cfprocessingdirective pageencoding="utf-8">


BOM or no BOM, this ensures your code will be always be interpreted as UTF-8. for more good i18n practices grab a copy of the advanced coldfusion book.

see? good i18n practices really are good.

0 Comments:

Post a Comment

<< Home