February 21, 2006

good i18n practices really are good

an i18n-related issue popped up on the cfeclipse list yesterday that reinforced (at least to me) that good i18n practices really are good. a user had their eclipse encoding set up as UTF-8 yet was getting their unicode coldfusion pages garbaged. my first look at this used code from our existing codebase and of course it worked. for the life of me, well for 2-3 hours anyway, i couldn't see how this was going wrong. it wasn't until i whipped up a simple dummy page that just had unicode text and nothing else that i was able to see the problem. the issue is simple but clearly illustrates a good i18n practice.

eclipse (not cfeclipse) doesn't add a BOM to UTF-8 encoded files. why? well
  • the BOM isn't actually required as part of the definition of UTF-8 (and i know of plenty of s/w that either doesn't write one out or in fact strips them from files)
  • in the past (i think) the java compiler wouldn't compile a file w/a BOM & since that's what eclipse was originally meant for, NOT having a BOM makes perfect sense (from a very quick test i just ran it seems this is no longer true, at least from within eclipse)
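
if you want to see for yourself whether a given file carries a BOM, something like the rough sketch below should do it. the file name dummy.cfm is purely hypothetical, and the sketch assumes coldfusion lets you index the byte array that cffile hands back like any other array. a UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

<!--- dummy.cfm is a hypothetical file name, swap in your own --->
<cffile action="readbinary" file="#expandPath('dummy.cfm')#" variable="rawBytes">
<cfscript>
// java bytes are signed, so mask each one with 255 before comparing
// 239, 187, 191 are EF, BB, BF in decimal
hasBOM=false;
if (arrayLen(rawBytes) GTE 3) {
   hasBOM=(bitAnd(rawBytes[1],255) EQ 239)
      AND (bitAnd(rawBytes[2],255) EQ 187)
      AND (bitAnd(rawBytes[3],255) EQ 191);
}
writeoutput("starts with a UTF-8 BOM: #hasBOM#");
</cfscript>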

so why was our cfeclipse-edited UTF-8 encoded code working? because we follow our own good i18n practices and liberally use encoding hinting starting with the cfprocessingdirective. each of our coldfusion pages starts with:

<cfprocessingdirective pageencoding="utf-8">


BOM or no BOM, this ensures your code will always be interpreted as UTF-8. for more good i18n practices grab a copy of the advanced coldfusion book.
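
the cfprocessingdirective only covers how coldfusion reads the page itself. as a minimal sketch of what "liberally" hinting could look like for the rest of a request (this is an assumption on my part about a typical setup, not a listing of any particular codebase), you can also tell coldfusion how to decode incoming form/url data and tell the browser what encoding is coming back:

<cfprocessingdirective pageencoding="utf-8">
<cfscript>
// hint the encoding of incoming form and url data
setEncoding("form","utf-8");
setEncoding("url","utf-8");
</cfscript>
<!--- hint the encoding of the response sent to the browser --->
<cfcontent type="text/html; charset=utf-8">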

see? good i18n practices really are good.

February 20, 2006

BIG numbers in coldfusion

mark kruger has an interesting post on his blog concerning formatting big numbers in coldfusion. in that post's comments sean corfield points out that the real issue is the precision of the float datatype that coldfusion uses under the covers. this issue has also come up a few times on the support forums and probably the best answer is (as usual) to dip down into the java underlying coldfusion to use one of the Big math classes (java.math.BigInteger, java.math.BigDecimal) to handle the math on the special occasions that you really need that kind of precision. as sean pointed out, float is faster than BigDecimal for calculations so you should use those classes only when they are really needed.
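
as a rough sketch of what dipping down into java for the math might look like (the values are made up for illustration and the javaCast calls are just me being careful that the String constructor gets picked):

<cfscript>
// hypothetical values, pass them as strings so no precision is lost on the way in
a=createObject("java","java.math.BigDecimal").init(javaCast("string","9123456789123456789.123"));
b=createObject("java","java.math.BigDecimal").init(javaCast("string","0.001"));
// BigDecimal objects are immutable, add() returns a new BigDecimal
total=a.add(b);
writeoutput("BigDecimal sum:=#total.toString()#<br>");
// the same sum using coldfusion's native numbers, for comparison
nativeTotal=9123456789123456789.123 + 0.001;
writeoutput("native sum:=#nativeTotal#");
</cfscript>

the BigDecimal sum keeps every digit (9123456789123456789.124) while the native sum can't.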

what has this got to do with g11n? well even if you do use those Big math classes, core java's NumberFormat class doesn't understand its own BigDecimal/BigInteger classes (ie it casts everything back to double/long). so when you come to display these values you're back in the same situation that mark's post describes. what to do? use icu4j of course (everybody knew that was coming). its NumberFormat class understands BigDecimal/BigInteger plenty fine. for example:

<cfscript>
theNumber="9123456789123456789.123";
//use server default locale
nF=createObject("java","com.ibm.icu.text.NumberFormat").getInstance();
cNF=createObject("java","java.text.NumberFormat").getInstance();
bigDecimal=createObject("java","java.math.BigDecimal").init(theNumber);
formattedNumber=nf.format(bigDecimal);
coreJavaFormattedNumber=cNF.format(bigDecimal);
writeoutput("original number:=#theNumber#<br>
   big decimal representation:=#bigDecimal#<br>
   icu4j number Formatted:=#formattedNumber#<br>
   core java number Formatted:=#coreJavaFormattedNumber#"
);
</cfscript>


which outputs:

original number:=9123456789123456789.123
big decimal representation:=9123456789123456789.123
icu4j number Formatted:=9,123,456,789,123,456,789.123
core java number Formatted:=9,123,456,789,123,457,000
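
the example above leans on the server default locale. since this is a g11n blog, here's a small sketch of the same thing with an explicit locale (de_DE is just an arbitrary pick for illustration and it assumes icu4j is on your coldfusion classpath):

<cfscript>
theNumber="9123456789123456789.123";
// de_DE is only an example, use whatever locale your user actually has
deLocale=createObject("java","java.util.Locale").init("de","DE");
deNF=createObject("java","com.ibm.icu.text.NumberFormat").getInstance(deLocale);
bigDecimal=createObject("java","java.math.BigDecimal").init(theNumber);
writeoutput("de_DE icu4j number Formatted:=#deNF.format(bigDecimal)#");
</cfscript>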


i really wish coldfusion would use icu4j. it would make i18n work much easier and as a side effect help w/problems like this.

February 18, 2006

unicode font madness

ever needed a font to handle Berber language? or Khmer? while i most often use the massive Arial Unicode MS for our i18n work there are some rare occasions where it doesn't contain the glyphs we need. and other occasions where i simply like the way a font looks (like Tifinagh abjad used to write Berber).

well, look no further. the Unicode Font Guide For Free/Libre Open Source Operating Systems has put together a super cool collection of free/cheap fonts covering pretty much every language in the world. the content is organized regionally which to me makes a boat load of sense.

the main site also has some excellent font/web related resources including an XHTML and CSS guide for middle school students (which i dare say some cf developers, like me for example, could make good use of).

this is another excellent i18n resource to add to your bookmarks.

just for fun, below is an example of Tifinagh abjad. tell me this doesn't look so cool, there's something almost alien about it.

Tifinagh abjad

February 16, 2006

i seem to keep missing these....the super cool icu4j lib was updated 20-jan-2006 to version 3.4.2. it contains a few bug fixes (Chinese date format/calendar, currency rounding bug for de_CH locale, etc.) but the biggest deal is that this release dumps the dependency on core java timezone data. while i normally use core java's timezone classes this puppy has several methods that i find pretty cool. for instance, one of the biggest headaches w/using timezone data is that there are just so darned many of them. filtering these down into something reasonable often results in some compromises that always leave me feeling like we're missing something. now we can do filtering that at least looks more reasonable, say like using a user's country:

<cfscript>
tz=createObject("java","com.ibm.icu.util.TimeZone");
//get TZ based on country
zones=tz.getAvailableIDs("TH");
</cfscript>

<cfdump var="#zones#">


how cool is that?
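
going one small step further, here's a sketch that turns those IDs into something you could show a user, say in a select list. it assumes the same icu4j TimeZone class as above and the display names will come back in the server default locale:

<cfscript>
tz=createObject("java","com.ibm.icu.util.TimeZone");
//get TZ based on country
zones=tz.getAvailableIDs("TH");
for (i=1; i LTE arrayLen(zones); i=i+1) {
   // look up the full timezone object for each ID
   thisZone=tz.getTimeZone(zones[i]);
   writeoutput("#zones[i]#: #thisZone.getDisplayName()#<br>");
}
</cfscript>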

February 14, 2006

more on encoding

encoding issues just never seem to end. after another week's worth of helping folks slog thru their encoding problems, i recalled that sun has recently published a pretty decent article on their SDN about encoding (even includes a nice mojibake example).

and while it's mainly java/jsp it's worth the read for us cf folks.