January 31, 2004

ICU4J Version 2.8 released

IBM has released the latest version of its excellent ICU4J java lib. new good stuff includes:
  • historical timezones: "where daylight savings time rules or other related data have changed after the date in question". cool.
  • updated locales and more locale methods (to access stuff like paper page sizes, measurement systems, etc.). cool.
  • improved sorting (now does proper Thai Royal Dictionary order). way cool for me ;-)
  • XLIFF conversion tool (in case you're developing your own resource data)
  • a how-to for using eclipse with ICU4J
  • bug fixes, performance improvements, etc.
it's available from this page.

January 30, 2004

YAIP (yet another internet patent): geoLocation

quova, inc. has announced that it has secured a US patent related to geoLocation. in case you don't already know, geoLocation is a technology that determines the geographic location of website visitors basically by doing a db lookup of a visitor's IP (data usually based on WHOIS, etc.). i'm not a lawyer and don't know how this will play out but it smells like it could spell the end of my freebie geoLocatorCFC and others like it: oh boy.

January 25, 2004

it seems i cannot add or subtract

if you're using the timeZone CFC, please pick up a new version here. it seems i've managed to swap the cast to/from UTC offsets (added when i should have subtracted) and so botched the cast to/from timezones methods. sorry.
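
for anyone wondering which way the arithmetic goes, here's a minimal java sketch of the two casts. it is not the timeZone CFC's actual code; the class and method names are made up for illustration, and the offset is assumed to be the zone's signed offset from UTC in milliseconds (positive east of greenwich).

```java
import java.util.TimeZone;

public class UtcOffsetSketch {
    // local wall-clock millis -> UTC millis: SUBTRACT the zone's offset
    static long localToUtc(long localMillis, int offsetMillis) {
        return localMillis - offsetMillis;
    }

    // UTC millis -> local wall-clock millis: ADD the zone's offset
    static long utcToLocal(long utcMillis, int offsetMillis) {
        return utcMillis + offsetMillis;
    }

    public static void main(String[] args) {
        TimeZone bangkok = TimeZone.getTimeZone("Asia/Bangkok");
        long utcNoon = 12L * 60 * 60 * 1000;      // 1970-01-01 12:00 UTC
        int offset = bangkok.getOffset(utcNoon);  // getOffset expects a UTC instant
        long local = utcToLocal(utcNoon, offset);
        // casting there and back should return the original instant
        System.out.println(localToUtc(local, offset) == utcNoon);
    }
}
```

swapping the plus and the minus still round-trips, which is probably why this kind of bug is easy to miss: each cast is wrong on its own but the pair cancels out.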

January 22, 2004

OT: geography matters

interesting story making the rounds today. it seems the UK's best selling hiking magazine printed a trail guide that would lead hikers off a cliff. given the weather conditions and terrain there, yup, walking off that mountain seems easy enough and yup, might hurt a bit. that kind of mistake is sort of understandable: spatial data is quite complex, i'd say there's no data as complicated. the magazine's editor says they print 200 a year (as if that was a lot) and this was their first mistake (i bet it wasn't). in comparison NIMA (now the NGA) produces thousands of maps. makes you wonder.
looking at it from the opposite (geographic data collecting) end, i was recently re-reading some of the journals of the first systematic geographic and geologic surveys of Thailand, carried out 50-100 years ago, literally from the backs of elephants. people died collecting that spatial information. drowning. snake bites. tigers. falls. tough way to make a living. so next time you pick up a map, you might give some thought to what it took to produce it.

January 20, 2004

OT: updated BoL released

ms has released an updated BoL (Books Online). you can download it here. if you're using sql server and don't know what the BoL is, i suggest you find out fast.

January 16, 2004

resourceBundle gotcha

continuing in the same week-long obsession with resource bundles, i thought i'd point out a potential "gotcha" concerning the java flavored resourceBundles before it induces any psychotic episodes. cf folks used to dealing with structures (or those people using the cf-based UTF-8 resourceBundles) might be particularly susceptible to this.
using cf structures you could always build a key/value pair like this (i don't think it's such a hot idea but you could):
  montyPython=structNew();
  montyPython["ministry of silly walks"]="too funny for words";
as long as you referenced the montyPython structure using this sort of syntax
  montyPython["ministry of silly walks"]
all was well with the world. you could just as easily use this style in cf-based UTF-8 resourceBundles (again not a good idea but you could):
  ministry of silly walks=too funny for words
because the resourceBundle CFC (or whatever you're using, which should behave similarly) would simply parse this as a list delimited with an "=", stuffing the left side into a structure as a key with the right side as that key's value.
fine and dandy but this won't cut it with java flavored resourceBundles. "why" you ask? because java resourceBundles' keys are defined (according to the java.util.Properties API) as: "The key consists of all the characters in the line starting with the first non-whitespace character and up to, but not including, the first ASCII =, :, or whitespace character." so "ministry of silly walks=too funny for words" ends up with a key of just "ministry" (and everything after that first space, "of silly walks=too funny for words", as its value) when parsed by either of the two java resourceBundle classes i've been going on about lately. and that of course might cause a bit of head scratching and finger pointing.... so now you know.
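
if you'd rather see the gotcha than take my word for it, here's a small java sketch that feeds the same line to java.util.Properties. the class and method names are just for illustration. it also shows the usual way out if you really must have spaces in a key: escape each one with a backslash in the file (shown doubled below only because it sits inside a java string literal).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class KeyParsingDemo {
    // parse one or more lines of properties text the same way a .properties file is read
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties naive = parse("ministry of silly walks=too funny for words");
        // the key stops at the first whitespace
        System.out.println(naive.getProperty("ministry"));                // of silly walks=too funny for words
        System.out.println(naive.getProperty("ministry of silly walks")); // null

        // backslash-escaped spaces stay part of the key
        Properties escaped = parse("ministry\\ of\\ silly\\ walks=too funny for words");
        System.out.println(escaped.getProperty("ministry of silly walks")); // too funny for words
    }
}
```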

January 08, 2004

cf resource bundle flavors

last week (02-jan-04) i droned on about the three types of resourceBundle (rb) methods that can be used in cfmx. this week i thought i'd flap my lips about the two flavors of resourceBundle files used with these three methods.
let's deal with the simplest one first (for use with resourceBundleCFC). it's nothing more than a utf-8 encoded text file of key/value pairs. something like the following:
  englishFive=5
  thaiFive = ๕
(you will need a thai or unicode font to read this) these types of rb files can be easily created using notepad (yes notepad), dreamweaver, or any sort of text editor capable of producing utf-8 encoded files (unfortunately not cfstudio, in case you were wondering). as you can see, these are human readable. this flavor of rb file is easily and directly accessible by cf. the downside to all this goodness is that it can spiral out of control with large, complex rb files covering many locales (languages).
the other rb flavor is based on java style rb files (because it makes use of the java ResourceBundle or PropertyResourceBundle classes) and similarly consists of key/value pairs in a text file but the "value" text is ASCII escaped unicode (\uXXXX where XXXX is the unicode code point expressed as a hexadecimal value). for instance:
  loatianFive=\u0ED5
  bengaliFive=\u09EB
  thaiFive=\u0E55
the javaRB CFC can handle this type of rb file. creating these types of files is a bit more complicated (unless you are one of those very rare individuals who have the whole of unicode in your head) and is usually handled by external tools such as the command line native2ascii supplied with normal java installs (in the bin dir) or the nifty rbManager tool from IBM.
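
to take some of the mystery out of the \uXXXX business, here's a rough java sketch of the kind of escaping a tool like native2ascii performs. it is my own simplification and not the tool's actual source; the class and method names are hypothetical. anything outside 7-bit ASCII gets written as a backslash, a "u" and four hex digits.

```java
public class EscapeSketch {
    // replace every non-ASCII char with its \uXXXX escape, leave ASCII alone
    static String toAsciiEscaped(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u0E55 in java source is the thai digit five
        System.out.println(toAsciiEscaped("thaiFive=\u0E55")); // thaiFive=\u0E55
    }
}
```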
recent experience tells me that this might be a concept some folks will have trouble understanding so here's a snippet that actually builds and reads this flavor of rb file (it's part of the guts of an rbManager cf clone i've been building off and on):
  <cfscript>
  // set up some constants
  thaiFive=chr(3669);
  tibetianFive=chr(3877);
  loatianFive=chr(3797);
  tamilFive=chr(3051);
  bengaliFive=chr(2539);
  arabicFive=chr(1637);
  malayamFive=chr(3435); // U+0D6B, malayalam digit five
  // java objects
  prop=createObject("java","java.util.Properties");
  fos = CreateObject("java", "java.io.FileOutputStream");
  fis = CreateObject("java", "java.io.FileInputStream");
  // resourceBundle
  rbFile=getDirectoryFromPath(expandpath("*.*")) & "test.properties";
  // build test property file (as a basis for resourceBundle)
  fos.init(rbFile);
  prop.setProperty("thaiFive","#thaiFive#");
  prop.setProperty("loatianFive","#loatianFive#");
  prop.setProperty("tibetianFive","#tibetianFive#");
  prop.setProperty("tamilFive","#tamilFive#");
  prop.setProperty("bengaliFive","#bengaliFive#");
  prop.setProperty("arabicFive","#arabicFive#");
  prop.setProperty("malayamFive","#malayamFive#");
  prop.store(fos,"test: brought to you by the number five");
  fos.close(); // done close output file
  // get property file & dump keys
  fis.init(rbFile);
  prop.load(fis);
  fis.close(); // done close input file
  keys=prop.propertyNames();
  writeoutput('<font face="Arial Unicode MS">');
  while (keys.hasMoreElements()) {
    thisKEY=keys.nextElement();
    thisMSG=prop.getProperty(thisKey);
    writeoutput("#thisKEY# = #thisMSG#<br>");
  }
  writeoutput("</font>");
  </cfscript>
the rb file produced by this snippet would be something like (note that it's a bunch of locales jumbled together, absolutely NOT what you'd do in production but you get the idea):
  #test: brought to you by the number five
  #Thu Jan 01 19:04:53 GMT+07:00 2004
  malayamFive=\u0D6B
  loatianFive=\u0ED5
  bengaliFive=\u09EB
  thaiFive=\u0E55
  arabicFive=\u0665
  tibetianFive=\u0F25
  tamilFive=\u0BEB
output would be something along these lines (again you'll need some unicode capable font to properly read these):
  malayamFive = ൫
  loatianFive = ໕
  thaiFive = ๕
  bengaliFive = ৫
  arabicFive = ٥
  tibetianFive = ༥
  tamilFive = ௫
so now you know.

new multilingual web application article on sun site

ok so it is a JSP article (just pretend all the tag based code is actually cf ;-) but the article does contain a boatload of content that applies to g11n cf apps equally well. makes good reading.

January 02, 2004

quick review of resource bundles methods for cf

first let me dispatch the notion of using cf code in lieu of resourceBundles (rb). it's a bad habit that might work with very small files for a couple of languages but will eventually break down as your g11n apps become more complex and cover more and more languages (locales). so if you're just beginning g11n work, don't start with this method no matter how tempting it looks. and if you're already using this approach, quit while you're ahead. mingling code and text like that is just a bad idea.
last year (well last week) i was mildly berated by some java folks for suggesting the use of either utf-8 based cf "resourceBundles" or the PropertyResourceBundle java class instead of the more typical ResourceBundle. oh the shame. from a cf perspective though, those java folks were just being sort of snobbish. depending on your cf app needs it seems acceptable to me to use either rb method.
below you'll find a quick and dirty comparison between the two less "normal" methods and the more traditional java method. each has its good and bad points, however for me the biggest negative associated with using the "pure" java ResourceBundle approach is its requirement that the rb always be in a classpath. that's a show stopper for many shared hosts. though it won't stop me from releasing an rb CFC using that style ;-)
CFMX UTF-8
pros
  - human readable
  - easy to manage (notepad, etc.)
  - simple to implement in MX
  - quite fast
cons
  - complex rb quickly become hard to manage
  - can't easily use standard rb tools
java ResourceBundle
pros
  - pure standard java rb solution
  - handles rb from standard tools
  - self determines rb for locale
  - handles complex rb quite easily
cons
  - not human readable
  - requires rb be somewhere in classpath
  - requires createObject permission
  - some overhead in using java object
java PropertyResourceBundle
pros
  - rb can be anywhere
  - pure standard java rb solution
  - handles rb from standard tools
  - handles complex rb quite easily
cons
  - not human readable
  - requires caller to determine rb from locale
  - requires createObject permission
  - some overhead in using java object
i'd appreciate any feedback on this.
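
to make the "rb can be anywhere" and "requires caller to determine rb from locale" points a bit more concrete, here's a minimal java sketch of the PropertyResourceBundle style. the class name, the helper names and the base_language.properties naming convention are my own assumptions for the example, not something any of the CFCs mentioned here necessarily does. the classpath flavor would instead call ResourceBundle.getBundle(baseName, locale) and let java hunt the file down for itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.PropertyResourceBundle;

public class RbAnywhereSketch {
    // the caller works out which file goes with which locale
    // (hypothetical convention: <base>_<language>.properties)
    static Path fileFor(Path dir, String baseName, Locale locale) {
        return dir.resolve(baseName + "_" + locale.getLanguage() + ".properties");
    }

    // the rb file can live in any directory, no classpath involved
    static PropertyResourceBundle load(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return new PropertyResourceBundle(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // builds a one-key thai rb file in a temp dir and reads it back
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("rb");
            Path file = fileFor(dir, "numbers", Locale.forLanguageTag("th-TH"));
            Files.write(file, "thaiFive=\\u0E55\n".getBytes(StandardCharsets.US_ASCII));
            return load(file).getString("thaiFive");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // the thai digit five, given a unicode capable console
    }
}
```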

January 01, 2004

unicode compression

unicode is good. unicode is great. it's by far the best choice for g11n work (i "bite my thumb" at codepage encodings). but like everything else in the real world it has its seamier side. in order to encode all the world's scripts, unicode must often use more than 1 or even 2 bytes for many characters. ASCII-only folks with an occasional need for non-ASCII characters wouldn't give this a second thought; the rest of the world (especially CJK folks), however, isn't so fortunate. "unicode bloat" is a sad but true fact of g11n life.
unicode compression is therefore a fairly interesting topic to many unicode users and developers. doug ewell has posted a pretty understandable article on unicode compression. the article provides a nice background to unicode and compression. it's a good read and well worth the time.
and by now i hope everyone's had a safe and happy new year's eve celebration. for those of us already past that and wondering "what the heck went on", i refer you to the jean-luc ponty tune (composed and arranged by frank zappa) and ask the question "how would you like to have a head like that".