December 19, 2003

new i18n stuff on w3c

new must-read i18n content on the w3c site: once again, i'd like to recommend the w3c internationalization activity website to i18n folks. it's well worth a bookmark.

December 17, 2003

OT: joel on software on biculturalism

i really enjoy reading joel on software articles, and since no one seems to have mentioned this one, i thought i'd talk it up a bit. this week's is on biculturalism (it's actually a review of a book by eric "i-love-a-good-controversy" raymond), and while it deals with two programming cultures (unix and windows), it serves as a very good reminder of the importance of trying to understand "cultures" other than one's own. i've lived in bangkok for more than 20 years and have had my nose rubbed in thailand's culture all the while; my east coast american cultural skin has been rubbed clean off in places. perhaps because of this i don't find it all that hard to get into other cultures while i develop cf applications. it's "normal" for me to make an app come out of the gate i18n-ready. but you certainly don't have to live 20 years someplace other than your hometown to get an i18n mindset; all you have to do is not be so "provincial" and recognize that there are other places in the world than what's in your front yard. there are also a lot of messy technical details, but that's another story. in any case, the article's a good read, my ranting aside.

December 15, 2003

OT: know where you are

ben forta's recent blog entry about GPS devices and an article on the internet search engine db site got me thinking about how few websites bother to geographically locate themselves. in fact, many don't even bother with a simple city, state/province, country text blurb. this is bad practice on so many levels that i don't know whether to whack the webmaster with mike hotek's big rubber mallet or swiftly apply my foot to their backside. locating your website is important for a number of reasons, chief among them that you're actually some "place". being at a known physical location gives people a sense of your business being "real" and therefore perhaps trustworthy. beyond the pop psychology, the wireless era we seem to be entering demands a location. somebody firing up their bluetooth GPS/pda device and looking for the nearest microbrewery might be a bit put out if none turned up, even though they might be standing right in front of one. all the locational technology in the world is so much snake-oil without good geographic data to feed it. so get yourself located. how hard is it these days to find out your geographic location? not very: you can use the resources on the geoURL project or the geotags project. failing these, drop me a line with your city, state/province, country and i'll see what i can do (yes, i feel that strongly about this). as the saying goes: know where you are. be where you're at.
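for the curious, "locating" a page mostly boils down to a few meta tags in its head. below is a minimal java sketch that builds them from decimal-degree coordinates. the tag names (ICBM for geoURL; geo.position, geo.region and geo.placename for geotags) reflect my understanding of those projects' conventions, so check their pages before relying on them. the class and method names are hypothetical, and the coordinates are just rough figures for central bangkok.

```java
import java.util.Locale;

class GeoMeta {

    // minimal attribute escaping so a place name can't break out of content="..."
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }

    // builds <meta> tags in the ICBM style (geoURL) and the geo.* style (geotags);
    // lat/lon are decimal degrees, north and east positive
    static String metaTags(double lat, double lon, String region, String placename) {
        // Locale.ROOT guarantees a '.' decimal separator whatever the server locale
        String la = String.format(Locale.ROOT, "%.4f", lat);
        String lo = String.format(Locale.ROOT, "%.4f", lon);
        StringBuilder sb = new StringBuilder();
        sb.append("<meta name=\"ICBM\" content=\"").append(la).append(", ").append(lo).append("\">\n");
        sb.append("<meta name=\"geo.position\" content=\"").append(la).append(';').append(lo).append("\">\n");
        sb.append("<meta name=\"geo.region\" content=\"").append(esc(region)).append("\">\n");
        sb.append("<meta name=\"geo.placename\" content=\"").append(esc(placename)).append("\">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // roughly central bangkok; substitute your own coordinates
        System.out.print(metaTags(13.7563, 100.5018, "TH-10", "Bangkok"));
    }
}
```

note the Locale.ROOT: a server running in a locale that uses a decimal comma would otherwise write "13,7563", which no geo crawler would read as a latitude.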

December 13, 2003

supported encodings

while this is somewhat old information, a few encoding issues recently popped up in the forums, so i guess it bears repeating once again. the latest sun jre (1.4.2) that ships with mx installs only a few encodings by default (latin-1, latin-9, greek, eastern european, cyrillic, unicode, etc.): no arabic, hebrew, asian, etc. for those you need to do a custom install. if you try to use an encoding from the custom (or international) set with a default install, you will see "UnsupportedEncodingException" errors. so if you want to use codepage encodings beyond the defaults, you will need to do an international install (unless of course the installer recognizes these locales on your server during setup). you can read more about this here. so now you know (again).
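if you're not sure which install you've got, it's easy enough to ask the jre directly. here's a minimal java sketch (class and method names are hypothetical) that checks a handful of charset names with java.nio.charset.Charset.isSupported, and shows the older String.getBytes(String) call that throws the checked UnsupportedEncodingException when a charset is missing. the list of names is just my pick; windows-874 and TIS-620 are the thai ones. results depend on the jre you run it under, and newer jdks generally ship the extended charsets by default.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

class EncodingCheck {

    // true if the running jre can encode/decode this charset name
    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException (and a null name) both land here;
            // treat a malformed name as "not supported"
            return false;
        }
    }

    // the older String API: throws UnsupportedEncodingException when the
    // jre lacks the requested charset, the error seen on a default install
    static byte[] encode(String text, String name) throws UnsupportedEncodingException {
        return text.getBytes(name);
    }

    public static void main(String[] args) {
        String[] names = {"ISO-8859-1", "ISO-8859-15", "windows-1251", "UTF-8",
                          "TIS-620", "windows-874", "Shift_JIS", "windows-1256"};
        for (String n : names) {
            System.out.println(n + " -> " + (supported(n) ? "supported" : "NOT supported"));
        }
    }
}
```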

December 10, 2003

ICU4J RuleBasedNumberFormat spellOut

one of the other bumps we encountered during the move from cf5 to mx for our municipal info system was a c++ cfx tag that spelled out numbers for reporting, receipts, etc. it started going nutso with values greater than 65 million when we moved to mx (not to brag much, but since we started working with this particular municipality its annual tax base has increased from 28 million baht to over 67 million baht). i'm not sure if this was a side effect of the multi-byte cfxNeo.dll we introduced or if the original c++ code used datatypes that fell over after 64 million (we're still hunting for that code), but since we're porting to java-based i18n functionality anyway, i thought i'd see if there were any "stock" spellout methods around. once again, ibm's icu4j comes through. its com.ibm.icu.text.RuleBasedNumberFormat class has a nifty format method with spellout rulesets for some locales (in this case we're only interested in thai, but there are others available in the class). once i slapped a wrapper class around its format method, it was good to go. you can see it in action on this testbed. i'll make it and the wrapper class available once i get currency formatting set up and tested, and figure out how to add other locales' rulesets (as well as get other rulesets' data; for instance, i'd really like to see rulesets for arabic locales). one bone i have to pick w/mx's java support is the constant need to write wrapper classes to handle (dumb down) various format() methods. it makes distributing and maintaining some i18n CFCs more of a pain than need be. i was hoping some java guru might explain the whys and the wherefores. any takers?
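to show the kind of wrapper i mean: my guess (an assumption, not something i've confirmed) is that cf's loosely typed values make it hard for mx to choose between overloaded java methods like format(long), format(double) and format(Object). so the wrapper exposes one non-overloaded, all-String method. icu4j isn't part of the jdk, so this sketch wraps the jdk's own java.text.NumberFormat instead; the same pattern would apply to RuleBasedNumberFormat. the class name and the "en_US"-style locale argument are hypothetical.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

// hypothetical wrapper: a single, non-overloaded, all-String entry point
class NumberFormatWrapper {

    // value arrives as a String, so the caller never has to pick between
    // format(long), format(double) and format(Object)
    static String format(String value, String localeId) {
        // "th_TH" -> "th-TH", the BCP 47 form Locale.forLanguageTag expects
        Locale loc = Locale.forLanguageTag(localeId.replace('_', '-'));
        NumberFormat nf = NumberFormat.getInstance(loc);
        // BigDecimal keeps large values exact: no double rounding, no int overflow
        return nf.format(new BigDecimal(value.trim()));
    }

    public static void main(String[] args) {
        System.out.println(format("67000000", "en_US"));
        System.out.println(format("1234567.5", "th_TH"));
    }
}
```

from cf, something like CreateObject("java", "NumberFormatWrapper").format(someValue, "th_TH") should then have only one method to match (assuming the class is on mx's classpath).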

December 08, 2003

java resource bundles

while i've been using UTF-8 based resource bundles for some time now, larger, more complex projects really need tools like IBM's rbManager to help manage resource bundle creation and translation. the problem with these tools is that they store text messages as unicode escape sequences: Go=\u0E44\u0E1B (in thai, ไป). parsing these "pure" java resource bundles (rb) takes quite a bit of extra cf processing. i've been trying off and on for some months now to make use of the underlying java ResourceBundle classes to handle rb files but haven't had much success, mainly because java expects rb files on the classpath, and that's not something i can live with on some projects, nor could i find a simple workaround. while staring at some limestone rocks on saturday i had a micro epiphany about the java.util.PropertyResourceBundle class. this class loads an rb file from an input stream (ie you can pump in the rb file content from anyplace on the server). badda bing (i actually thought that at the time ;-) here's some test code i whipped up:

```cfml
<cfscript>
thisDir = GetDirectoryFromPath(ExpandPath("*.*"));
rbFile = thisDir & "test_th_TH.properties";
// open the rb file from anywhere on the server, not just the classpath
fis = CreateObject("java", "java.io.FileInputStream");
fis.init(rbFile);
// PropertyResourceBundle reads the whole stream and decodes the escapes
rb = CreateObject("java", "java.util.PropertyResourceBundle");
rb.init(fis);
fis.close();
keys = rb.getKeys();
writeoutput("resourceBundle = #rbFile#<br>");
while (keys.hasMoreElements()) {
    thisKey = keys.nextElement();
    thisMsg = rb.getString(thisKey);
    writeoutput("#thisKey# = #thisMsg#<br>");
}
</cfscript>
```

as you can see it's quite simple, so simple i built it into a javaRB.cfc. you can see it in action here. limestone rocks, who would have thought?
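for anyone working on the java side, here's the same idea as a minimal standalone java sketch (class name hypothetical): it loads an escaped rb file from an arbitrary path via PropertyResourceBundle's InputStream constructor and prints each key with its decoded message. the sample file is written to a temp directory just so the example runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.PropertyResourceBundle;

class StreamRB {

    // loads a resource bundle from any file on disk, no classpath lookup involved
    static PropertyResourceBundle load(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new PropertyResourceBundle(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // write a sample rb file containing the escaped form rbManager-style tools produce
        Path tmp = Files.createTempFile("test_th_TH", ".properties");
        Files.write(tmp, "Go=\\u0E44\\u0E1B\n".getBytes(StandardCharsets.US_ASCII));

        PropertyResourceBundle rb = load(tmp);
        Enumeration<String> keys = rb.getKeys();
        while (keys.hasMoreElements()) {
            String key = keys.nextElement();
            System.out.println(key + " = " + rb.getString(key));
        }
        Files.delete(tmp);
    }
}
```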

lest we forget: cfxNeo.dll

i've spent the better part of the last week upgrading one of my municipal sites from cf5 to mx. some of the core bits of our municipal information system app were written in cf3 (we're slowly doing a complete re-write to port the whole mess to mx, though i'm still amazed at how fast the cf3 core ran & how little needed changing as we slid in newer versions of cf). i18n was a dim glimmer in my eye when this was originally written; we had to fight thru very basic things like thai date formatting, thai collation (sql server 6.5 never had a clue about thai collation), etc. we overcame these using c++ cfx tags (written by a brilliant c++ programmer who worked with me at the time, chatchawan boonraksa). these cfx tags were trouble free thru several cf upgrade cycles until mx. with the move to mx, these tags started producing garbage instead of thai language dates, transliteration, etc. this was a real head scratcher (i even had the poor municipality staff dance an irish jig in hopes of some insight ;-) mainly because i had this idea, even though i went thru this all in agonizing detail during the redsky beta, that mx had included the multi-byte CFXNeo.dll file needed to make older c++ cfx tags work properly with unicode. bad assumption. mx (except for the japanese and korean versions) ships with a single-byte CFXNeo.dll that doesn't know unicode from the backside of an elephant. swapping in the multi-byte CFXNeo.dll returned everything to normal. you can see all the gory details in this technote. so if your older c++ cfx i18n tags start exhibiting strange behavior, don't panic, dance the cfxNeo.dll jig ;-) we're of course moving all this to proper java-based i18n CFCs, but this should hold the fort in the meantime.
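i don't know CFXNeo.dll's internals, so this isn't a model of what it does. it's a generic java illustration of the same family of symptom: multi-byte text read back as if it were a single-byte charset turns into garbage, one junk character per byte. thai characters take three bytes each in UTF-8, so two thai letters become six junk characters. the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Mojibake {

    // multi-byte UTF-8 bytes misread as a single-byte charset: one char per byte
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps all 256 byte values one-to-one, so the original bytes
    // can be recovered and decoded correctly as UTF-8
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String thai = "\u0E44\u0E1B"; // the thai word "go"
        String garbled = misdecode(thai);
        System.out.println(garbled.length() + " junk chars: " + garbled);
        System.out.println("repaired matches original: " + repair(garbled).equals(thai));
    }
}
```

the repair trick only works because nothing was lost along the way. a real single-byte layer that drops or substitutes bytes can't be undone after the fact, which is why swapping in the right dll beats patching output.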