March 29, 2006

heads up: timezone CFC updated

well, the icu4j versions were anyway. dan switzer seems to have turned up a problem with the icu4j version that i also encountered over the weekend. the icu4j version extended the core java version by simply substituting com.ibm.icu.util.TimeZone for the core java TimeZone class. if you didn't explicitly pass in a timezone (tz), you were supposed to get the server's tz. unfortunately the two versions differ in the way this default is looked up:
core java:
default="#tzObj.getDefault().ID#"
icu4j:
default="#variables.timeZone.getDefault().getDisplayName()#"


the tz that the core java default method was returning wasn't understood by icu4j, which didn't throw an error but silently returned the UTC tz instead. whoops.
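
a minimal sketch of the same kind of silent fallback, using core java's TimeZone rather than icu4j (so it only illustrates the idea, it is not the CFC's code). the class and method names are made up for the example. core java's getTimeZone() never throws for a string it doesn't recognize, it just hands back GMT, which is why comparing the returned ID with the requested one is a cheap sanity check:

```java
import java.util.TimeZone;

final class DefaultTimeZoneCheck {
    // look up a zone by string; core java returns GMT for anything it can't parse
    static TimeZone lookup(String id) {
        return TimeZone.getTimeZone(id);
    }

    // true when the zone that came back does not carry the ID that was asked for
    static boolean fellBack(String requestedId) {
        return !lookup(requestedId).getID().equals(requestedId);
    }

    public static void main(String[] args) {
        TimeZone server = TimeZone.getDefault();
        // the ID is meant for lookups, the display name is meant for humans
        System.out.println("ID: " + server.getID());
        System.out.println("display name: " + server.getDisplayName());
        System.out.println("ID falls back: " + fellBack(server.getID()));
        System.out.println("display name falls back: " + fellBack(server.getDisplayName()));
    }
}
```

the point is simply that an ID and a display name are not interchangeable, and that a lookup which "works" may still have quietly given you GMT.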

you can pick up the new version here.

March 27, 2006

stealth seems to be icu4j's middle name

once again, icu4j has quietly slipped out another stealthy upgrade to version 3.4.4. this update fixes "crashing bugs in the data". i'm not really sure how critical this update is but better safe than sorry.

March 26, 2006

Australian DST change: a day late and a dollar short?

while i should have been more than vaguely aware of this issue, it seems even Sun was lying down on the job a bit. Australia observes DST (Daylight Saving Time or Summer Time as they say down under) just like the US and other countries. DST in Australia normally ends March 26, 2:59AM (local time). however this year, to accommodate the Commonwealth Games, the DST end date was pushed back to April 2. most older JREs (like the version that coldfusion runs on, even the updated JRE that the flex/coldfusion connector "installs") still run off the older Olson data with Australian DST ending March 26. on March 25th i got an email from the Sun Developer Network pointing at this article about the issue including links to updated JREs. talk about cutting it close.

icu4j on the other hand, has had this and other updated timezone info for some time now.
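
a quick way to see what a given JRE believes, sketched with java.time (which needs a much newer JRE than the ones discussed here). "Australia/Sydney" is picked as a representative zone and the class and method names are invented for the example. a date between the usual end date and the postponed one tells you whether the tz data includes the change:

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class SydneyDstCheck {
    static final ZoneId SYDNEY = ZoneId.of("Australia/Sydney");

    // is DST in effect at local noon on the given date, according to this JRE's tz data?
    static boolean dstAtNoon(int year, int month, int day) {
        ZonedDateTime noon = LocalDate.of(year, month, day).atTime(12, 0).atZone(SYDNEY);
        return SYDNEY.getRules().isDaylightSavings(noon.toInstant());
    }

    public static void main(String[] args) {
        // 28 March 2006 sits between the normal end of DST and the postponed one
        System.out.println("DST on 2006-03-28: " + dstAtNoon(2006, 3, 28));
    }
}
```

a JRE with the updated data should still report DST for that date, one with the older data should not.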

March 11, 2006

javaRB/RBjava CFCs updated

i've added the new messageFormat method to the existing CFCs and re-arranged the versions a bit. you can download this tool here with a simple testbed here and a testbed for the new messageFormat method here.

there are now six versions of the resource bundle (rb) tool, the three major versions include:
  • coreJava: if you don't need other calendars, locales, etc. offered by IBM's ICU4J library. this version uses core Java's Locale and MessageFormat classes. It will operate on any coldfusion host that permits createObject().
  • icu4j: Requires the installation of IBM's ICU4J java library which can be obtained here. this version uses the library installed on cf's classpath. it makes use of ICU4J's ULocale, UResourceBundle, and MessageFormat classes. this allows for more locales than are supported by core Java as well as additional locale "keywords" such as calendar, currency and collation (for example, th_TH@calendar=buddhist).
  • remoteICU4J: also requires the installation of IBM's ICU4J java library. this version uses a slightly modified "remote" classpath technique for installations where you don't have access to the classpath. you will need to specify the full path to a copy of the icu4j.jar file.

In each of these versions you will find two CFCs:
  • javaRB which handles rb files that aren't on coldfusion's classpath (usually deployed on shared hosts)
  • rbJava which uses rb files that are on coldfusion's classpath, this is usually the more robust form of this tool
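
for the curious, here is roughly what "rb files that aren't on the classpath" amounts to on the java side. this is a sketch with invented names, not a description of how javaRB works internally: core java's PropertyResourceBundle will read a .properties bundle from any Reader, so a FileReader on an absolute path works just as well as the StringReader used here:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.TreeMap;

final class FileBundleSketch {
    // read every key/message pair of a .properties bundle from an arbitrary Reader
    static Map<String, String> load(Reader source) {
        try {
            ResourceBundle rb = new PropertyResourceBundle(source);
            Map<String, String> messages = new TreeMap<>();
            for (String key : rb.keySet()) {
                messages.put(key, rb.getString(key));
            }
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // stand-in for the contents of a file such as testJavaRB.properties
        Reader source = new StringReader("greeting=hello\nfarewell=goodbye\n");
        System.out.println(load(source));
    }
}
```

what you give up by going this route is the automatic locale fallback chain that ResourceBundle.getBundle() provides for bundles on the classpath.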

You will also find:
  • javaRB.cfm a simple testbed for the javaRB CFC
  • rbJava.cfm a simple testbed for the rbJava CFC
  • messageFormat.cfm a simple testbed demonstrating the messageFormat method
  • testJavaRB.properties base rb file
  • testJavaRB_en_US.properties en_US locale rb file
  • testJavaRB_th_TH.properties th_TH locale rb file

public methods in the CFCs:
  • getResourceBundle returns a structure containing all key/messages value pairs in a given resource bundle file. required argument is rbFile containing absolute path to resource bundle file. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • getRBKeys returns an array holding all keys in given resource bundle. required argument is rbFile containing absolute path to resource bundle file. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • getRBString returns string containing the text for a given key in a given resource bundle. required arguments are rbFile containing absolute path to resource bundle file and rbKey a string holding the required key. optional argument is rbLocale to indicate which locale's resource bundle to use, defaults to en_US (american english)
  • formatRBString returns string w/dynamic values substituted. performs messageFormat like operation on compound rb string: "You owe me {1}. Please pay by {2} or I will be forced to shoot you with {3} bullets." this function will replace the place holders {1}, etc. with values from the passed in array (or a single value, if that's all there are). required arguments are rbString, the string containing the placeholders, and substituteValues, either an array or a single value containing the values to be substituted. note that the values are substituted sequentially, all {1} placeholders will be substituted using the first element in substituteValues, {2} with the second, etc. DEPRECATED. only retained for backwards compatibility. please use messageFormat method instead
  • messageFormat returns string w/dynamic values substituted. performs MessageFormat operation on compound rb string. required arguments: pattern string to use as pattern for formatting, args array of "objects" to use as substitution values. optional argument is locale, java style locale ID, "th_TH", default is "en_US". for details about format options please see http://java.sun.com/j2se/1.4.2/docs/api/java/text/MessageFormat.html
  • verifyPattern verifies MessageFormat pattern. required argument is pattern a string holding the MessageFormat pattern to test. returns a boolean indicating if the pattern is ok or not
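
a check like verifyPattern can be sketched in a few lines of plain java. this is an assumption about the approach, not the CFC's actual source, and it uses core java's java.text.MessageFormat rather than icu4j's: the constructor throws IllegalArgumentException for a pattern it can't parse, so trying to build one is the test:

```java
import java.text.MessageFormat;

final class PatternCheck {
    // true if the pattern parses, false if MessageFormat rejects it
    static boolean verifyPattern(String pattern) {
        try {
            new MessageFormat(pattern);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(verifyPattern("At {0,time,short} on {0,date,short}"));
        System.out.println(verifyPattern("At {0 on {1}"));
    }
}
```

note that this only says the pattern is well formed, not that the arguments you later pass will match the format types it asks for.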

In addition, the remoteICU4J CFCs also have another public method:
  • getAvailableLocales returns an array of available locales. note that this method is only supplied as a convenience


PS: i've finally added a license.

March 07, 2006

"remote" classpath revisited

i seem to have gotten myself into the habit of calling spike's cool "Loading java class files from a relative path" technique the "remote classpath" technique--i guess i can blame christian cantrell for that. in any case, this technique works very well in most cases where you don't have access to a server's classpath (most shared hosts for example). where it tends not to work is, from my experience, with java classes that don't have "blind" constructors, ie where no arguments are required to initialize that class. classes like icu4j calendars, formatters, etc. usually work just fine but classes like icu4j's ULocale or MessageFormat don't as these require something to be passed to their constructors. for these classes (which are darned important to me) something like this fails:

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale").newInstance();   
</cfscript>
<cfdump var="#uLocale#">


while i've managed to workaround this issue (ULocales are everywhere in icu4j, most classes that deal with locales have a getAvailableULocales() method) it's always kind of nagged at me. after a bit of poking and prodding i started looking into ways to get at the actual constructors for a given class:

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale"); // don't init
c=uLocale.getConstructors();
for (j=1; j LTE arrayLen(c); j=j+1) {
   params=c[j].getParameterTypes();
   for (i=1; i LTE arrayLen(params); i=i+1) {
      writeoutput("ULocale[#j#]: #i# #params[i].getName()#<br>");
   }
   writeoutput("<br>");
}   
</cfscript>


which in this case returned 3 constructors (just like the API says but not in the javadocs order):

ULocale[1]: 1 java.lang.String
ULocale[1]: 2 java.lang.String
ULocale[1]: 3 java.lang.String

ULocale[2]: 1 java.lang.String

ULocale[3]: 1 java.lang.String
ULocale[3]: 2 java.lang.String

which i can easily match to the one i want (ULocale("th_TH")):

<cfscript>
// remote init
jarFile=jarLocation & "icu4j.jar";
URLObject = createObject('java','java.net.URL');
URLObject.init("file:" & jarFile);
URLArray = createObject("java","java.lang.reflect.Array").
newInstance(URLObject.getClass(),1);
arrayClass = createObject("java","java.lang.reflect.Array");
arrayClass.set(URLArray,0,URLObject);
loader = createObject("java","java.net.URLClassLoader");
loader.init(URLArray);
uLocale=loader.loadClass("com.ibm.icu.util.ULocale");   
c=uLocale.getConstructors();
// the newInstance method wants an array
obj=listToArray("th_TH");
// we want the 2nd constructor
thaiLocale=c[2].newInstance(obj.toArray());
</cfscript>

<cfdump var="#thaiLocale#">


which indeed returns an object of com.ibm.icu.util.ULocale.

since in most cases, i only use one way to init a given class, this technique will work OK for us. my only question is will the order of constructors remain the same? can i always count on the 2nd constructor to be ULocale("th_TH")? or should i build metadata functionality to probe the constructors to see which one matches?
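
on the ordering question: as far as i know the java reflection docs make no promise about the order in which constructors come back, so picking "the 2nd one" is fragile. a sketch of the alternative, in plain java with invented names: ask for the constructor by its parameter types instead. java.util.Locale stands in for ULocale here because it has the same three String-based constructors and needs no extra jar:

```java
import java.lang.reflect.Constructor;
import java.util.Locale;

final class ConstructorBySignature {
    // load className through the given loader and build it via its public (String) constructor
    static Object newWithString(ClassLoader loader, String className, String arg) {
        try {
            Class<?> type = loader.loadClass(className);
            // selected by signature, so the order of getConstructors() no longer matters
            Constructor<?> ctor = type.getConstructor(String.class);
            return ctor.newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable (String) constructor on " + className, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Locale thai = (Locale) newWithString(loader, "java.util.Locale", "th");
        System.out.println(thai.getLanguage());
    }
}
```

the same idea from cfscript would mean comparing getParameterTypes() for each constructor against the signature you want, which is the "metadata functionality" route.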

ps: i did indeed learn my lesson, notice how i passed the coldfusion array using toArray() ;-)

March 06, 2006

MessageFormat or how not to read error messages

in an earlier post i was babbling on about how neat the com.ibm.icu.text.MessageFormat class was. i was also on about how you'd need a java wrapper class to really make use of it. i thought that because whenever i tried something like:

<cfscript>
ozLocale="en_AU@calendar=gregorian";
thisPattern="On {0,date,short} at {0,time,short}, I left {1} for the {2}. I took {3,number,currency}";
thisLocale=createObject("java","com.ibm.icu.util.ULocale").init(ozLocale);
args=arrayNew(1);
args[1]=now();
args[2]="the office";
args[3]="microbrewery";
args[4]=javacast("int",100);
mf=createObject("java","com.ibm.icu.text.MessageFormat").
init(thisPattern,thisLocale);
thisMsg=mf.format(args);
</cfscript>

<cfdump var="#thisMSG#">


coldfusion would always throw an error at the thisMsg=mf.format(args) bit along the lines of: Error casting an object of type to an incompatible type. This usually indicates a programming error in Java, although it could also mean you have tried to use a foreign object in a different way than it was designed. which for some reason made me think it was because the format() method is overloaded and i couldn't figure out the right combination of argument classes to get it to work. my knee jerk reaction to this is to build a wrapper class and move on, which i promptly did.

i was puttering around with something this weekend (a method to count business days using icu4j's Holiday class) when i actually got the overloaded method error (while trying to add my birthday as a national holiday in the US virgin islands, en_VI). re-visiting the format() method errors it finally dawned on me that the error message was perfectly accurate and the real issue (besides me being a knee jerk reactionist and thick as a brick) was with the args array. coldfusion arrays aren't exactly java Arrays (if i recall correctly they're java.util.Vectors). back in the Triassic era, christian cantrell's blog had an entry concerning this problem where he pointed out a simple solution using the inherited toArray() method. so changing thisMsg=mf.format(args) to thisMsg=mf.format(args.toArray()) made that method work plenty fine. initial benchmarks show this java-based method to be considerably faster than our in-house one, not to mention saving all the locale formatting code we had to use prior to substituting the actual data. we'll be releasing updates to our resource bundle CFCs incorporating this new method sometime this week.
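
the same trap can be reproduced in plain java. this sketch uses core java's java.text.MessageFormat and a java.util.Vector as a stand-in for a coldfusion array (both are assumptions for illustration, icu4j's class may fail in a slightly different way): format() wants an Object[], a list is not one, and toArray() bridges the gap:

```java
import java.text.MessageFormat;
import java.util.List;
import java.util.Locale;
import java.util.Vector;

final class VectorArgs {
    static final String PATTERN = "I left {0} for the {1}.";

    // works: the list is converted to a real Object[] first
    static String formatList(List<?> args) {
        MessageFormat mf = new MessageFormat(PATTERN, Locale.US);
        return mf.format(args.toArray());
    }

    // reports whether handing the list itself to format() blows up
    static boolean rawListFails(List<?> args) {
        MessageFormat mf = new MessageFormat(PATTERN, Locale.US);
        try {
            mf.format(args);
            return false;
        } catch (ClassCastException | IllegalArgumentException e) {
            // the list cannot be treated as an Object[] of arguments
            return true;
        }
    }

    public static void main(String[] args) {
        Vector<Object> cfArray = new Vector<>();
        cfArray.add("the office");
        cfArray.add("microbrewery");
        System.out.println(rawListFails(cfArray));
        System.out.println(formatList(cfArray));
    }
}
```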

the sharp-eyed among you probably noticed the peculiar way i defined the locale en_AU@calendar=gregorian. icu4j locales (ULocales to be precise) have, besides the usual language, country, variant identifiers, keywords. keywords allow you to create a locale using a specific calendar, collation or currency (see the ICU user guide for details). in practice that means you can control the way MessageFormat formats your dates and currencies without having to mess around with them prior to submitting the data to the format() method. you can use any of the seven odd calendars that icu4j knows about, for instance en_AU@calendar=buddhist would produce dates formatted using the Buddhist calendar (BE), en_AU@calendar=islamic-civil would format dates using the civil version of the Islamic calendar, etc. very cool if you ask me. this is another area where icu4j kind of glances in the rear-view mirror as it blows by core java's i18n bits ;-)

March 02, 2006

an unstealthy icu4j upgrade

IBM has announced a maintenance release for icu4j, version 3.4.3. among the goodies for this version are:
  • Olson 2006a time zone data (just in time to get ready for the new DST in the US)
  • corrects mistakes in the CLDR data found in icu4j 3.4.2
  • MessageFormat (like core java's but it can use icu4j's super cool ULocale class) upgraded to "@stable"
  • fixed bugs in DateFormat, SimpleDateFormat, etc.
  • and a bit more trivial (to me) but should make some folks happy this release no longer tags "@draft" APIs with "@deprecated" by default--though why they ever did that in the first place is a bit of a mystery to me


the MessageFormat class is kind of cool in that it handles compound rb strings (which i'd rather have never learned about) such as: "At {1} on {2}, there was {3} on planet {4}". in the past, we normally handled this with in-house methods which are somewhat cumbersome in that we needed to do any date/numeric/currency formatting on the substituted values for the message's placeholders (the bits in between the {}) prior to formatting the message. now using the com.ibm.icu.text.MessageFormat you could do something like:

<cfscript>
   mfObject=createobject("java","com.ibm.icu.text.MessageFormat");
   args=arrayNew(1);
   args[1]=now();
   args[2]="the office";
   args[3]="microbrewery";
   // pass in the message string and substitution arguments
   thisMsg=mfObject.format("On {0,date,full} at {0,time,full}, I left {1} for the {2}.",args);
   writeoutput(thisMsg);
</cfscript>


which would produce something like (in the en_US locale) "On Wednesday, March 1, 2006 at 8:44:22 PM GMT+07:00, I left the office for the microbrewery.".

to explain a bit more : {0,date,full} is a placeholder that takes the first element in the args array (java arrays start at 0) and applies localized date formatting with the "full" style. {0,time,full} ditto but uses time formatting and {1} and {2} are placeholders for simple strings.

however in order to make this more flexible (ie. use locales other than the server's default), you'll have to use a simple java wrapper class--the MessageFormat format method is overloaded and coldfusion can't easily use its other "flavors" which require StringBuffer and FieldPosition classes.
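
for reference, this is what the placeholder indexing and the locale switch look like on the java side. it is a small sketch using core java's java.text.MessageFormat (icu4j's version takes a ULocale in the same position) and the class and method names are made up. number formatting is used instead of dates so the output doesn't depend on when you run it:

```java
import java.text.MessageFormat;
import java.util.Locale;

final class LocalizedMessage {
    // build a formatter for an explicit locale instead of the server default
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // {0} is the first argument, {1} the second, whatever order they appear in
        String pattern = "{1} owes {0,number,integer}";
        System.out.println(format(pattern, Locale.US, 1234, "Dan"));      // Dan owes 1,234
        System.out.println(format(pattern, Locale.GERMANY, 1234, "Dan")); // Dan owes 1.234
    }
}
```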