October 26, 2004

persian calendar

persian calendars in cf seem to have come up a bit lately, and since monday was a holiday here in the big mango i had a few hours to put into slapping something together. you can see the first cut at a persian calendar CFC here. it doesn't do much except format/convert gregorian dates to the persian calendar and back again (right now it can only parse medium/short persian date formats). it still lacks calendar math, real persian date string parsing, arabic-indic digit date formats, etc.

so what's a persian (or iranian) calendar? why, it's the formal calendar in general use in iran, also known as the solar hijri calendar and sometimes as the jalali calendar. i've also seen it described as the shamsi calendar. frankly i have no idea which is correct so i'll stick with "persian". since it's one of the few calendars designed in the era of accurate positional astronomy, it's probably the most accurate solar calendar around. you can read more here or here.
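to give a feel for what a gregorian-to-persian conversion involves, here's a minimal java sketch. it is not the code in the CFC: it assumes the widely used arithmetic approximation based on a 33-year leap cycle rather than the astronomical rule, so it should only be trusted for dates reasonably close to the present. the class and method names (PersianDateSketch, toPersian) are made up for illustration.

```java
// sketch only: arithmetic (33-year cycle) approximation of the persian calendar,
// not the astronomical rule and not the CFC's implementation
class PersianDateSketch {
    // days elapsed before each gregorian month in a non-leap year
    private static final int[] DAYS_BEFORE_MONTH =
        {0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334};

    // returns {year, month, day} in the persian calendar
    static int[] toPersian(int gy, int gm, int gd) {
        // the current year's leap day only counts once february is over
        int gy2 = (gm > 2) ? gy + 1 : gy;
        // day count relative to the start of a 33-year persian cycle
        int days = 355666 + 365 * gy + (gy2 + 3) / 4 - (gy2 + 99) / 100
                 + (gy2 + 399) / 400 + gd + DAYS_BEFORE_MONTH[gm - 1];
        int jy = -1595 + 33 * (days / 12053); // 12053 days = 33 years with 8 leap days
        days %= 12053;
        jy += 4 * (days / 1461);              // 1461 days = one 4-year block
        days %= 1461;
        if (days > 365) {                     // first year of a block is the leap year
            jy += (days - 1) / 365;
            days = (days - 1) % 365;
        }
        int jm, jd;
        if (days < 186) {                     // first six months have 31 days each
            jm = 1 + days / 31;
            jd = 1 + days % 31;
        } else {                              // the remaining months are counted in 30-day steps
            jm = 7 + (days - 186) / 30;
            jd = 1 + (days - 186) % 30;
        }
        return new int[] {jy, jm, jd};
    }

    public static void main(String[] args) {
        int[] p = toPersian(2004, 10, 26);
        System.out.println(p[0] + "/" + p[1] + "/" + p[2]); // prints 1383/8/5
    }
}
```

so october 26, 2004 comes out as day 5 of month 8 (aban) of 1383. going the other way, plus formatting and parsing, is where the real work is.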

i've also been looking at this java calendar class. it has a boatload of calendars (besides persian it has mayan, nepali, hindu, coptic and believe it or not a french revolutionary calendar).

October 24, 2004

geoLocator updated

after months of being besieged by angry, pitchfork- and torch-armed mobs, nigel's finally updated the InetAddressLocator.jar ;-) besides the updated IP database, we've included a remote classpath version of the CFC that uses spike's method in cases where you can't put the JAR in a classpath.

you can see the CFC in action (and pick up a copy) here or grab it from sourceforge.

October 22, 2004

new i18n w3c faq

if you want to know how the W3C defines g11n, i18n, and l10n, have a look at this. it was prepared by susan k. miller over at Boeing.

but you already know all that....

October 21, 2004

cldr 1.2 in beta

the latest version of the cldr (1.2) has entered beta. of particular interest are the 'interim vetting charts', which give you a sneak preview of what's been changed & what's coming for the release version. many of these are "common" changes such as localized territory names, etc., but there's also some local stuff that's been "fixed".

in case you're interested, there's also a cldr wiki.

October 18, 2004

when a locale isn't a Locale

there was a recent discussion concerning using farsi (persian) language with cf. my first reaction was to point out that farsi locales (fa_IR iran and fa_AF afghanistan) weren't supported java locales, so that was that.

at about the same time there was an announcement on the icu4j mailing list about the next version being built on CLDR data. so i asked if that meant that we'd be able to make use of all the "new" locales in CLDR like farsi, etc. one of the icu4j guys (steven loomis) replied "yes" and further pointed out that icu4j 2.8 was already making use of icu4c's locale data. further discussion with steven helped debunk one of my long-held misconceptions, that a java "locale" was a real world "Locale" (i.e. the locale bundled up with all its attendant resource data such as day/month names, etc.). "Locales are just identifiers" says steven, "duh!" says i. while it's convenient to think locales == Locales, formally in java "locale" refers to the identifier and not the data.

so what? what that means, if you're using icu4j for your i18n work (and you should), is that you have access to all the nifty locales that icu4j has no matter what core java supports (or doesn't support in this case). so something like this becomes possible (and easy):

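<cfscript>
// DateFormat.FULL is 0; javacast makes sure java receives an int
fullFormat=javacast("int",0);
// core java Locale, used purely as an identifier for icu4j's locale data
farsiLocale=createObject("java","java.util.Locale").init("fa","IR");
// icu4j's public TimeZone class rather than the internal com.ibm.icu.impl one
utcTZ=createObject("java","com.ibm.icu.util.TimeZone").getTimeZone("UTC");
aDateFormat=createObject("java","com.ibm.icu.text.DateFormat");
aCalendar=createObject("java","com.ibm.icu.util.GregorianCalendar").init(utcTZ,farsiLocale);
// full-style date formatter for the farsi locale
dF=aDateFormat.getDateInstance(aCalendar,fullFormat,farsiLocale);
writeoutput("#farsiLocale.getDisplayName(farsiLocale)# #dF.format(now())#<br>");
</cfscript>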

which produces: Persian (Iran) دوشنبه، ۱۸ اکتبر ۲۰۰۴

note that the core java getDisplayName method falls back on "Persian (Iran)", which while not perfect is better than nothing. icu4j 3.0's ULocale class would actually produce the correctly localized name.
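the "locales are just identifiers" point can be seen with core java alone. the sketch below is only an illustration: the class and helper names (LocaleIdentifierSketch, identifier, hasJdkData) are made up, and "xx"/"YY" is a deliberately bogus language/country pair. whether fa_IR shows up as having data depends entirely on the JVM you run it on, so no output is claimed for it.

```java
import java.util.Arrays;
import java.util.Locale;

// sketch only: a java Locale is an identifier, whether or not any data backs it
class LocaleIdentifierSketch {
    // builds the identifier; nothing here checks that locale data exists
    static Locale identifier(String language, String country) {
        return new Locale(language, country);
    }

    // true if this particular JVM reports locale data for the identifier
    static boolean hasJdkData(Locale locale) {
        return Arrays.asList(Locale.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale farsi = identifier("fa", "IR");
        Locale bogus = identifier("xx", "YY"); // hypothetical pair, not a real locale
        // both objects construct without complaint; only the data lookup differs
        System.out.println(farsi + " has data in this JVM: " + hasJdkData(farsi));
        System.out.println(bogus + " has data in this JVM: " + hasJdkData(bogus));
    }
}
```

the identifier is happily created either way; it's up to whichever library you hand it to (core java, icu4j) to decide whether it has data to go with it.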

the more i work with icu4j, the more impressed i am with how well thought out it is. it really is the bee's knees for i18n work.

thanks to steven for enlightening me.

October 01, 2004

what you don't know about latin-1 might hurt you

french cf users might want to pay attention to this...

there is an ongoing discussion on the unicode list about the "internationalization assumption", which simplistically goes something along the lines of: if latin-1 tests ok, can we assume all latin-1 languages are "a-ok"? as it turns out, "no". some of the folks participating in this discussion have pointed out that, for example, not all french chars are found in latin-1. my first thought on reading that was, "oh yeah, the euro", but as it turns out there are a couple of french chars (no idea of their frequency of use but they are used in the french words for eye, egg, beef and heart) that are not in latin-1 but are in latin-9. for example see jukka korpela's excellent latin-1/latin-9 comparison page. these chars are also found in the windows 1252 code page (which i guess helps support the idea that it's actually a superset of latin-1).
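if you'd rather check this than take the list's word for it, java's charset encoders can tell you whether a given char fits in a given encoding. a minimal sketch follows, assuming the chars in question are the oe ligature in both cases (U+0153, U+0152) plus the euro sign (U+20AC), and that your JVM ships the ISO-8859-15 and windows-1252 charsets (only ISO-8859-1 is guaranteed). the class name and helper are made up for illustration.

```java
import java.nio.charset.Charset;

// sketch only: asks each charset's encoder whether it can represent a char
class Latin1CoverageSketch {
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // small oe ligature, capital OE ligature, euro sign
        char[] chars = {'\u0153', '\u0152', '\u20AC'};
        String[] charsets = {"ISO-8859-1", "ISO-8859-15", "windows-1252"};
        for (String charsetName : charsets) {
            for (char c : chars) {
                System.out.println(charsetName + " "
                    + String.format("U+%04X", (int) c) + " "
                    + canEncode(charsetName, c));
            }
        }
    }
}
```

all three chars report false for ISO-8859-1 (latin-1) and true for both ISO-8859-15 (latin-9) and windows-1252.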

the moral of the story? just use unicode.

cldr 1.2

unicode has just announced the public release of the alpha version of the cldr (Common Locale Data Repository). some of the highlights include:
  • better documentation for date/number format patterns (one of my favorites)
  • added stuff about references/validity/etc.
  • new timezone localization model
  • weekend data
  • added Oriya, Malayalam, Assamese, Welsh, Dzongkha, Bhutan, Khmer and Lao (woohoo se asian) locales
  • added more country, language, currency, and type display name data for ar, bg, cs, el, he, hr, hu, is, mk, pl, ro, ru, sk, sl, sr, tr, uk (the arabic stuff is way cool)
read more on the cldr website. you can compare the cldr versus platform data here. and you can report bugs here. via the unicode mailing list.