if you do i18n work you probably already know about IBM's cool resource bundle manager. while plumbing the depths of the java i18n forums, i stumbled onto another one, attesoro, that looks equally functional. like IBM's tool, it's a pure java solution and produces proper java ResourceBundles (i.e. unicode chars are encoded as escaped ascii, \u0000 style). these are a little difficult to deal with in cf, as you have to spend some resources parsing the data. i normally save resource bundles as utf-8 to get around this; it also helps with managing translations, since humans can see human-readable text. in any case this looks like another decent weapon for your i18n arsenal.
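to make the escaped-ascii vs utf-8 difference concrete, here's a minimal java sketch. the class name, the helper names and the "greeting" key are all made up for illustration, and it uses plain java.util.Properties (the format a properties-based ResourceBundle is built on) rather than anything from attesoro or IBM's tool. it loads the same thai greeting once from an escaped bundle and once from utf-8 text and checks that both come out as the same string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// hypothetical demo class, not part of any tool mentioned in the post
final class BundleEncodingDemo {

    // classic bundle format: latin-1 bytes, everything else as escaped ascii.
    // Properties.load(InputStream) decodes the escapes for us.
    static Properties loadEscaped(String escapedText) {
        try {
            Properties p = new Properties();
            p.load(new ByteArrayInputStream(escapedText.getBytes(StandardCharsets.ISO_8859_1)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // same content saved as human-readable utf-8: decode the bytes ourselves
    // and hand Properties a Reader so no escaping is needed.
    static Properties loadUtf8(byte[] utf8Bytes) {
        try {
            Properties p = new Properties();
            p.load(new StringReader(new String(utf8Bytes, StandardCharsets.UTF_8)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // what the escaped file looks like on disk (double backslash = literal backslash)
        String escaped = "greeting=\\u0e2a\\u0e27\\u0e31\\u0e2a\\u0e14\\u0e35\n";
        // what the utf-8 file contains: the actual thai characters
        byte[] utf8 = "greeting=\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\n".getBytes(StandardCharsets.UTF_8);

        String a = loadEscaped(escaped).getProperty("greeting");
        String b = loadUtf8(utf8).getProperty("greeting");
        System.out.println("same text either way: " + a.equals(b));
    }
}
```

the only real difference is who does the decoding: the escaped file is pure ascii and safe anywhere but unreadable to translators, while the utf-8 file is readable but you have to tell the loader what encoding it's in.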
those ms bullies are up to their old tricks again, darn their eyes, they're diabolically giving away MSDE without any strings... the swine ;-)
if you don't know MSDE, it's sort of like sql server's idiot brother-in-law. it has some limitations:
- 2gb db size limit
- 5 concurrent users (batch workloads)
- no GUI for management (which makes for a good way to learn tSQL)
- no DTS designer (though DTS itself works in MSDE)
- by default it installs with trusted (windows-based) authentication only (which causes folks using MX's JDBC sql server drivers plenty of mystery headaches)
but for development/learning and lower volume websites (25 concurrent users) it's a sweet deal. in the past there was always something you had to buy from ms to be "legal" with msde (but everybody and their brother had ways around that). i guess they got tired of swimming against the tide and made things easier.
so now you know.
tex texin's got a very nifty explanation of the hebrew numbering system (still in use for calendars and religious texts). quoting from his article, "each letter in the hebrew alphabet (or aleph-bet) has a numerical value". there's no zero (the way hebrew numbers are formed, it doesn't matter; western numbers, being position based, would be a mess without a zero value). the first 10 letters of the hebrew alphabet are also the numbers 1-10, with the next 9 letters representing the values 20, 30, 40, 50, 60, 70, 80, 90 and 100. the remaining letters represent 200, 300 and 400. i find the way numbers are formed quite interesting--but i leave you to read that in tex's article.
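for the code-minded, the letter values above can be sketched in a few lines of java. this is only a rough illustration under some assumptions, not something taken from tex's article: the class and method names are made up, it only handles 1-999, it ignores final letter forms and the geresh/gershayim punctuation marks, and it follows the common convention, as i understand it, of writing 15 and 16 as 9+6 and 9+7 rather than 10+5 and 10+6.

```java
// hypothetical helper, a sketch of additive hebrew numerals for 1..999
final class HebrewNumerals {

    // the 22 letters in value order, alef through tav (non-final forms only)
    private static final String LETTERS =
        "\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7\u05D8\u05D9"
      + "\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"
      + "\u05E7\u05E8\u05E9\u05EA";

    private static final int[] VALUES = {
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
        20, 30, 40, 50, 60, 70, 80, 90,
        100, 200, 300, 400
    };

    // no zero and no positions: a number is just the sum of its letters
    static int valueOf(String letters) {
        int total = 0;
        for (char c : letters.toCharArray()) {
            int i = LETTERS.indexOf(c);
            if (i < 0) {
                throw new IllegalArgumentException("not a (non-final) hebrew letter: " + c);
            }
            total += VALUES[i];
        }
        return total;
    }

    // greedy: keep taking the biggest letter that still fits
    static String toLetters(int n) {
        if (n < 1 || n > 999) {
            throw new IllegalArgumentException("sketch only handles 1..999: " + n);
        }
        StringBuilder out = new StringBuilder();
        int rest = n;
        for (int i = VALUES.length - 1; i >= 0; i--) {
            // assumed convention: 15 and 16 start with tet (9), not yod (10)
            if (rest == 15 || rest == 16) {
                out.append('\u05D8');
                rest -= 9;
            }
            while (rest >= VALUES[i]) {
                out.append(LETTERS.charAt(i));
                rest -= VALUES[i];
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 15, 16, 248, 613}) {
            String letters = toLetters(n);
            System.out.println(n + " -> " + letters + " -> " + valueOf(letters));
        }
    }
}
```

since the values simply add up, anything over 400 just repeats or stacks the big letters (500 comes out as 400+100), which is exactly why nobody needs a zero.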
well not really, but from an advertising point of view maybe they won't much longer. i often argue that "backyard globalization" is an important point to consider when developing cf applications. if you're not looking to develop fully global apps, at least consider non-english speakers in your "backyard" (let's say it's in the US). well, according to this news article (yes, it's also a snazzy cf-powered site), hispanics in the US now outnumber canadians in canada. you're looking at a growing marketplace of 38.8 million people with an estimated $675 billion in annual purchasing power. something to chew over next time you're designing an application.
you can find some more interesting reading on g11n business aspects here. the article on chinese whispers is particularly cool.
ps: yes i know canada is bilingual and a very compelling case for "backyard globalization" too but i just couldn't resist ;-)
IBM has done a maintenance release of ICU4J, version 2.6.1. you can pick it up here.
quoting the ICU4J site:
list of significant changes for the 2.6.1 release:
- UCA 4.0: ICU has been updated to use the latest version of UCA - 4.0.
- Thai Royal Dictionary Collation: Thai collation tailoring has been updated to reflect the Thai Royal Dictionary ordering. Changes have been made to collation code in order to properly support invalid Thai sequences (chaiyo!).
- Collation: parser/builder bug fixes: Several bugs in collation rule parser and builder have been fixed.
- Unicode character properties data has been synched with ICU4C
- Other bug fixes: Bugs have been fixed in layout engine (jitterbug number 3041), BiDi (3174), string functions (3243) and platform support (3097).
stars and garters, the w3c has added an RSS feed for their FAQs, you can find it here. the latest FAQ deals with setting encoding in web authoring applications, including many macromedia products (the cf stuff is kind of soggy and hard to light but at least it's mentioned, i'll see if it can't be stiffened up a bit).
the common XML locale data repository (CLDR) has gone to beta. the purpose of this project, in case you can't recall, is two-fold (quoting the CLDR site):
1) devise a general XML format for the exchange of culturally sensitive (locale) information for use in application and system development
2) gather, store, and make available data generated in that format
this "kitchen sink" approach goes way beyond the simple HTML concept of locale (which is basically language as used in a location) and includes such groovy stuff as collation, calendars, timezones, measurements, delimiters, etc.
similarly, those cool ICU4J folks have just proposed a LocaleMisc class to be added to their nifty java library that would expose locale info such as exemplar characters, measurements, and paper size (never would have thought of that one).
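to give a feel for what "more than just language" means in practice, here's a small java sketch using only the standard JDK classes (not the proposed LocaleMisc class, and not the CLDR XML itself). the class and method names are made up for illustration. it pulls a couple of the "delimiter" style facts, the decimal and grouping separators plus the default currency, for a few locales.

```java
import java.text.DecimalFormatSymbols;
import java.util.Currency;
import java.util.Locale;

// hypothetical demo class: a peek at non-language locale data the JDK already exposes
final class LocaleDataDemo {

    // one-line summary of a locale's number delimiters and currency.
    // assumes the locale carries a country (Currency.getInstance needs one).
    static String describe(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        Currency currency = Currency.getInstance(locale);
        return locale.toLanguageTag()
            + ": decimal [" + symbols.getDecimalSeparator() + "]"
            + " grouping [" + symbols.getGroupingSeparator() + "]"
            + " currency " + currency.getCurrencyCode();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.US, Locale.GERMANY, Locale.JAPAN}) {
            System.out.println(describe(locale));
        }
    }
}
```

the exact values come from whatever locale data your JDK ships with, so treat the output as a reflection of that data rather than gospel.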
onward and upward.