January 01, 2004

unicode compression

unicode is good. unicode is great. it's by far the best choice for g11n work (i "bite my thumb" at codepage encodings). but like everything else in the real world, it has its seamier side. to encode all the world's scripts, unicode often needs more than one byte per character, and sometimes more than two: in UTF-8, for example, most CJK characters take three bytes. ASCII-only folks with an occasional need for a non-ASCII character won't give this a second thought; the rest of the world (especially CJK folks) isn't so fortunate. "unicode bloat" is a sad but true fact of g11n life, which makes unicode compression an interesting topic for many unicode users and developers. doug ewell has posted a very readable article on unicode compression that also gives a nice background on unicode and on compression in general. it's a good read and well worth the time.

and by now i hope everyone's had a safe and happy new year's eve celebration. for those of us already past that and wondering "what the heck went on", i refer you to the jean-luc ponty tune (composed and arranged by frank zappa) and ask the question "how would you like to have a head like that".
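to put some rough numbers on the "unicode bloat" mentioned above, here's a minimal sketch that counts how many bytes the same text takes in the common unicode encoding forms. the function name `encoded_sizes` and the sample strings are just illustrations i picked, not anything from doug's article. the `-le` codecs are used so python doesn't prepend a byte order mark and skew the counts.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """return the code point count and the byte length of text in UTF-8/16/32.

    hypothetical helper for illustration only. the little-endian codecs are
    used because python's plain "utf-16"/"utf-32" codecs add a BOM.
    """
    return {
        "code points": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # ASCII, CJK, and one supplementary-plane character (U+1D11E, musical G clef)
    for sample in ["hello", "日本語", "\U0001D11E"]:
        print(repr(sample), encoded_sizes(sample))
```

for plain ASCII, UTF-8 costs one byte per character, while each of the three CJK characters costs three bytes in UTF-8 (versus two in UTF-16). the G clef, being outside the BMP, needs four bytes in UTF-8 and a four-byte surrogate pair in UTF-16, so it's one of the "more than 2 bytes" cases. that gap is exactly what unicode-specific compression schemes try to close.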


