October 01, 2004

what you don't know about latin-1 might hurt you

french cf users might want to pay attention to this...

there is an ongoing discussion on the unicode list about the "internationalization assumption". put simply, it asks: if something tests ok with latin-1, can we assume all latin-1 languages are "a-ok"? as it turns out, "no". some of the folks in this discussion have pointed out that, for example, not all french chars are found in latin-1. my first thought on reading that was, "oh yeah, the euro", but it turns out there are a couple of french chars that are not in latin-1 but are in latin-9 (iso 8859-15): the oe ligature œ/Œ, plus the capital Ÿ. i have no idea how often they're used, but œ shows up in the french words for eye, egg, beef and heart (œil, œuf, bœuf, cœur). for details see jukka korpela's excellent latin-1/latin-9 comparison page. these chars are also found in the windows-1252 code page, which i guess helps support the idea that it's effectively a superset of latin-1 (it puts printable chars in the 0x80-0x9F range that latin-1 leaves to control codes).
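to see this for yourself, here's a minimal python sketch (not cf, just a quick way to test) that tries to encode the french words above in each charset. the codec names are python's built-in aliases; the word list and the helper name `encodable` are made up for illustration.

```python
# Try encoding some French words in Latin-1, Latin-9 (ISO 8859-15),
# Windows-1252 and UTF-8 using Python's built-in codecs.
WORDS = ["œil", "œuf", "bœuf", "cœur", "euro: €"]  # illustrative sample
CODECS = ["latin-1", "iso8859-15", "cp1252", "utf-8"]


def encodable(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for word in WORDS:
        results = ", ".join(
            f"{c}={'ok' if encodable(word, c) else 'FAIL'}" for c in CODECS
        )
        print(f"{word}: {results}")
```

every word fails under latin-1 and passes under the other three. the byte values differ, though: œ is 0xBD in latin-9 but 0x9C in windows-1252, so latin-9 and windows-1252 are not interchangeable either. utf-8 handles all of them, which is the whole point.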

the moral of the story? just use unicode.

2 Comments:

At 10/01/2004 10:16 PM, Anonymous said...

Unicode is great for encoding, but if you are embedding fonts in flash, you need to know the smallest set of characters for a given language. (Embedding all unicode characters would probably take a couple of GB?)

It would be nice if there was a table somewhere listing which chars are needed for which languages.
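short of such a table, one rough workaround is to derive the character set from sample text in the target language. the python sketch below assumes you have representative text to feed it (the function name `required_chars` is hypothetical); it only lists the characters, and how you then tell flash or a font tool to embed them is not covered here.

```python
import unicodedata


def required_chars(sample: str) -> list[str]:
    """Distinct characters in sample, sorted by code point.

    Text is NFC-normalized first so a decomposed "e" + combining acute
    counts as the single precomposed "é". Whitespace and control
    characters (Unicode category C*) are skipped, since they rarely
    need glyphs.
    """
    normalized = unicodedata.normalize("NFC", sample)
    chars = {
        ch
        for ch in normalized
        if not ch.isspace() and unicodedata.category(ch)[0] != "C"
    }
    return sorted(chars)


if __name__ == "__main__":
    print("".join(required_chars("bœuf, cœur, œil, œuf, café")))
```

the result is only as complete as the sample, so a short sample will miss rarer letters such as Ÿ.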

 
At 10/01/2004 11:28 PM, Paul Hastings said...

no idea about flash, but arial unicode ms is only 20-odd mb. there are plenty of fonts that contain varying amounts of "unicode".
