June 18, 2003

determine input language: pure cf solution

as mentioned earlier (june 15th posting), i finally got my act somewhat together and created a pure coldfusion solution, uBlocks.cfc, to the problem of determining input language (or should i say unicode block). this CFC tries to determine the unicode block (or subrange) from a sample of text. it returns a cf query with unicode block stats for the test phrase (i.e. basic latin x%, CJK y%, etc.). you can try it out here. it will eventually bubble up on the devnet gallery. i'd appreciate any feedback. on the 'to do' list for this thing:
  • combine unicode block information into scripts, and then into individual languages (chances of this: slim to none) or at least into language clusters
  • return an array of unicode blocks, one per character position in the text sample, i.e. map unicode block *positions* within the sample. probably useful for parsing "tower of babel" text.
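
for anyone who wants to see the general idea without the CFC in hand, here is a minimal sketch in python. this is not the uBlocks.cfc code, just an illustration of the technique: look up each character's code point in a table of block ranges, then report percentages per block. the names block_of, block_stats and block_positions are made up for this example, the block table is heavily abbreviated (the full list is in the unicode Blocks.txt file), and skipping whitespace in the stats is my assumption, not necessarily what the CFC does. block_positions is a rough take on the second 'to do' item.

```python
from collections import Counter

# Abbreviated, hand-picked table of (start, end, block name) ranges.
# Assumption for illustration only: a real version would load every
# block from the Unicode Blocks.txt data file.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]


def block_of(ch):
    """Return the block name for one character, or "Other" if the
    code point falls outside the abbreviated table above."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_stats(text):
    """Return {block name: percentage} for the non-whitespace
    characters in text (hypothetical stand-in for the cf query)."""
    counts = Counter(block_of(ch) for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


def block_positions(text):
    """Return a list with the block name of every character, in order,
    so mixed-script text can be split where the block changes."""
    return [block_of(ch) for ch in text]


if __name__ == "__main__":
    sample = "hello \u65e5\u672c\u8a9e"
    for name, pct in sorted(block_stats(sample).items()):
        print(f"{name}: {pct:.1f}%")
    print(block_positions("a\u044f"))
```

a linear scan over the table is fine for a handful of blocks; with the full block list a binary search on the range starts would be the obvious next step.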

