determine input language: pure cf solution
as mentioned earlier (june 15th posting), i finally got my act somewhat together and created a pure coldfusion solution, uBlocks.cfc, to the problem of determining input language (or should i say unicode block). this CFC tries to determine the unicode block (or subrange) from a sample of text. it returns a cf query with unicode block stats for the test phrase (i.e. basic latin x%, CJK y%, etc.). you can try it out here. it will eventually bubble up on the devnet gallery. i'd appreciate any feedback.

on the 'to do' list for this thing:
- combine unicode block information into scripts and then into individual languages (chances of this: slim or next to none) or at least language clusters
- return an array of unicode blocks corresponding to position within the text sample, i.e. map unicode block *positions* within the text sample. probably useful for parsing "tower of babel" text.
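
for anyone curious what this kind of block counting looks like, here is a minimal sketch in java (coldfusion runs on the JVM, so the same `java.lang.Character.UnicodeBlock` class is reachable from there). this is not the code inside uBlocks.cfc: the class name `UnicodeBlockStats`, both method names and the decision to skip whitespace when computing percentages are assumptions made purely for illustration. the first method gives per-block percentages for a sample, the second gives one block name per character position, roughly what the second 'to do' item describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// hypothetical helper for illustration only; not the uBlocks.cfc implementation
class UnicodeBlockStats {

    // name of the unicode block a code point belongs to, or "UNASSIGNED"
    // when java does not place it in any defined block
    static String blockName(int codePoint) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(codePoint);
        return (block == null) ? "UNASSIGNED" : block.toString();
    }

    // block name -> percentage of the sample's non-whitespace code points
    // (skipping whitespace is an assumption, not something the post specifies)
    static Map<String, Double> blockPercentages(String sample) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int total = 0;
        for (int i = 0; i < sample.length(); ) {
            int cp = sample.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            if (Character.isWhitespace(cp)) {
                continue;
            }
            counts.merge(blockName(cp), 1, Integer::sum);
            total++;
        }
        Map<String, Double> percentages = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            percentages.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return percentages;
    }

    // one block name per code point, in order of appearance (whitespace included)
    static List<String> blockPositions(String sample) {
        List<String> positions = new ArrayList<>();
        for (int i = 0; i < sample.length(); ) {
            int cp = sample.codePointAt(i);
            i += Character.charCount(cp);
            positions.add(blockName(cp));
        }
        return positions;
    }

    public static void main(String[] args) {
        String sample = "abc\u65E5\u672C"; // three latin letters, two CJK ideographs
        System.out.println(blockPercentages(sample));
        System.out.println(blockPositions(sample));
    }
}
```

the maps keep insertion order, so blocks show up in the order they first appear in the sample. turning the result into a cf query, or rolling blocks up into scripts and languages, is left out here.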