determine input language
finding a specific language from input text is rather hard, but you can come "close" by determining the UnicodeBlock a particular char falls in. it won't help much with larger clumps of languages that share the same blocks, such as latin-1 (western european languages) or CJK (chinese, japanese, korean).

this java code snippet is a simple wrapper class that returns the UnicodeBlock of a given char.

    import java.lang.Character.UnicodeBlock;

    public class determineLanguage {
        // returns the name of the UnicodeBlock that aChar falls in, e.g. "BASIC_LATIN"
        public static final String whatLanguage(char aChar) {
            UnicodeBlock aBlock = UnicodeBlock.of(aChar);
            // String.valueOf() also copes with a null block (unassigned chars)
            return String.valueOf(aBlock);
        }
    }

this cfmx code snippet illustrates how to use the wrapper:
    <cfsilent>
    <cfprocessingdirective pageencoding="utf-8"> <!--- remove for cf5 --->
    <cfcontent type="text/html; charset=utf-8"> <!--- remove for cf5 --->
    <cfscript>
    // only run the test when the form was submitted with non-blank text
    if (isDefined("form.testLanguage") and len(trim(form.testLanguage))) {
        determineLanguage = createobject("java","determineLanguage");
        // asc() returns the code of the first character only
        test=asc(form.testLanguage);
        thisLang = determineLanguage.whatLanguage(test);
    }
    </cfscript>
    </cfsilent>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <title>Test Language Determination</title>
    <style type="text/css">
    BODY { font-size : 85%; font-family : "Arial Unicode MS"; }
    INPUT { font-family : "Arial Unicode MS"; }
    </style>
    </head>
    <body>
    <form action="testLang.cfm" method="post">
    test language: <input type="text" name="testLanguage" size="50">
    <input type="submit" value="try">
    </form>
    <cfif isDefined("variables.thisLang")>
    <b>unicode subset</b>: <cfoutput>#thisLang#</cfoutput>
    </cfif>
    </body>
    </html>

as soon as i can get my unicode char db sorted out, i suppose we can dispense with the java class altogether and just use cf (mx) code.
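the wrapper above only ever sees one char (asc() hands it the first character of the input). as a rough sketch of how the same idea might stretch to a whole string, the class below tallies the UnicodeBlock of every letter and reports the most common one. the class name BlockTally, the method dominantBlock and the "UNKNOWN" fallback are made up for this illustration and are not part of the original wrapper; it also assumes java 8 or later for Map.merge.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashMap;
import java.util.Map;

// hypothetical helper for illustration, not part of the original wrapper
final class BlockTally {

    // returns the name of the UnicodeBlock that holds the most letters in text,
    // or "UNKNOWN" when text contains no letters at all
    static String dominantBlock(String text) {
        Map<UnicodeBlock, Integer> counts = new HashMap<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue;                      // skip digits, spaces, punctuation
            }
            UnicodeBlock block = UnicodeBlock.of(cp);
            if (block != null) {
                counts.merge(block, 1, Integer::sum);
            }
        }
        UnicodeBlock best = null;
        int bestCount = 0;
        // ties are broken arbitrarily (map iteration order)
        for (Map.Entry<UnicodeBlock, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best == null ? "UNKNOWN" : best.toString();
    }

    public static void main(String[] args) {
        System.out.println(dominantBlock("hello world"));
        // the russian word "privet", written with unicode escapes
        System.out.println(dominantBlock("\u043f\u0440\u0438\u0432\u0435\u0442"));
    }
}
```

the same caveat applies as before: a block is not a language, so english, french and german text all come back as BASIC_LATIN, and chinese and japanese kanji share the same CJK blocks.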