determine input language
finding the specific language of some input text is rather hard, but you can come "close" by determining which UnicodeBlock its characters fall in. this won't help much where one block is shared by a large clump of languages, such as the latin blocks (BASIC_LATIN and LATIN_1_SUPPLEMENT, roughly the latin-1 character set used by the western european languages) or the CJK ideographs shared by chinese, japanese and korean. this java code snippet is a simple wrapper class that returns the name of the UnicodeBlock a given char belongs to, as a string.
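// java.lang is imported automatically, so only the nested class needs an import
import java.lang.Character.UnicodeBlock;

// wrapper that exposes Character.UnicodeBlock.of() to cf as a plain string
public class determineLanguage {
    // returns the name of the unicode block aChar belongs to, e.g. "BASIC_LATIN"
    // or "CYRILLIC"; UnicodeBlock.of() returns null for a char outside any
    // defined block, which String.valueOf() turns into the string "null"
    public static String whatLanguage(char aChar) {
        UnicodeBlock aBlock = UnicodeBlock.of(aChar);
        return String.valueOf(aBlock);
    }
}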
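to see where the block is a useful hint and where it isn't, here is a small demo that calls the same Character.UnicodeBlock.of() method the wrapper uses on a few sample characters (e, é, ü, ж, α, あ, 한 and 中). the class name BlockDemo and the choice of samples are just my own picks for illustration.

```java
import java.lang.Character.UnicodeBlock;

// prints the unicode block for a few sample characters, showing where the
// block narrows things down and where several languages share one block
public class BlockDemo {
    public static void main(String[] args) {
        char[] samples = {
            'e',      // english, french, german... all BASIC_LATIN
            '\u00E9', // e-acute: french, but also spanish, portuguese...
            '\u00FC', // u-umlaut: german, but also turkish, hungarian...
            '\u0436', // cyrillic zhe: russian, ukrainian, bulgarian...
            '\u03B1', // greek alpha
            '\u3042', // hiragana a: practically japanese only
            '\uD55C', // hangul syllable han: practically korean only
            '\u4E2D'  // CJK ideograph "middle": chinese, japanese and korean
        };
        for (char c : samples) {
            // %04X prints the code point as 4 hex digits; %s calls the block's toString()
            System.out.println(String.format("U+%04X -> %s", (int) c, UnicodeBlock.of(c)));
        }
    }
}
```

this should print BASIC_LATIN for e, LATIN_1_SUPPLEMENT for both é and ü, then CYRILLIC, GREEK, HIRAGANA, HANGUL_SYLLABLES and CJK_UNIFIED_IDEOGRAPHS. so kana or hangul pin down japanese or korean fairly well, while a latin letter or a lone ideograph doesn't tell you much.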
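this cfmx code snippet (testLang.cfm, since the form posts back to itself) illustrates how to use the wrapper. the compiled determineLanguage.class has to be on cf's java classpath for createobject() to find it: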
<cfsilent>
<cfprocessingdirective pageencoding="utf-8"> <!--- remove for cf5 --->
<cfcontent type="text/html; charset=utf-8"> <!--- remove for cf5 --->
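<cfscript>
// only run the test when the form was posted with a non-blank value
if (isDefined("form.testLanguage") and len(trim(form.testLanguage))) {
	determineLanguage = createobject("java","determineLanguage");
	// asc() returns the unicode value of the first character only
	test = asc(trim(form.testLanguage));
	thisLang = determineLanguage.whatLanguage(test);
}
</cfscript>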
</cfsilent>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Test Language Determination</title>
<style type="text/css">
BODY {
font-size : 85%;
font-family : "Arial Unicode MS";
}
INPUT {
font-family : "Arial Unicode MS";
}
</style>
</head>
<body>
<form action="testLang.cfm" method="post">
test language: <input type="text" name="testLanguage" size="50"> <input type="submit" value="try">
</form>
<cfif isDefined("variables.thisLang")>
<b>unicode subset</b>: <cfoutput>#thisLang#</cfoutput>
</cfif>
</body>
</html>
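the page above only looks at the first character, so a leading english word decides the answer. one possible refinement, sketched here under my own assumptions (the class BlockTally and method dominantBlock are hypothetical, not part of the wrapper), is to count the block of every letter in the text and report the most frequent one.

```java
import java.lang.Character.UnicodeBlock;
import java.util.HashMap;
import java.util.Map;

// hypothetical extension of the wrapper: count the unicode block of every
// letter in the text and return the name of the most frequent block
public class BlockTally {
    public static String dominantBlock(String text) {
        Map<UnicodeBlock, Integer> counts = new HashMap<>();
        // walk code points rather than chars, so characters outside the BMP
        // (stored by java as surrogate pairs) are counted once
        text.codePoints()
            .filter(Character::isLetter) // skip spaces, digits and punctuation
            .forEach(cp -> {
                UnicodeBlock block = UnicodeBlock.of(cp);
                if (block != null) {
                    counts.merge(block, 1, Integer::sum);
                }
            });
        // pick the block with the highest count; ties are broken arbitrarily
        UnicodeBlock best = null;
        int bestCount = 0;
        for (Map.Entry<UnicodeBlock, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return String.valueOf(best); // "null" when the text has no letters
    }

    public static void main(String[] args) {
        // six cyrillic letters ("privet") against five latin ones ("world")
        System.out.println(dominantBlock("\u043F\u0440\u0438\u0432\u0435\u0442, world"));
    }
}
```

returning a string keeps it easy to call from cf in the same way as whatLanguage(), e.g. passing form.testLanguage instead of asc(...).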
once i get my unicode character database sorted out, i suppose we can dispense with the java class altogether and just use cf (mx) code.
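as a rough idea of what such a lookup involves, here is a minimal table-driven sketch, written in java only so it can be compared with the wrapper: block ranges sorted by start code point and searched with a binary search. the class BlockTable and its method are hypothetical, and the handful of ranges is a small subset taken from Blocks.txt in the unicode character database; the same rows could just as well live in a cf query or database table.

```java
import java.util.Arrays;

// hypothetical table-driven block lookup: START[i]..END[i] is the code point
// range of the block NAME[i]; only a few blocks from Blocks.txt are listed
public class BlockTable {
    // must stay sorted by start code point for the binary search below
    private static final int[] START = {
        0x0000, 0x0080, 0x0100, 0x0370, 0x0400, 0x0590, 0x0600,
        0x3040, 0x30A0, 0x4E00, 0xAC00
    };
    private static final int[] END = {
        0x007F, 0x00FF, 0x017F, 0x03FF, 0x04FF, 0x05FF, 0x06FF,
        0x309F, 0x30FF, 0x9FFF, 0xD7AF
    };
    private static final String[] NAME = {
        "Basic Latin", "Latin-1 Supplement", "Latin Extended-A",
        "Greek and Coptic", "Cyrillic", "Hebrew", "Arabic",
        "Hiragana", "Katakana", "CJK Unified Ideographs", "Hangul Syllables"
    };

    // returns the block name for codePoint, or null if it falls outside
    // every range in this (partial) table
    public static String blockOf(int codePoint) {
        int i = Arrays.binarySearch(START, codePoint);
        if (i < 0) {
            // not an exact start: binarySearch returned -(insertionPoint) - 1,
            // so this is the index of the last range starting below codePoint
            i = -i - 2;
        }
        if (i >= 0 && codePoint <= END[i]) {
            return NAME[i];
        }
        return null;
    }
}
```

for example blockOf(0x00E9) gives "Latin-1 Supplement", while blockOf(0x0200) gives null because that code point lies in a block missing from this short table.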



