August 26, 2008

case shenanigans

i really feel for our turkish cf brethren, they always seem to be getting the short end of the stick. a couple of weeks ago there was an issue in the support forums with someone using the turkish locale (tr_TR) who was having problems getting case right using coldfusion's uCase() & lCase() functions. there are a couple of characters, "i" & "ı" (that's small letter i & small letter dotless i), that are special cases when it comes to case mappings (bad pun willfully intended), and cf's functions weren't handling them correctly. i was a bit perplexed by this, mainly as we usually deal with locales whose writing systems don't have a concept of case, but after poking around core java's String class it seems that cf wasn't using the overloaded versions of the toUpperCase()/toLowerCase() methods, which take a locale to use for locale sensitive case mapping. easy enough to fix in cf (i really love how easily coldfusion lets you work around these little issues):

<cffunction name="toLowerCase" output="false" returntype="string" access="public">
    <cfargument name="inString" required="true" type="string" hint="string to lower case">
    <cfargument name="locale" required="false" default="en_US" type="string" hint="java style locale identifier to use to lower case input string">
    <cfscript>
        var thisLocale="";
        var l=listFirst(arguments.locale,"_"); // language
        var c=""; // country, we'll ignore variants
        if (listLen(arguments.locale,"_") GT 1) {
            c=uCase(listGetAt(arguments.locale,2,"_"));
        }
        // build the java locale, then call the locale aware overload on the string
        thisLocale=createObject("java","java.util.Locale").init(l,c);
        return arguments.inString.toLowerCase(thisLocale);
    </cfscript>
</cffunction>


<cffunction name="toUpperCase" output="false" returntype="string" access="public">
    <cfargument name="inString" required="true" type="string" hint="string to upper case">
    <cfargument name="locale" required="false" default="en_US" type="string" hint="java style locale identifier to use to upper case input string">
    <cfscript>
        var thisLocale="";
        var l=listFirst(arguments.locale,"_"); // language
        var c=""; // country, we'll ignore variants
        if (listLen(arguments.locale,"_") GT 1) {
            c=uCase(listGetAt(arguments.locale,2,"_"));
        }
        // build the java locale, then call the locale aware overload on the string
        thisLocale=createObject("java","java.util.Locale").init(l,c);
        return arguments.inString.toUpperCase(thisLocale);
    </cfscript>
</cffunction>

<cfscript>
s="#chr(105)##chr(305)##chr(223)#";
upperS=toUpperCase(s,"tr_TR");
lowerS=toLowerCase(upperS,"TR_TR");
writeoutput("input string: #s#<br> upper case: #upperS#<br>lower case: #lowerS#");
</cfscript>


notice how i didn't have to mess with the core java String class; i could just use its methods on a cf string. even if you're not using the tr_TR locale, you should note that "ß" (small letter sharp s) is also a special case: upper casing it actually turns it into 2 letters, "SS". i think there might be similar issues with some Greek characters as well.
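
to see the same mappings outside of cf, here's a minimal plain java sketch of what the locale aware overloads do with the test string above. the class name CaseDemo and the parseLocale() helper are made up for illustration (they're not part of cf or the jdk), and the helper assumes a simple language_COUNTRY identifier like the ones used in this post:

```java
import java.util.Locale;

public class CaseDemo {
    // hypothetical helper: turn a java style identifier such as "tr_TR" into a Locale.
    // assumes a plain language_COUNTRY identifier; variants are not handled.
    static Locale parseLocale(String id) {
        return Locale.forLanguageTag(id.replace('_', '-'));
    }

    // upper case using the locale aware overload of String.toUpperCase()
    static String upper(String s, String localeId) {
        return s.toUpperCase(parseLocale(localeId));
    }

    // lower case using the locale aware overload of String.toLowerCase()
    static String lower(String s, String localeId) {
        return s.toLowerCase(parseLocale(localeId));
    }

    public static void main(String[] args) {
        // small letter i, small letter dotless i, small letter sharp s
        String s = "i\u0131\u00df";
        for (String id : new String[] {"en_US", "tr_TR"}) {
            String up = upper(s, id);
            System.out.println(id + ": " + s + " -> " + up + " -> " + lower(up, id));
        }
    }
}
```

with en_US the string upper cases to "IISS" and comes back as "iiss", so both the dotless i and the sharp s are lost on the round trip. with tr_TR it upper cases to "İISS" (capital I with dot above, then plain capital I) and lower cases back to "iıss": the two i's survive, the sharp s still doesn't.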
