Sunday, June 19, 2011

Java’s toLowerCase() has got a surprise for you!

Have you ever encountered a surprise while using toLowerCase()? It is one of the most widely used String methods for case conversion, and it has one subtle behavior you should be aware of.





toLowerCase() respects internationalization (i18n): it performs the case conversion according to a Locale. Calling toLowerCase() with no arguments is equivalent to calling toLowerCase(Locale.getDefault()). Because the result is locale-sensitive, you should not build logic on it that is meant to behave the same way regardless of locale.


import java.util.Locale;

public class ToLocaleTest {
    public static void main(String[] args) {
        // Set Lithuanian as the default locale for this JVM
        Locale.setDefault(new Locale("lt"));
        String str = "\u00cc"; // Ì (LATIN CAPITAL LETTER I WITH GRAVE)
        System.out.println("Before case conversion is " + str + " and length is " + str.length()); // length is 1
        // The no-argument toLowerCase() uses the default locale (Lithuanian here)
        String lowerCaseStr = str.toLowerCase();
        System.out.println("Lower case is " + lowerCaseStr + " and length is " + lowerCaseStr.length()); // length is 3: i + U+0307 + U+0300
    }
}
 
In the above program, look at the string length before and after conversion: it is 1 before and 3 after. Under the Lithuanian locale, the single character Ì (U+00CC) lowercases to three chars: i, followed by the combining dot above (U+0307) and the combining grave accent (U+0300). Any logic that assumes the length stays the same across case conversion will break in this scenario, and a program that works on your machine may fail in an environment with a different default locale. This is a nice thing to catch in code review.
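To make the "logic that depends on length" problem concrete, here is a minimal sketch. The class LengthAssumptionDemo and the helper offsetInLowered are hypothetical names introduced only for this illustration. The helper finds a substring case-insensitively by lowercasing both strings, and the returned offset is only valid in the original text if lowercasing did not change the length. Locale.ENGLISH is used as a fixed-locale baseline for comparison.

```java
import java.util.Locale;

public class LengthAssumptionDemo {
    // Hypothetical helper: lowercases both strings with the given locale and
    // returns the offset of needle within the LOWERCASED text.
    public static int offsetInLowered(String text, String needle, Locale locale) {
        return text.toLowerCase(locale).indexOf(needle.toLowerCase(locale));
    }

    public static void main(String[] args) {
        String text = "\u00ccabc"; // "Ìabc": 4 chars, "abc" starts at index 1
        Locale lithuanian = Locale.forLanguageTag("lt");

        int englishOffset = offsetInLowered(text, "ABC", Locale.ENGLISH);
        int lithuanianOffset = offsetInLowered(text, "ABC", lithuanian);

        System.out.println("Offset with Locale.ENGLISH: " + englishOffset);    // 1
        System.out.println("Offset with Lithuanian:     " + lithuanianOffset); // 3
    }
}
```

With the Lithuanian rules the offset is 3, so using it on the original 4-char string, for example text.substring(3, 6), would throw a StringIndexOutOfBoundsException.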

To make it safer, you can use the overload that takes a locale, toLowerCase(Locale.ENGLISH), so that the conversion always follows English rules, whatever the default locale is. But then the conversion is no longer internationalized.
So the crux is: toLowerCase() is locale-specific.
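As a rough sketch of the fixed-locale approach, the example below compares the two conversions on the same input. The class FixedLocaleLowerCase and the helper lowerFixed are hypothetical names used only for illustration. The locale is passed explicitly rather than set through Locale.setDefault, so the result does not depend on the machine's settings.

```java
import java.util.Locale;

public class FixedLocaleLowerCase {
    // Hypothetical helper: always lowercases with English rules,
    // ignoring the JVM default locale.
    public static String lowerFixed(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        String str = "\u00cc"; // Ì
        Locale lithuanian = Locale.forLanguageTag("lt");

        // Lithuanian rules expand the character into three chars
        System.out.println("Lithuanian length: " + str.toLowerCase(lithuanian).length()); // 3
        // English rules map it to the single char ì (U+00EC)
        System.out.println("English length:    " + lowerFixed(str).length()); // 1
    }
}
```

Whether a fixed locale is appropriate depends on the data: it suits identifiers, keys, and protocol tokens, while text shown to users may still need the user's own locale.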
