Hey .intern() … get me a String!

A lot of us who work in Java take Strings for granted. Some of us toss them around like primitive types, paying little regard to what they represent or how they should be used. Because of this, some classic problems like “==” vs. “.equals()” arise. Just in case you don’t know what I’m talking about here: in Java, using == to test for equality between two objects (let’s call them A and B) compares the references held by A and B, that is, whether they point to the same object, NOT the values of the objects they refer to. Notice that I said OBJECTS. That’s the key word here. I’m not talking about primitive types like int, short, char, boolean, etc… I’m talking about OBJECTS. Sometimes people forget that Strings are objects, so you’ll see something like if ( A == B ){ … I recently ran into a bug in a piece of code that was caused by just that. It actually went a little something like if( "VALUE" == srt ) { …
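If you want to see the difference for yourself, here’s a minimal sketch. The class and method names (EqualityDemo, sameReference, sameValue) are made up for illustration, and I’m using new String(...) purely to force a second object with the same contents.

```java
class EqualityDemo {
    // Reference comparison: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Value comparison: true if both Strings hold the same characters.
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "VALUE";
        // new String(...) always creates a distinct object with the same contents.
        String runtime = new String("VALUE");

        System.out.println(sameReference(literal, runtime)); // false
        System.out.println(sameValue(literal, runtime));     // true
    }
}
```

For a bug like the one above, the usual fix is "VALUE".equals(srt), which also happens to be safe when srt is null.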

If you read the above and scratched your head in confusion, I suggest taking some time to learn a little bit more about Strings, comparison operators, and a little bit more about Java in general before continuing onward.

If you’ve read the above and I’ve bored you, GOOD! This NEXT section is a little more interesting…

AND NOW, HERE’S WHERE THE WHEELS FALL OFF THE CART!

There are times, in Java land, if you REALLY, REALLY, REALLY, know what you’re doing, when you CAN use == to evaluate 2 Strings. I’m going to warn you though – I don’t think I’d ever even use what I’m about to talk about unless I ABSOLUTELY HAD TO for some reason or another and ONLY after thorough testing.

And so, I introduce…

String.intern()

Since String objects are immutable in Java land, just about anything you do to a String produces a brand new String object at runtime. Take, for example, the following code:

String x = "FOO";

// ....

public String bar(String z)
{
    return z + z;
}

String y = bar(x);

So now, our variable y is assigned the value “FOOFOO”. What if, however, somewhere out in our program we had another variable w, and let’s just pretend that w was also assigned the value “FOOFOO” during runtime? At this point y and w refer to two separate String objects that happen to hold the same characters. Since Strings are immutable, we could just as well have pointed y at the object w refers to, because for all intents and purposes the two are interchangeable.
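Here’s a small sketch of that situation. The bar method is the one from above (made static so the sketch runs on its own); the class name RuntimeStringsDemo and the helper buildAtRuntime are made up, and building w with a StringBuilder is just one assumed way it could get its value at runtime.

```java
class RuntimeStringsDemo {
    // Concatenating a non-constant value creates a new String object at runtime.
    static String bar(String z) {
        return z + z;
    }

    // Hypothetical stand-in for "somewhere out in our program".
    static String buildAtRuntime() {
        return new StringBuilder("FOO").append("FOO").toString();
    }

    public static void main(String[] args) {
        String x = "FOO";
        String y = bar(x);
        String w = buildAtRuntime();

        System.out.println(y.equals(w)); // true: same characters
        System.out.println(y == w);      // false: two separate objects
    }
}
```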

Enter String.intern(). So, in a nutshell, .intern() is basically your way of saying “Hey… if there’s already a String out there in Java land with the same value as mine, point me to that guy instead. I know he’s not changing. I’ll just use him!”

Under the hood, your JVM keeps a little pool of unique Strings floating around and happy for just this occasion.  If you call .intern(), the JVM lifeguard takes a gander into that pool to see if you look like anyone in there. If you do, he points you to that guy. If you don’t, he throws you into that pool.

All of your String literals and compile-time constants are in here. Strings built at runtime aren’t. Not unless you invoke .intern().
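A quick sketch of who is in the pool and who isn’t. The names here (InternPoolDemo, PREFIX, CONSTANT, buildAtRuntime) are mine, not anything from the JDK. One thing worth noticing: .intern() doesn’t change the String you call it on, it hands back the pooled reference, so you have to actually use the return value.

```java
class InternPoolDemo {
    static final String PREFIX = "FOO";            // compile-time constant
    static final String CONSTANT = PREFIX + "FOO"; // folded into "FOOFOO" by the compiler

    // Hypothetical helper: builds "FOOFOO" at runtime, so the result is not pooled.
    static String buildAtRuntime() {
        return new StringBuilder("FOO").append("FOO").toString();
    }

    public static void main(String[] args) {
        String literal = "FOOFOO";
        String runtime = buildAtRuntime();

        System.out.println(literal == CONSTANT);         // true: both come from the pool
        System.out.println(literal == runtime);          // false: runtime String is a separate object
        System.out.println(literal == runtime.intern()); // true: intern() returns the pooled one

        runtime.intern();                                // return value ignored...
        System.out.println(literal == runtime);          // ...so this is still false
    }
}
```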

So, that being said, if you use this method consistently throughout your program, you COULD replace your .equals() evaluations with ==. Why would one do such a thing?

Take a look at the String.equals() source (right around line 854). There’s actually a LOT going on here. Especially down in that for loop! Using == might be a bit faster for evaluating equality in this case!
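To give a feel for the kind of work involved, here’s a simplified sketch of the steps an equals-style comparison goes through. This is NOT the JDK source, and sketchEquals is a made-up name; the real implementation differs between Java versions, so treat this as a rough illustration only.

```java
class EqualsSketch {
    // Simplified illustration of a value comparison between a String and another object.
    static boolean sketchEquals(String a, Object other) {
        if (a == other) {
            return true;                      // same object: nothing more to check
        }
        if (!(other instanceof String)) {
            return false;                     // null or a different type
        }
        String b = (String) other;
        if (a.length() != b.length()) {
            return false;                     // different lengths can never be equal
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;                 // first differing character ends the loop
            }
        }
        return true;                          // every character matched
    }

    public static void main(String[] args) {
        System.out.println(sketchEquals("FOOFOO", new String("FOOFOO"))); // true
        System.out.println(sketchEquals("FOOFOO", "FOOFOX"));             // false
    }
}
```

Notice the very first check in the sketch: if both references already point to the same object, the comparison returns before it ever reaches the loop. The loop only costs you when the two Strings are separate objects of the same length.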

Here’s the catch – you incur some up front loss in performance for some gains later on down the road. Calling .intern() isn’t free OR cheap. In fact, you might see some people out there writing about how using if ( s0.intern() == s1.intern() ){…  is faster than if (s0.equals(s1)){…

I’ve run a few performance tests. Nothing too in depth (I was just curious to see for myself), but in most simple cases the second evaluation, s0.equals(s1), is your safer bet. I encourage you to try it out for yourself! If you’re really curious, take a look at the bytecode! In some not-so-simple cases, however, you MAY achieve some performance gains. I wouldn’t recommend s0.intern() == s1.intern() for evaluation, but rather s0 == s1 IF AND ONLY IF you have guaranteed that both Strings have already been intern()’d!
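If you do go down this road, one way to keep that guarantee honest is to intern each String exactly once, at the point where it enters your code, and only compare with == behind that boundary. Here’s a minimal sketch of the idea; InternedKeys, canonical and sameInterned are hypothetical names, and this is an illustration of the approach rather than a recommendation.

```java
class InternedKeys {
    // Intern once, where the String enters this block of code.
    static String canonical(String s) {
        return s.intern();
    }

    // Only meaningful if BOTH arguments came out of canonical(...).
    static boolean sameInterned(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String a = canonical(new String("FOOFOO"));
        String b = canonical(new StringBuilder("FOO").append("FOO").toString());

        System.out.println(sameInterned(a, b));                    // true
        // A String that skipped canonical(...) breaks the guarantee:
        System.out.println(sameInterned(a, new String("FOOFOO"))); // false
    }
}
```

The second println is the bug waiting to happen: one un-interned String sneaking in and == quietly says false even though .equals() would say true.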

 


3 responses to “Hey .intern() … get me a String!”

  1. I’m really glad you brought up the performance ramifications at the end. I immediately started thinking about the performance overhead of using .intern() and was concerned by your excitement to use it.

    I am, however, still concerned that you’re leaving yourself on your knees begging for trouble with this whole thing. It’s one thing if you have a pool of strings encapsulated inside a method that you can keep a close eye on, but it’s a whole ‘nother story if you’re taking Strings external to your method and just praying that they used .intern(). I can only imagine the frustrating debug session that may lead to…

    • Which is EXACTLY why I had prefaced this with “I don’t think I’d ever even use what I’m about to talk about unless I ABSOLUTELY HAD TO for some reason or another and ONLY after thorough testing.”
      I would sincerely hope that this approach would only be used in blocks of code with sound functional cohesion. This isn’t an approach to be used with anyone who still hasn’t shaken off some of the bad programming practices that hopefully fade as a developer learns and grows. In fact, this isn’t even an approach that should be taken by seasoned vets who haven’t done a proper DCUT and planned for the long term effects that introducing this approach might incur. This is one of those approaches that could be used by a team who’s seeking to optimize some block of code, has hopefully tried a few other techniques first, and absolutely, positively, without a doubt knows what they’re doing 😉

  2. Matt

    From http://www.ibm.com/developerworks/xml/library/x-perfap1/index.html

    String internalization
    SAX specifies a feature that’s identified by the feature URI http://xml.org/sax/features/string-interning. When set to true, it instructs the parser to report XML names — such as the names of elements and attributes — and namespace URIs as internalized strings that have been interned by invoking java.lang.String.intern().

    So this approach can’t be all that bad!
