Tuesday, September 23, 2008

String constructor considered useless turns out to be useful after all (film at 11)

When you were still a Java neophyte, chances are you wrote some code that looked like this:
String s = new String("test");
Brings back some embarrassing memories, doesn't it? You soon learned that instantiating Strings via the constructor was hardly ever done, and the String constructors seemed to be, well, utterly useless. When do you really ever need to do "new String(oldString)"? Come on, IntelliJ IDEA even flags all occurrences of this as "redundant"!

It turns out that this constructor can actually be useful in at least one circumstance. If you've ever peeked at the String source code, you'll have seen that it doesn't just have fields for the char array value and the count of characters, but also for the offset to the beginning of the String. This is so that Strings can share the char array value with other Strings, which usually happens when you call one of the substring() methods. Java was famously chastised for this in jwz's Java rant from years back:
The only reason for this overhead is so that String.substring() can return strings which share the same value array. Doing this at the cost of adding 8 bytes to each and every String object is not a net savings...
Byte savings aside, if you have some code like this:
// imagine a multi-megabyte string here
String s = "0123456789012345678901234567890123456789";
String s2 = s.substring(0, 1);
s = null;
You'll now have a String s2 which, although it seems to be a one-character string, holds a reference to the gigantic char array created in the String s. This means the array won't be garbage collected, even though we've explicitly nulled out the String s!

The fix for this is to use our previously mentioned "useless" String constructor like this:
String s2 = new String(s.substring(0, 1));
It's not well-known that this constructor actually copies the old contents to a new array if the old array is larger than the count of characters in the string. This means the old, oversized array can be garbage collected as intended. Happy happy joy joy.
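
Here is a minimal sketch of the fix as a complete program. The class name SubstringCopyDemo and the helper detachedPrefix are made up for illustration, and the big string is built at runtime to stand in for the multi-megabyte input. Whether the copy actually frees the large array depends on the String implementation in your class library, so treat this as an illustration rather than a guarantee.

```java
public class SubstringCopyDemo {

    // Hypothetical helper (not part of the JDK): take a prefix and wrap it in
    // new String(...) so the result does not have to keep the source alive.
    static String detachedPrefix(String source, int length) {
        return new String(source.substring(0, length));
    }

    public static void main(String[] args) {
        // Build a large string at runtime to stand in for the multi-megabyte input.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append(i % 10);
        }
        String s = sb.toString();

        String s2 = detachedPrefix(s, 1);
        s = null; // the large string is no longer referenced by this code

        System.out.println(s2);          // the one-character prefix
        System.out.println(s2.length()); // its length
    }
}
```

The wrapper costs one small extra allocation per call, which is usually a fair trade when the source string is huge and short-lived.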

Sometimes, seemingly useless constructs reveal hidden gems of usefulness. Be sure to check out the source for String to find how this works!

9 comments:

messi said...

Actually, it is not guaranteed that this works with every classlib, because the Javadoc of String(String) does not state such a requirement.

e.g.: Apache Harmony's code:

public String(String string) {
value = string.value;
offset = string.offset;
count = string.count;
}

levi_h said...

Another corner case where the String(String) constructor comes in handy is when using an IdentityHashMap.
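
A rough sketch of what that corner case might look like (the class and method names are invented for illustration): IdentityHashMap compares keys with == rather than equals(), so two Strings created with new String(...) count as separate keys even though their contents are equal.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityKeyDemo {

    // Hypothetical helper: inserts two equal-but-distinct String objects
    // and reports how many keys the identity-based map ends up with.
    static int distinctIdentityKeys(String text) {
        Map<String, String> map = new IdentityHashMap<>();
        String first = new String(text);   // a new object, equal to text
        String second = new String(text);  // another new object, also equal
        map.put(first, "first");
        map.put(second, "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctIdentityKeys("key")); // prints 2
    }
}
```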

Kjetil Ødegaard said...

@messi: Good point, I forgot to specify that this is from Sun's String implementation and not in the spec.

Nick said...

A good case for using the 'new String(String)' constructor is when you don't want to hold a reference to a literal string.

When you declare a string in the following form...
String s = "a string";

"a string" will be cached in permgen space. Any further uses of the literal string will return the same cached object.

E.g. if you're using a String to lock around in a synchronized statement (handy for readability and debugging), locking around a literal is a BAD idea! If multiple classes use the same code...

synchronized("My Lock") {
.....
}

They'll actually be locking around the same object (and they'll probably have no idea why their app keeps blocking).

If you use the following...

synchronized(new String("My Lock")) {
.....
}

you still get all the benefits of keeping a description of the lock handy and get none of the locking problems.

Nick

Markus Kohler said...

True, this is not well known.
Also I covered a similar issue here some time ago:
https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/5100

The sharing behavior has changed after JDK 1.4 in certain places.

Anonymous said...

Exactly what are you going to do when you write something like this?
I would say nothing is synchronized here.


>>If you use the following...
>>
>>synchronized(new String("My Lock")) {
>>.....
>>}
>>
>>you still get all the benefits of keeping a description of the lock handy and get none of the locking problems.

Lawrence said...

@Nick: Using

synchronized(new String("My Lock")) {

is *badly* broken because it will not mutually exclude anything since every thread will have a separate object; in fact newer compilers will no-op that puppy right out of there.
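
To make the object-identity point concrete, here is a small sketch (the class names LockIdentityDemo, A and B are hypothetical). Equal string literals in different classes refer to one shared object, which is why locking on a literal can block unrelated code, while each new String(...) expression produces a separate object, so locking on one created inline excludes nobody.

```java
public class LockIdentityDemo {

    // Two unrelated classes that happen to use the same literal as a lock.
    static class A { static final String LOCK = "My Lock"; }
    static class B { static final String LOCK = "My Lock"; }

    // Equal literals are interned, so both fields point at the same object.
    static boolean literalsShareOneObject() {
        return A.LOCK == B.LOCK;
    }

    // Every new String(...) expression creates a distinct object.
    static boolean freshStringsShareOneObject() {
        return new String("My Lock") == new String("My Lock");
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());     // prints true
        System.out.println(freshStringsShareOneObject()); // prints false
    }
}
```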

Nick said...

Yeah my mistake... not really sure what I was thinking when I typed that. What I meant was closer to...


---------------
this.myLock = new String("My Lock");
...
synchronized (this.myLock) {
...
}
---------------


synchronized (new ... is just plain crazy

messi said...

There's no advantage in using a String solely as a lock object. It only wastes memory. Use a plain object instead.

static final Object DESCRIPTIVE_NAME = new Object();

...

synchronized (DESCRIPTIVE_NAME) {
...
}