Tuesday, September 23, 2008

String constructor considered useless turns out to be useful after all (film at 11)

When you were still a Java neophyte, chances are you wrote some code that looked like this:
String s = new String("test");
Brings back some embarrassing memories, doesn't it? You soon learned that instantiating Strings via the constructor was hardly ever done, and the String constructors seemed to be, well, utterly useless. When do you really ever need to do "new String(oldString)"? Come on, IntelliJ IDEA even flags all occurrences of this as "redundant"!

It turns out that this constructor can actually be useful in at least one circumstance. If you've ever peeked at the String source code, you'll have seen that it doesn't just have fields for the char array value and the count of characters, but also for the offset to the beginning of the String. This is so that Strings can share the char array value with other Strings, which usually results from calling one of the substring() methods. Java was famously chastised for this in jwz's Java rant from years back:
The only reason for this overhead is so that String.substring() can return strings which share the same value array. Doing this at the cost of adding 8 bytes to each and every String object is not a net savings...
Byte savings aside, if you have some code like this:
// imagine a multi-megabyte string here
String s = "0123456789012345678901234567890123456789";
String s2 = s.substring(0, 1);
s = null;
You'll now have a String s2 which, although it seems to be a one-character string, holds a reference to the gigantic char array created in the String s. This means the array won't be garbage collected, even though we've explicitly nulled out the String s!

The fix for this is to use our previously mentioned "useless" String constructor like this:
String s2 = new String(s.substring(0, 1));
It's not well-known that this constructor actually copies the old contents to a new array if the old array is larger than the count of characters in the string. This means the old char array can be garbage collected as intended. Happy happy joy joy.
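
If you do this in more than one place, it might be worth wrapping the idiom in a small helper. Here's a minimal sketch; the class and method names are made up for illustration and don't come from the JDK. It builds the big string at runtime, since a string literal is interned and would stay reachable anyway. Note that the sharing behaviour described above applies to JDKs of this era; from Java 7 update 6 onwards substring() copies the characters itself, so there the extra constructor call is merely harmless.

```java
// Sketch only: DetachedSubstring and detachedSubstring are hypothetical names.
final class DetachedSubstring {

    // Returns the requested substring wrapped in new String(...), so that on
    // JDKs with shared backing arrays the result gets its own right-sized array.
    static String detachedSubstring(String source, int begin, int end) {
        return new String(source.substring(begin, end));
    }

    public static void main(String[] args) {
        // Build a large string at runtime so it is not an interned literal.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append(i % 10);
        }
        String big = sb.toString();

        String first = detachedSubstring(big, 0, 1);
        big = null; // 'first' does not pin the large string's backing array

        System.out.println(first); // prints "0"
    }
}
```

The helper doesn't change what the caller sees: the returned String has the same contents as a plain substring() call, it just doesn't hang on to the parent's array.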

Sometimes, seemingly useless constructs reveal hidden gems of usefulness. Be sure to check out the source for String to see how this works!

Friday, August 22, 2008

Multiline grep

I recently needed to do some multiline grepping through some log files. The problem is, the default grep and egrep on Linux systems don't seem to support regex patterns that extend over several lines.

Luckily, at least RHEL comes with pcregrep installed, which does Perl-compatible regex matching. And by adding a -M switch you get multiline matching!

So you can just go
pcregrep -M 'a\nb' files...
Easy peasy!
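
To see why a line-at-a-time search can't find a pattern like this, here's a rough Java sketch of the difference. It only illustrates the idea and says nothing about how grep or pcregrep are implemented; the class and method names are made up.

```java
import java.util.regex.Pattern;

// Sketch only: MultilineMatchSketch and its methods are hypothetical names.
final class MultilineMatchSketch {

    // Tests each line on its own. The newlines are removed by split(),
    // so a pattern that contains \n can never match here.
    static boolean matchesAnyLine(String text, Pattern pattern) {
        for (String line : text.split("\n")) {
            if (pattern.matcher(line).find()) {
                return true;
            }
        }
        return false;
    }

    // Searches the whole text at once, so newlines are part of the input
    // and a pattern may span several lines.
    static boolean matchesAcrossLines(String text, Pattern pattern) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("a\\nb"); // same regex as above
        String log = "xa\nby\n";

        System.out.println(matchesAnyLine(log, pattern));     // prints "false"
        System.out.println(matchesAcrossLines(log, pattern)); // prints "true"
    }
}
```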

Thursday, May 15, 2008

EBCDIC trick

You can use good old /bin/dd to convert EBCDIC files to their ASCII equivalent.

Just do
dd if=infile.txt of=outfile.txt conv=ascii
Which does the conversion automatically.

Monday, April 7, 2008

Maven 2 help plugin

(I always forget this bit, so I'm putting it here for posterity.)

The help plugin for Maven 2 can print the possible goals and other info for a plugin:
mvn help:describe -Dplugin=eclipse -Dmedium=true
-Dfull=true gives some more detailed output.

For more info see the help plugin pages.