March 07, 2004

Wasting time on micro-optimizations

Richard Schaut writes about an appallingly stupid interview question -- asking the candidate to explain what this line of code does:

a ^= b ^= a ^= b;

As Richard explains, the question is stupid because the expression is undefined: it modifies a twice with no intervening sequence point, so the standard places no requirements at all on what happens. The resulting code may do anything from swapping a and b to causing a butterfly's wings to flap in China; either behavior would be correct.

That's only sort of the point, though. You'd only write this line if you were trying to swap two variables and you figured that this would be faster than creating a temporary variable to store one in while you copied the other. Richard mentions that a modern compiler will optimize things well enough that the temporary variable solution may actually be faster, so you wouldn't be helping things at all.
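
If you really do need to swap two values, the boring version is both readable and well-defined -- here's a minimal sketch (the function name is just for illustration):

 // The well-defined way: use a temporary (or, in C++, std::swap).
 // A modern compiler turns this into a couple of register moves,
 // so the "clever" xor version buys you nothing.
 void swapInts(int& a, int& b)
 {
     int tmp = a;
     a = b;
     b = tmp;
 }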

Even that's not the point. The point is that a good programmer shouldn't even think about optimizing something like this in the first place. Lots of people have their own favorite tricks -- things like ++i being faster than i++, or macros being faster than function calls. Cute as those kinds of things may be, they sacrifice readability and provide an insignificant performance win.

If you're coding for performance, you should figure out what's slow before worrying about tweaking your code. Find the scenario that should be faster, measure it, and see what's slow. I bet it won't be variable copying. In most cases, it'll either be algorithmic or architectural. Either you're using an inefficient algorithm to accomplish a task that could be done faster in a different way, or part of your application's design forces you to do things slowly. Correcting those problems will give you far greater performance gains than any cascading series of xors ever could.
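
Measuring doesn't have to be elaborate, either. Even a crude timer along these lines -- openDocument here is a hypothetical stand-in for whatever scenario feels slow -- will tell you where the minutes are going, and a real profiler will tell you why:

 #include <cstdio>
 #include <ctime>

 // Hypothetical stand-in for the scenario you care about: opening a
 // document, running a query, whatever feels slow to your users.
 static void openDocument()
 {
     // ... the work you actually want to measure ...
 }

 int main()
 {
     std::clock_t start = std::clock();
     openDocument();
     std::clock_t stop = std::clock();
     std::printf("scenario took %.2f seconds of CPU time\n",
                 static_cast<double>(stop - start) / CLOCKS_PER_SEC);
     return 0;
 }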

All of this is at the forefront of my mind right now because I've been spending some of my free time over the past few weeks helping out with a new version of an internal application that has some performance issues. Everyone knew it felt slow, but nobody sat down to measure it. I did, and the first thing that jumped out was that it spent far too long getting strings out of XML data. How long? Well, the algorithm it was using looked like this and was invoked many times for every new document:

 Walk through all children of the current node to count them
 for i = 1 to numChildren
     Walk through all children until we get to child i
     Get the text of child i

They had a function elsewhere that would apply another function to all children of a node. With that, rather than walking through all of the children once to count them and then walking through them again for every child, they could visit each child exactly once. I quickly wrote the code, measured it again, and ended up with a performance improvement of more than 10x for my test case. A document that previously took more than three minutes to open now opens in about fifteen seconds. You'd be hard-pressed to get a similar improvement from any sort of micro-optimization.
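
In code, the before and after look roughly like this. It's only a sketch against a made-up Node type -- the real code went through their XML library, which I'm not reproducing here -- but the shape of the problem is the same:

 #include <cstddef>
 #include <string>
 #include <vector>

 // Hypothetical stand-in for the XML library's node type.
 struct Node {
     std::string text;
     std::vector<Node> children;
 };

 // What the old code did: walk the children once to count them, then
 // for each index walk them again from the front. Pulling n strings
 // out touches the child list O(n^2) times.
 std::vector<std::string> childTextSlow(const Node& node)
 {
     std::vector<std::string> result;
     std::size_t numChildren = 0;
     for (const Node& child : node.children) {   // walk just to count
         (void)child;
         ++numChildren;
     }
     for (std::size_t i = 0; i < numChildren; ++i) {
         std::size_t j = 0;
         for (const Node& child : node.children) {   // walk again for each i
             if (j++ == i) {
                 result.push_back(child.text);
                 break;
             }
         }
     }
     return result;
 }

 // The helper they already had: apply a function to every child, once.
 template <typename Fn>
 void forEachChild(const Node& node, Fn fn)
 {
     for (const Node& child : node.children) {
         fn(child);
     }
 }

 // The fix: a single pass that collects the text as it goes.
 std::vector<std::string> childTextFast(const Node& node)
 {
     std::vector<std::string> result;
     forEachChild(node, [&result](const Node& child) {
         result.push_back(child.text);
     });
     return result;
 }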

That isn't to say that micro-optimizations are never useful. If you call a three-line function millions of times, your code will be faster if you inline the function instead of calling it as a real function. But you'll probably get a much bigger win if you call the function half as often instead...and you shouldn't waste your time at all unless you know that calling the function is affecting your application's performance in the first place.