Wasting time on micro-optimizations

Richard Schaut writes about an appallingly stupid interview question — asking the candidate to explain what this line of code does:

a ^= b ^= a ^= b;

As Richard explains, the question is stupid because the expression is undefined. The resulting code may do anything from swap a and b to cause a butterfly’s wings to flap in China; either behavior would be correct.

That’s only sort of the point, though. You’d only write this line if you were trying to swap two variables and you figured that this would be faster than creating a temporary variable to store one in while you copied the other. Richard mentions that a modern compiler will optimize things well enough that the temporary variable solution may actually be faster, so you wouldn’t be helping things at all.

Even that’s not the point. The point is that a good programmer shouldn’t even think about optimizing something like this in the first place. Lots of people have their own favorite tricks — things like ++i being faster than i++, or macros being faster than function calls. Cute as those kinds of things may be, they sacrifice readability and provide an insignificant performance win.

If you’re coding for performance, you should figure out what’s slow before worrying about tweaking your code. Find the scenario that should be faster, measure it, and see what’s slow. I bet it won’t be variable copying. In most cases, it’ll either be algorithmic or architectural. Either you’re using an inefficient algorithm to accomplish a task that could be done faster in a different way or part of your application’s design forces you to do things slowly. Correcting those problems will give you far greater performance gains than any cascading series of xors ever could.

All of this is at the forefront of my mind right now because I’ve been spending some of my free time over the past few weeks helping out with a new version of an internal application that has some performance issues. Everyone knew it felt slow, but nobody sat down to measure it. I did, and the first thing that jumped out was that it spent far too long getting strings out of XML data. How long? Well, the algorithm it was using looked like this and was invoked many times for every new document:

 Walk through all children of the current node to count them
 for i = 1 to numChildren
 	Walk through all children until we get to child i
 	Get the text of child i

They had a function elsewhere that would apply another function to all children of a node. With that, rather than walking through all of the children and then repeating that again for every child, they could just do it once. I quickly wrote the code, measured it again, and ended up with a performance improvement of more than 10x for my test case. A document that previously took more than three minutes to open now opens in about fifteen seconds. You’d be hard-pressed to get a similar improvement from any sort of micro-optimization.

That isn’t to say that micro-optimizations are never useful. If you call a three-line function millions of times, your code will be faster if you inline the function instead of calling it as a real function. But you’ll probably get a much bigger win if you call the function half as often instead…and you shouldn’t waste your time at all unless you know that calling the function is affecting your application’s performance in the first place.

7 Comments

  1. louis Said,

    March 7, 2004 @ 5:27 pm

    When I am writing code I often am trying to get the thing running at all, as opposed to totally correctly. So I end up sprinkling a ton of //FIXME’s all over my code, which need to be taken care of before it can be committed, but are things I can hardcode or ignore while I am doing the initial bring up. While I am at it, I also tend to leave //PERF’s next to anywhere I think that there is a win to be had but that I don’t want to deal with because it is either more complex, or I want to see profiling data first. Its a quick hint, and it lets me express my first instincts on optimizations without actually impairing my code readability or complicating my initial debugging.

    As for ++i… I need to disagree with you a little on this one. Not about the optimization thing, your right, people who believe the compiler can’t transform a post-increment to a pre-increment when it needs to should not be doing that sort of optimization. The thing is that i++ is a more complicated semantic, copy, then increment original. ++I is simply an increment. In general if I have two interfaces that allow me to achieve the same result, but one may have an incidental side-effect that in a particular context I don’t need then I use the simpler one, because it means that I am less likely to spend a bunch of time debugging some stupid bug in a copy constructor that I did not even need or want called. This is especially irritating because it is the sort of bug that may go away or come back when you change optimization levels. This is some ways only really an issue because people can overload operators with things that are not semantically compatible with their general arithmetic meaning. I suppose what I am saying is the fewer external functions you invoke the simpler things are to debug.

    In any event, I find that the best thing to do generally is to follow the style of the code I am modifying, and to only changes things when there is heavy refactoring going on anyway, because adding something that is “stylistically better” just adds one more inconsistency to the code base.

  2. Alexei Kosut Said,

    March 8, 2004 @ 6:04 pm

    Yep. C++ is the one area where ++i vs. i++ matters, since if those operators are overloaded, the compiler can’t substitute operator++() for operator++(int), even though that’s probably what you meant. With the STL, ++ is a common operator on objects, although most STL implementations try hard not to make an incidental iterator copy here and there cost too much. And, of course, in most cases iterator copy cost is not something you need to waste your time optimizing.

    I think it’s a shame that the C++ language doesn’t define some semantic requirements for overloaded operators that would allow the compiler to do these sorts of optimizations, but I guess that wouldn’t exactly make C++ any simpler to understand.

  3. Brooks Moses Said,

    March 13, 2004 @ 7:01 pm

    Huh. It seems like a perfectly valid interview question to me — one of a sort that I’d be likely to ask, were I interviewing. It seems a valuable filter for the people who have a proper understanding of the meaning of “operation undefined” (and an ability to recognize when it applies), which is a quality worth filtering for in people one’s hiring to do programming.

  4. Eric Said,

    March 14, 2004 @ 5:06 pm

    It’d be a valid interview question if it was along the lines of “This is undefined behavior in C. Why, and what does that mean?” Simply giving them the line of code and asking what it does isn’t fair, though. I can’t reasonably expect anyone to understand exactly what’s undefined in every programming language. It’s like copying code from an obfuscated C contest and asking an interview candidate what it’s doing. They’re not going to write code like that (well, hopefully not, and other questions will show that) and we shouldn’t have anyone around who’d write code like that. If they’re never going to have to deal with anything like it, it doesn’t really matter if they don’t know exactly why it works (or doesn’t work) in the way that it does.

  5. Buggin' My Life Away Said,

    March 15, 2004 @ 9:57 am

    Coding Tricks and Interviewing

  6. Buggin' My Life Away Said,

    March 15, 2004 @ 9:58 am

    Coding Tricks and Interviewing

  7. Anonymous & Coward Said,

    March 16, 2004 @ 3:59 am

    Even if the code was stated to have an undefined behavior, it’s a stupid question. Nobody codes like that since 1970 and I agree it’s like asking what value is in PC when you finish a blur filter on a tiff picture in Photoshop 3.0.1.

    A wonderful article was published in an issue of Develop (from Bob3 Johnson : Killing the Time Killers IIRC) explaining why this sort of code is deeply stupid.

    The FIXME thing is interesting as I’m using a similar method. The issue I see with this is that sometimes I don’t remember what fix I needed to make :)

RSS feed for comments on this post