Using profiling for optimization

June 6th, 2010

Most game-code optimization comes down to reducing the CPU cycles you need for each frame. One way to do this is to optimize every routine as you write it and make sure it's as fast as possible. However, a common saying holds that 90% of the CPU cycles are spent in 10% of the code. If that's true, directing all of your optimization work at those bottleneck routines will have nearly 10x the effect of optimizing everything uniformly. So how do you identify these routines? Profiling makes it easy.
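The 10x figure is a rule of thumb. As a back-of-the-envelope check, assume every line of code takes the same effort to optimize and that optimizing a line cuts its cycles by the same fraction. Then the cycles saved per unit of effort scale with the share of cycles c that a piece of code accounts for, divided by its share of the code f:

```latex
\frac{\text{cycles saved}}{\text{effort}} \propto \frac{c}{f},
\qquad
\text{hot code: } \frac{0.90}{0.10} = 9,
\qquad
\text{uniform: } \frac{1.00}{1.00} = 1
```

So under these assumptions, effort spent on the hot 10% pays off about nine times as well as effort spread evenly, in line with the roughly 10x figure.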

The easiest and most intuitive form of profiling is based on sampling the call stack: recording the program's call stack at random intervals to determine where it spends most of its time. It's possible to do this manually, but it's a lot easier to use specialized profiling software. Here's an Overgrowth call stack tree created with Apple's free profiler, Shark, during a multiple-character stress test.
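Before looking at that tree, it may help to see the sampling idea in miniature. The sketch below uses Python rather than native code. CPython's `sys._current_frames()`, an implementation-specific, underscore-prefixed function, lets one thread read another thread's current stack. A native profiler such as Shark instead samples the running process at the system level. The workload functions `heavy`, `light`, and `game_loop` are hypothetical stand-ins.

```python
import collections
import random
import sys
import threading
import time


def sample_stacks(thread_id, duration_s=0.5, mean_interval_s=0.005):
    """Record a thread's call stack at randomized intervals.

    Returns a Counter mapping each observed stack (a tuple of function
    names, outermost caller first) to the number of times it was seen.
    """
    samples = collections.Counter()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # Random intervals keep the sampler from locking onto a periodic
        # pattern in the program, such as a fixed-rate game loop.
        time.sleep(random.uniform(0, 2 * mean_interval_s))
        frame = sys._current_frames().get(thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            samples[tuple(reversed(stack))] += 1
    return samples


# Hypothetical workload: heavy() does about ten times the work of light().
def heavy():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def light():
    total = 0
    for i in range(20_000):
        total += i * i
    return total


def game_loop(stop):
    while not stop.is_set():
        heavy()
        light()


stop = threading.Event()
worker = threading.Thread(target=game_loop, args=(stop,))
worker.start()
samples = sample_stacks(worker.ident)
stop.set()
worker.join()

# "Self" counts: charge each sample to the function at the top of the
# stack, i.e. the one that was actually executing when it was sampled.
self_counts = collections.Counter()
for stack, n in samples.items():
    self_counts[stack[-1]] += n
total = sum(self_counts.values())
for name, n in self_counts.most_common():
    print(f"{100.0 * n / total:5.1f}%  {name}")
```

Nothing in `heavy` was instrumented, yet it should collect most of the samples, roughly in proportion to the time it runs. That proportionality is what makes random stack samples a reliable guide to where the time goes.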

[Screenshot: Shark call tree for the Overgrowth stress test]

This shows that we are spending 100% of our time in the SDL_main routine. That is expected: SDL_main contains the whole program, and each percentage in the tree includes the time spent in the functions a routine calls. Within that, we are spending 60% of our time in the Update function and 40% in the Draw function. Let's expand the Update function to see more details:
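To see why SDL_main reads 100% while Update and Draw split it 60/40, it helps to look at how a sampling profiler might turn raw stacks into a tree. The following is a hypothetical Python illustration, not how Shark is implemented. Every sample is credited to each call path on its stack, so a parent's count always includes its children's. The sample counts are invented to mirror the split described above. `(other Update work)` and `(Draw work)` are placeholder names, and the real tree is certainly deeper.

```python
from collections import Counter


def call_tree(samples):
    """Aggregate sampled stacks into inclusive counts per call path.

    `samples` maps a stack (tuple of function names, outermost first) to
    the number of times it was sampled. Each prefix of a stack is a call
    path; its count is the number of samples with that path on the stack.
    """
    tree = Counter()
    for stack, count in samples.items():
        for depth in range(1, len(stack) + 1):
            tree[stack[:depth]] += count
    return tree


def print_tree(tree, total):
    # Sorting tuples puts every parent path directly before its children.
    for path in sorted(tree):
        pct = 100.0 * tree[path] / total
        print(f"{'  ' * (len(path) - 1)}{pct:5.1f}%  {path[-1]}")


# Hypothetical sample counts chosen to mirror the percentages in the text.
samples = Counter({
    ("SDL_main", "Update", "UpdateBoneMatrices"): 35,
    ("SDL_main", "Update", "(other Update work)"): 25,
    ("SDL_main", "Draw", "(Draw work)"): 40,
})
total = sum(samples.values())
tree = call_tree(samples)
print_tree(tree, total)
```

Because every sample passes through SDL_main, its inclusive count equals the total. The children of any node can only split that node's share.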

[Screenshot: Shark call tree with the Update function expanded]

Here we can see that 35% of all our CPU cycles are going to updating character bone matrices! This seems to be the heaviest single function, so let's focus on it for now. The IK (inverse kinematics) routines involved are still at a proof-of-concept stage, so they could use some optimization. Let's go in, clean them up a bit, and see what happens:

[Screenshot: Shark call tree after the IK cleanup]

Now that the IK is a bit faster, its percentage has decreased and all of the other percentages have increased. The other functions still use the same number of cycles, but the total they are divided by has shrunk. How can we optimize this further? Well, the bone matrices are still being updated at 120 Hz, which seems like overkill, so let's see what happens if we change that to 60 Hz:

[Screenshot: Shark call tree with bone matrices updated at 60 Hz]
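One common way to update bone matrices at a lower rate than the rest of the game is to recompute them only on every other fixed tick and reuse the previous matrices in between. The following is a minimal Python sketch of that pattern. It assumes the rest of the update runs at a fixed 120 Hz tick. The `Character` class and its methods are hypothetical stand-ins, and the post does not say how Overgrowth implements the change.

```python
SIM_HZ = 120   # assumed fixed rate of the rest of the update loop
BONE_HZ = 60   # reduced rate for recomputing bone matrices
TICKS_PER_BONE_UPDATE = SIM_HZ // BONE_HZ   # 2, i.e. every other tick


class Character:
    """Hypothetical character; the method bodies are stand-ins."""

    def __init__(self, index):
        self.index = index
        self.sim_steps = 0
        self.bone_updates = 0

    def step_simulation(self, dt):
        self.sim_steps += 1      # movement, collision, etc. would go here

    def update_bone_matrices(self):
        self.bone_updates += 1   # IK and bone-matrix work would go here


def run_ticks(characters, ticks):
    """Run `ticks` fixed steps; return how many bone updates ran per tick."""
    dt = 1.0 / SIM_HZ
    per_tick = []
    for tick in range(ticks):
        updated = 0
        for c in characters:
            c.step_simulation(dt)
            # Offsetting by character index staggers the updates, so that
            # with an even number of characters half update on even ticks
            # and half on odd ticks.
            if (tick + c.index) % TICKS_PER_BONE_UPDATE == 0:
                c.update_bone_matrices()
                updated += 1
        per_tick.append(updated)
    return per_tick


characters = [Character(i) for i in range(4)]
per_tick = run_ticks(characters, SIM_HZ)   # one second of game time
print([c.bone_updates for c in characters], per_tick[:4])
# prints [60, 60, 60, 60] [2, 2, 2, 2]
```

Staggering is optional. Without it, every character would do its bone work on the same tick, halving the average cost but leaving a spike on alternate ticks.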

We have reduced UpdateBoneMatrices' share of the CPU time from its original 35% to 15%! Since these percentages are relative to a total that has also shrunk, we need to compensate for that inflation; doing so shows that the whole program uses roughly 25% fewer CPU cycles than it did before. If we need to optimize further, we can use profiling to find the new bottleneck and repeat as necessary. This keeps all CPU optimization effort directed where it is most effective.
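The inflation correction can be worked out from the two percentages alone. This assumes that everything other than UpdateBoneMatrices costs the same number of cycles before and after. Let T be the total cycles per frame originally, T' afterwards, and R the cycles spent outside UpdateBoneMatrices:

```latex
\begin{aligned}
R &= (1 - 0.35)\,T = 0.65\,T \\
R &= (1 - 0.15)\,T' = 0.85\,T' \\
T' &= \frac{0.65}{0.85}\,T \approx 0.765\,T \\
1 - \frac{T'}{T} &\approx 0.235
\end{aligned}
```

That is a saving of roughly 23 to 24%, close to the quarter quoted above. UpdateBoneMatrices itself went from 0.35T to about 0.15 × 0.765T ≈ 0.115T, roughly a third of its original cost.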