Fast object instancing

November 29th, 2009

When I was testing character shadows on Hale's Riverhurst level, the framerate was not as smooth as I would like. To remedy this, I'm taking a few days to make the engine handle complex scenes more efficiently.

My first step was to create a system for fast object instancing (copy-drawing). Because of our modular plants and buildings, most Overgrowth levels consist of many instances (copies) of relatively few unique models, so this is an important feature to optimize. As an example, Riverhurst contains about 4000 instances of the same wooden plank model.

Trees are also very good candidates for fast instancing. Here is one of the trees in Overgrowth:

As you can see in the bottom-left of the screenshot, the framerate is very smooth: 90 frames per second. However, if I create a few dozen more trees, the framerate drops to 14 frames per second, which is unacceptably low.

This is because the renderer is not taking advantage of the fact that all of the trees use the same model. The rendering algorithm looks roughly like this:

for each object in scene {  
    setup rendering state  
    setup shader  
    setup texture  
    setup vertex arrays  
    setup transform  
    draw  
}

This is a problem because, for simple models like this tree, the setup code takes much more time than the actual drawing. The cost is compounded by the fact that the tree is actually a collection of even simpler objects, each of which goes through the full setup on its own!

I decided to restructure the rendering loop to look more like this:

for each object-type in scene {  
    setup rendering state  
    setup shader  
    setup texture  
    setup vertex arrays  
    for each object in object-type {  
        setup transform  
        draw  
    }  
}
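
To make the difference between the two loops concrete, here is a rough Python sketch that only counts how often each step would run. It is a toy model, not engine code: the function names, the `SHARED_SETUP` tuple, and the scene contents are all made up for illustration (the 4000 planks come from the Riverhurst example, and the 40 trees are an arbitrary stand-in for "a few dozen").

```python
from collections import Counter, defaultdict

# Hypothetical names for the setup steps that depend only on the object type.
SHARED_SETUP = ("rendering state", "shader", "texture", "vertex arrays")


def draw_naive(scene):
    """Per-object loop: every object repeats all of the setup steps."""
    calls = Counter()
    for _model, _transform in scene:
        for step in SHARED_SETUP:
            calls[step] += 1
        calls["transform"] += 1
        calls["draw"] += 1
    return calls


def draw_grouped(scene):
    """Per-type loop: shared setup runs once per model, not once per object."""
    # Bucket the instance transforms by the model they use.
    batches = defaultdict(list)
    for model, transform in scene:
        batches[model].append(transform)

    calls = Counter()
    for _model, transforms in batches.items():
        for step in SHARED_SETUP:
            calls[step] += 1
        for _transform in transforms:
            calls["transform"] += 1
            calls["draw"] += 1
    return calls


# Assumed example scene; the transforms are placeholder positions.
scene = [("plank", (i, 0, 0)) for i in range(4000)]
scene += [("tree", (0, 0, i)) for i in range(40)]

naive = draw_naive(scene)
grouped = draw_grouped(scene)
print("shader setups:", naive["shader"], "->", grouped["shader"])
print("draw calls:   ", naive["draw"], "->", grouped["draw"])
```

The number of draw calls is unchanged. What shrinks is the shared setup, which goes from once per object to once per unique model.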

The drawing itself is already about as fast as it can be, since it is a single line of code that draws a vertex buffer object. I further optimized the "setup transform" step by encoding each instance's transformation matrix in the texture coordinates (aka GLSL pseudo-instancing).
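
The idea of carrying a matrix in texture coordinates can be sketched in plain Python. The exact packing is not described here, so this assumes one common arrangement: the first three rows of a row-major 4x4 affine matrix are sent as three 4-component texture coordinates, and the vertex shader rebuilds the transformed position with three dot products. `pack_transform` and `transform_vertex` are hypothetical names, and `transform_vertex` only emulates on the CPU what the shader would do.

```python
def pack_transform(matrix):
    """Return the first three rows of a 4x4 affine matrix as three 4-tuples.

    Assumes row-major storage and column vectors, so the last row is
    (0, 0, 0, 1) and does not need to be sent.
    """
    return tuple(tuple(row) for row in matrix[:3])


def dot4(a, b):
    """Four-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def transform_vertex(texcoords, position):
    """Emulate the vertex shader: one dot product per packed matrix row."""
    v = (position[0], position[1], position[2], 1.0)
    return tuple(dot4(row, v) for row in texcoords)


# Example instance transform: uniform scale by 2, then translate by (10, 0, -5).
matrix = [
    [2.0, 0.0, 0.0, 10.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]

rows = pack_transform(matrix)
print(transform_vertex(rows, (1.0, 2.0, 3.0)))  # prints (12.0, 4.0, 1.0)
```

In this arrangement, each instance costs three small attribute updates before its draw call instead of a full transform setup.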

Now we can draw that same group of trees at 90 frames per second!

It's still possible to slow the framerate down, but it takes many more trees than before:

Shadows are disabled in these pictures because I'm currently working on efficient texture atlases for shadow maps.

This kind of optimization may seem premature, but I think it's necessary so that we can see what level designs are technologically possible. When planning out the story and gameplay, it's important to know whether or not dense forest scenes and complex town scenes can be rendered smoothly.

Do you have any ideas about further optimizations I could do, or any questions about how the object instancing works?