Tuesday, February 24, 2009

Uber-Shaders: Evolution or Optimization

Let's just imagine that you have an uber-shader, and an uber-interface for it.  The uber-shader can do about 100 cool shading tricks, and is set with a struct like this:
struct shader {
 int tex_mode;
 int tex_ref;
 int want_shadows;
 int want_emissive_tex;
 int lit_ref;
 /* ...and many more parameters... */
};
A single function "setup" takes a shader struct and sets all of the OpenGL parameters to make it happen.  This function knows what the GLSL code looks like and does the right thing.

This design has been a win for us with X-Plane because:
  • The encapsulated setup function can deal with hw-specific issues.  For example, if you can approximate the shader state request using the fixed-function pipeline on old hardware, this gets hidden in "setup" and client code doesn't care.
  • Since you have access to all state at the same time, you can do things like pick from a set of customized shaders based on state combinations.  (In other words, you can create a large number of highly optimized shaders for specific cases.)

What do you do if you need to change one parameter of the shader?  The naive answer is:
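A minimal sketch of that naive path, using hypothetical teardown()/setup() helpers and a trimmed-down shader struct (names are illustrative, not the actual X-Plane API; the counter just makes the cost visible):

```c
#include <assert.h>

/* Trimmed-down request struct, per the uber-shader interface above. */
typedef struct {
	int tex_ref;
	int want_shadows;
} shader;

static int gl_rebinds = 0;	/* counts how much GL work we did */

/* Stand-ins for the real functions, which would touch GL state. */
static void setup(const shader *s)    { (void)s; gl_rebinds++; }
static void teardown(const shader *s) { (void)s; gl_rebinds++; }

/* The naive way to change one parameter: full tear-down and rebuild. */
static void change_texture_naive(shader *s, int new_tex)
{
	teardown(s);		/* undo every piece of GL state     */
	s->tex_ref = new_tex;	/* flip the one field we care about */
	setup(s);		/* rebuild every piece of GL state  */
}
```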
In other words, you tear down OpenGL state, change the request, then build it up again.

Well, that seems inefficient, doesn't it?  What if there is a fast path?  (For example, if all you are changing is polygon offset, all you really need to do is call glPolygonOffset.)

One extension to the uber-shader interface is a series of 'evolution' APIs that change a single parameter, e.g.
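Such evolution calls might look like this (hypothetical names, sketched against the trimmed-down struct above):

```c
#include <assert.h>

typedef struct {
	int tex_ref;
	int want_shadows;
} shader;

/* Evolution APIs: each changes exactly one parameter, and is free to
 * take a fast path internally (e.g. rebind just one texture unit). */
static void shader_set_texture(shader *s, int tex_ref)
{
	if (s->tex_ref == tex_ref) return;	/* nothing to do */
	s->tex_ref = tex_ref;
	/* ...rebind only the texture unit, leave the GLSL object alone... */
}

static void shader_set_shadows(shader *s, int want_shadows)
{
	s->want_shadows = want_shadows;
	/* ...flip just the shadow-related state... */
}
```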
Naively this is equivalent to the reset/change/setup code above, but the implementation might do something clever, like only rebind the texture unit but leave the shader object alone.

Is this a win?  It seems reasonable to hope so.  For example, if the state being changed is effectively a uniform passed to the GPU (or GPU state not related to shading), we might be a lot closer to minimal state change.


What happens when your shader gets really big and complex?  One problem is that the logic in client code that sets up the shader gets big and complex.  For example: if the source texture has no alpha channel and there is no overlay texture, you can disable alpha blending.  Disabling alpha blending might be a huge performance win - maybe your app is bottlenecked on raster ops.  But having this logic everywhere in the client code isn't good - it means that you're not sure that you have ideal optimization at every shader point.

One way around this is to write an optimization function as part of the uber-shader code, e.g.
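A sketch of such an optimizer, under the assumption from the example above that blending can be turned off when neither texture carries alpha (field and function names are made up):

```c
#include <assert.h>

typedef struct {
	int tex_has_alpha;	/* does the source texture carry alpha? */
	int overlay_tex;	/* 0 = no overlay texture               */
	int want_blending;	/* requested blend state                */
} shader;

/* "Harmonize" the requested state - the one place in the app that is
 * allowed to know how the shader really works. */
static void optimize(shader *s)
{
	/* If the source texture has no alpha channel and there is no
	 * overlay, blending can never change a pixel - turn it off for
	 * a possible raster-op win. */
	if (!s->tex_has_alpha && !s->overlay_tex)
		s->want_blending = 0;
}
```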
The optimizer goes through all the requested shader state and "harmonizes" it.  Because the optimizer is part of the shader setup code, the knowledge of how the shader really works is now isolated to the one place in the app that should know such things.  Now you can put fairly complex logic in place to detect fast paths and take them every time.

Clash of Optimizations

The problem with optimization vs. evolution is they don't play nice together.  The evolution functions assume that you know the start state of your shader before you change it.  But the optimization API might have changed your shader in an unexpected way.  For example:
  • You set up a shader with a texture and blending.
  • You run the optimizer on it.  The optimizer turns off blending because the texture doesn't actually have an alpha channel.
  • You run the evolution API to change to a texture that does have an alpha channel.
At this point you're screwed: blending has been turned off and is gone.

My solution to this is a bit crude but goes like this:
  • There are no evolution APIs.
  • Changing state requires changing the original shader and re-optimizing.
  • Inside the shader, all state changes are lazy and tracked (e.g. we only change GL state if we really need to).
  • We never reset state while in the middle of shader ops.
So in the above case what's going to happen is:
  1. We calculate the optimal shader for filling.
  2. When we go to change state, the "reset" of the shader actually does nothing.
  3. We calculate a new optimal shader.
  4. When we go to set up that new shader, almost all of the GL state change is a no-op.  In particular, if we could have "evolved" (e.g. really we only need to change the texture), that's all we will really change.
This design isn't perfect - it's burning CPU to calculate ideal GL state change at runtime rather than compile time.  That's the down-side.  The up-side is that we get optimal GL state under all conditions.
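The lazy-change scheme can be sketched like this: setup() compares the newly optimized request against a cached copy of the current state and touches only the fields that differ (hypothetical and heavily simplified - real code would make GL calls where the counter is incremented):

```c
#include <assert.h>

typedef struct {
	int program;
	int tex_ref;
	int want_blending;
} shader;

static shader current;		/* cached copy of current GL state */
static int    gl_changes = 0;	/* how many real GL calls we made  */

/* Lazy setup: only fields that actually differ cost a GL call. */
static void setup(const shader *s)
{
	if (s->program != current.program)             gl_changes++;
	if (s->tex_ref != current.tex_ref)             gl_changes++;
	if (s->want_blending != current.want_blending) gl_changes++;
	current = *s;
}
```

Because the diff happens per-field at setup time, a full "reset + re-optimize + setup" cycle that only changes the texture costs exactly one state change - the same as an explicit evolution API would have.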

Friday, February 20, 2009

Who Needs the Inverse-Transpose?

This was pointed out to me by another OpenGL developer: if you don't have non-uniform scaling as part of your model view matrix, the upper 3x3 of the model-view matrix is just as good as the "normal matrix" (which is the transpose of the inverse of the model view matrix) for transforming normals. In fact, the model view matrix is better because it means you don't have to compute an inverse.

I hit this case while looking at a bug in X-Plane 930; for some reason the normal matrix produces junk results when used from a vertex (but not fragment) shader. Totally strange. The nice thing about using the model-view matrix is that it gives me a work-around that is theoretically a faster path. (The alternative of using the normal matrix in the fragment shader means slower per-pixel operations.)

(See FAQ 5.27 for a traditional explanation of why you'd use the inverse-transpose matrix on normals.)
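This is easy to sanity-check numerically: for an orthonormal (rotation-only) 3x3 matrix, the transpose of the inverse is the matrix itself, while any non-uniform scale breaks the equivalence. A small self-contained check (not X-Plane code):

```c
#include <assert.h>
#include <math.h>

/* 3x3 inverse via cofactors (adjugate / determinant). */
static void inverse3(const double m[3][3], double out[3][3])
{
	double det =
		m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1]) -
		m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0]) +
		m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]);
	out[0][0] =  (m[1][1]*m[2][2] - m[1][2]*m[2][1]) / det;
	out[0][1] = -(m[0][1]*m[2][2] - m[0][2]*m[2][1]) / det;
	out[0][2] =  (m[0][1]*m[1][2] - m[0][2]*m[1][1]) / det;
	out[1][0] = -(m[1][0]*m[2][2] - m[1][2]*m[2][0]) / det;
	out[1][1] =  (m[0][0]*m[2][2] - m[0][2]*m[2][0]) / det;
	out[1][2] = -(m[0][0]*m[1][2] - m[0][2]*m[1][0]) / det;
	out[2][0] =  (m[1][0]*m[2][1] - m[1][1]*m[2][0]) / det;
	out[2][1] = -(m[0][0]*m[2][1] - m[0][1]*m[2][0]) / det;
	out[2][2] =  (m[0][0]*m[1][1] - m[0][1]*m[1][0]) / det;
}

/* Max element-wise difference between M and transpose(inverse(M)). */
static double normal_matrix_error(const double m[3][3])
{
	double inv[3][3];
	inverse3(m, inv);
	double err = 0.0;
	for (int i = 0; i < 3; i++)
		for (int j = 0; j < 3; j++) {
			double d = fabs(m[i][j] - inv[j][i]);
			if (d > err) err = d;
		}
	return err;
}
```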

We actually don't have scaling at all on our model view matrix for another reason: fast sphere culling. Basically to cull a sphere you need to:
  1. Transform its center from model space to eye space using the model-view matrix.
  2. Find the distance from the sphere to the six faces of the viewing volume.
If the distance is larger than the sphere radius and we're outside any of the viewing volume sides (the distance can be signed, so "outside" is part of the numeric test) then we are culled. The trick is: the model view matrix needs to not scale at all or our sphere radius might be wrong.
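The per-plane test can be sketched like this, assuming frustum planes stored as a unit normal plus offset with inward-facing normals (a common convention; signs may differ in other code bases):

```c
#include <assert.h>

typedef struct { double x, y, z; } vec3;
typedef struct { vec3 n; double d; } plane;	/* n.p + d = signed distance */

/* Signed distance from point p to the plane (n assumed unit length). */
static double plane_dist(const plane *pl, vec3 p)
{
	return pl->n.x * p.x + pl->n.y * p.y + pl->n.z * p.z + pl->d;
}

/* Cull a sphere (center in eye space, radius r) against one frustum
 * plane whose normal points inward: the sphere is fully outside if
 * its center is more than r behind the plane.  NOTE: r is only valid
 * if the model-view matrix has no scale, per the text. */
static int sphere_outside(const plane *pl, vec3 center, double r)
{
	return plane_dist(pl, center) < -r;
}
```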

So in X-Plane, we don't ever scale the model view matrix. We don't offer it as a content option for authors (not that it would be useful anyway) and we do our zooming by manipulating FOV.

The side-effect of this is that we could use the transpose instead of the inverse if we need to undo camera rotations, and we don't need to use the inverse-transpose to transform normals.