Fabien Sanglard's Website

May 23rd, 2013

Doom3 BFG Source Code Review: Renderer (Part 3 of 4)

The Doom 3 BFG renderer is at its core still the same: search for interactions (lights crossing the view frustum) and perform an additive blending pass for each interaction. I described this process in my first series of articles about the Doom 3 engine.

The key innovation is that the once mono-threaded renderer is now "heavily" multi-threaded with up to four threads running concurrently.

Architecture

The renderer is still divided into two parts: Frontend (1) and Backend (2). The frontend does the "smart" work of determining what should be drawn and turns it into draw commands, while the backend spends most of its time issuing those commands and waiting for the GPU to execute them (that, and also waiting for vsync!).
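To make the split concrete, here is a minimal sketch of the idea rather than the engine's actual code. The types `Model` and `DrawCommand`, the function names, and the distance-based visibility test are all hypothetical stand-ins: the frontend decides what is worth drawing and emits a command list, and the backend only walks that list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scene data; the real engine tracks far more per model.
struct Model {
    int   id;
    float distance;  // distance from the camera, stand-in for a frustum test
};

// Hypothetical command handed from the frontend to the backend.
struct DrawCommand {
    int modelId;
};

// Frontend: the "smart" part. Decides what should be drawn.
std::vector<DrawCommand> Frontend_BuildCommands(const std::vector<Model>& models,
                                                float farPlane) {
    assert(farPlane > 0.0f);
    std::vector<DrawCommand> commands;
    for (const Model& m : models) {
        // Keep only models in front of the camera and closer than the far plane.
        if (m.distance >= 0.0f && m.distance <= farPlane) {
            commands.push_back({m.id});
        }
    }
    return commands;
}

// Backend: the "dumb" part. Submits every command it is given and returns
// how many it submitted; a real backend would issue GPU calls here.
int Backend_Execute(const std::vector<DrawCommand>& commands) {
    int submitted = 0;
    for (const DrawCommand& cmd : commands) {
        (void)cmd;
        ++submitted;
    }
    return submitted;
}
```

Because the backend only ever reads a finished command list, nothing in this shape prevents the two halves from living on different threads.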

Threading model

There are two important novelties:

Each end (front and back) runs in its own thread by default.
The Frontend uses the Worker system in order to perform tasks that can be parallelized in three locations:
- Interaction detection :
  1. Find interacting lights (crossing the view frustum).
  2. Find each model that is visible or crosses an interacting light (for shadows).
- Shadow generation :
  1. Build dynamic shadow volumes.

The Job System is described in the previous article. What was interesting was to see how jobs are parallelized without mutexes:

1. Find lights (R_AddLights): The idea is to perform a "Build, Mark and Sweep" on a linked list:

Build: The frontend thread builds a list of all lights in the level, each featuring a marker "visible".
Mark : All threads run jobs inside the Job System and concurrently set the markers to 1 or 0.
Sweep: The frontend thread removes any light marked as non-visible.

2. Find Models (R_AddModels) : Each job works on a specific model. When all workers are done, the resulting vertices to draw are aggregated by a single thread.
3. Build Dynamic Shadow Volumes (R_AddModels): Same idea: each worker stores the shadow volume results in the model it is working on. A single thread aggregates the results later.
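The "Build, Mark and Sweep" steps above can be sketched as follows. This is an illustration under assumptions, not the code from R_AddLights: the `Light` struct, the `InView` test and the use of plain `std::thread` workers (instead of the engine's Job System) are all made up for the example. The point is that no mutex is needed because each marker is written by exactly one worker, and the list itself is only modified while a single thread is running.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical light record.
struct Light {
    float  x;        // stand-in for the data a real frustum test would use
    int    visible;  // marker: written by exactly one worker during Mark
    Light* next;     // intrusive singly linked list, touched only in Build/Sweep
};

// Stand-in for the real light-versus-frustum test.
bool InView(const Light& light) { return light.x >= 0.0f && light.x <= 100.0f; }

// Returns how many lights remain in the list after the sweep.
int CountVisibleLights(std::vector<Light>& lights, unsigned workerCount) {
    assert(workerCount > 0);

    // Build (single thread): link every light, marker cleared.
    for (std::size_t i = 0; i < lights.size(); ++i) {
        lights[i].visible = 0;
        lights[i].next = (i + 1 < lights.size()) ? &lights[i + 1] : nullptr;
    }
    Light* head = lights.empty() ? nullptr : &lights[0];

    // Mark (all workers): each worker owns a disjoint, strided subset of the
    // lights and writes only the markers of the lights it owns.
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) {
        workers.emplace_back([&lights, w, workerCount] {
            for (std::size_t i = w; i < lights.size(); i += workerCount) {
                lights[i].visible = InView(lights[i]) ? 1 : 0;
            }
        });
    }
    // Joining is the only synchronization: it makes every marker visible
    // to the thread that performs the sweep.
    for (std::thread& t : workers) {
        t.join();
    }

    // Sweep (single thread): unlink every light marked as non-visible.
    int remaining = 0;
    Light** link = &head;
    while (*link != nullptr) {
        if ((*link)->visible) {
            ++remaining;
            link = &(*link)->next;
        } else {
            *link = (*link)->next;
        }
    }
    return remaining;
}
```

Steps 2 and 3 follow the same pattern: workers write only into the model they were handed, and the aggregation happens on one thread once every worker has finished.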

One Path

The previous renderer featured many rendering paths, with dedicated ones for Nvidia and ATI GPUs. The implementation was not very elegant since it relied on switch cases.

The new renderer uses an abstraction layer whose method names are modeled on OpenGL. Under the hood, one of:

OpenGL (PC)
DirectX (Xbox 360)
GCM (PS3)

is used for the implementation. The choice is made at link time: the project linker decides which implementation to use.
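A single-file sketch of that arrangement, with loud caveats: the function names `GL_BackendName`, `GL_DrawIndexed` and `DrawSurface` and the `SKETCH_BACKEND_*` macros are hypothetical, and a preprocessor switch stands in for what would really be three separate source files, only one of which gets linked into a given project.

```cpp
#include <cassert>
#include <string>

// --- Shared header (hypothetical): the renderer only ever sees these names.
const char* GL_BackendName();
std::string GL_DrawIndexed(int indexCount);

// --- One implementation per platform. In a real project each branch would
// --- be its own .cpp file and the linker would pick exactly one of them.
#if defined(SKETCH_BACKEND_D3D)
const char* GL_BackendName() { return "DirectX"; }
std::string GL_DrawIndexed(int n) { return "DirectX: draw " + std::to_string(n) + " indices"; }
#elif defined(SKETCH_BACKEND_GCM)
const char* GL_BackendName() { return "GCM"; }
std::string GL_DrawIndexed(int n) { return "GCM: draw " + std::to_string(n) + " indices"; }
#else
const char* GL_BackendName() { return "OpenGL"; }
std::string GL_DrawIndexed(int n) { return "OpenGL: draw " + std::to_string(n) + " indices"; }
#endif

// --- Renderer code: no switch on the platform anywhere.
std::string DrawSurface(int indexCount) {
    assert(indexCount >= 0);
    return GL_DrawIndexed(indexCount);
}
```

The appeal of resolving the backend at link time is that the calling code carries no runtime branching and no platform checks at all.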

Shaders

Doom III used ARB assembly shaders that looked like this:

  
    !!ARBfp1.0
    TEMP color;
    MUL color, fragment.texcoord[0].y, 2.0;
    ADD color, 1.0, -color;
    ABS color, color;
    ADD result.color, 1.0, -color;
    MOV result.color.a, 1.0;
    END

Doom III BFG uses OpenGL GLSL 1.50 shaders:

  
    #version 150
    #define PC
  
    void main() {
        vec4 color = ( tex2D ( samp0 , vofi_TexCoord0 ) * gl_Color ) + vofi_TexCoord1 ;
        gl_FragColor . xyz = color. xyz * color. w ;
        gl_FragColor . w = color. w ;
    }

Trivia : The renderer uses the OpenGL 3.2 Compatibility Profile because the engine relies on many OpenGL methods that have been deprecated since 2004, yet also uses recent shaders. This is one of the reasons Doom III BFG has not been ported to Mac OS X: even the latest Mountain Lion only offers the OpenGL 3.2 Core profile.

Unused

The renderer features other cool things that are barely used:

An HLSL to GLSL converter (ConvertCG2GLSL): Used nowhere :( !
The fast DXT texture compressor (YCoCg-DXT5), one of the keystones of idTech 5 virtual texturing, mentioned in the "Beyond Programmable Shading" 2009 talk: Used for a few tiny textures only.

Rendering Targets

Considering all the post-rendering screen-space processing (fog and Oculus Rift VR barrel warping), I was expecting to see FBOs and render target bindings all over the place. Surprisingly, when such effects are needed, the process is to copy the GL_BACK buffer to a texture via glCopyTexImage2D and draw it again (with the proper shader) into the GL_BACK buffer.

Wait, there is more...

A reader "Ben" mentioned a few other performance-related changes that I did not have time to review:

GPU skinning.
Using VertexCache as massive global double buffers instead of each RenderModel handling their own VBO.
Using glMap instead of glBuffer for VBOs (the latter originally caused major stalls, limiting enemy counts).
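The "global double buffer" idea can be illustrated on the CPU side alone. This is a rough sketch under assumptions, not the engine's VertexCache: the class `FrameVertexCache`, its methods and the constant `kInvalidOffset` are hypothetical, and plain byte vectors stand in for mapped VBOs. Every model carves its vertices out of one shared buffer for the frame being built, while the other buffer is left alone for the frame being drawn.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Returned when the current frame's buffer is full.
constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

// Hypothetical two-frame vertex cache shared by every model.
class FrameVertexCache {
public:
    explicit FrameVertexCache(std::size_t bytesPerFrame) {
        assert(bytesPerFrame > 0);
        for (int i = 0; i < 2; ++i) {
            storage_[i].resize(bytesPerFrame);  // stands in for a mapped VBO
            used_[i].store(0);
        }
    }

    // Copies `bytes` bytes into the buffer being written this frame and
    // returns their offset. The atomic bump means concurrent callers never
    // receive overlapping ranges. A failed allocation still consumes the
    // reservation, so the buffer stays "full" until the next BeginFrame().
    std::size_t Alloc(const void* data, std::size_t bytes) {
        std::vector<unsigned char>& buffer = storage_[write_];
        const std::size_t offset = used_[write_].fetch_add(bytes);
        if (offset + bytes > buffer.size()) {
            return kInvalidOffset;
        }
        std::memcpy(buffer.data() + offset, data, bytes);
        return offset;
    }

    // Called once per frame by a single thread: flip to the other buffer
    // and start filling it from offset zero.
    void BeginFrame() {
        write_ ^= 1;
        used_[write_].store(0);
    }

    int WriteIndex() const { return write_; }

private:
    std::vector<unsigned char> storage_[2];
    std::atomic<std::size_t>   used_[2];
    int                        write_ = 0;
};
```

Writing through a mapped pointer like this avoids handing the driver a fresh upload per model, which is presumably where the stalls mentioned above came from.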

Doom Classic integration.