Doom3 Source Code Review: Renderer (Part 3 of 6)
idTech4 renderer features three key innovations:
- "Unified Lighting and Shadows": The level faces and the entity faces go through the same pipeline and shaders.
- "Visible Surface Determination": A portal system allows VSD to be performed at runtime: No more PVS.
- "Multi-pass Rendering".
By far the most important is that idTech4 is a multi-pass renderer. The contribution of each light in the view is accumulated in the GPU framebuffer via additive blending. Doom 3 takes full advantage of the fact that color framebuffer registers saturate instead of wrapping around.
CPU register (wraps around):
============================
  1111 1111
+ 0000 0100
-----------
= 0000 0011

GPU register (saturates):
=========================
  1111 1111
+ 0000 0100
-----------
= 1111 1111
I built a custom level to illustrate additive blending. The following screenshot shows three lights in a room, resulting in three passes with the result of each pass accumulated in the framebuffer. Notice the white illumination at the center of the screen where all the lights blend together.
I modified the engine in order to isolate each light pass; they can be viewed using the left and right arrows:
I modified the engine further in order to see the framebuffer state AFTER each light pass. Use the left and right arrows to move in time.
Trivia: It is possible to take the result of each light pass, blend them manually in Photoshop (Linear Dodge, to mimic OpenGL additive blending) and reach the exact same visual result.
Additive blending, combined with support for shadows and bump-mapping, resulted in an engine that can still produce very nice results even by 2012 standards:
Architecture
The renderer is not monolithic like in previous idTech engines but rather broken down into two parts, called the Frontend and the Backend:
- Frontend:
- Analyzes the world database and determines what contributes to the view.
- Stores the result in an Intermediate Representation (def_view_t) and uploads/reuses cached geometry in the GPU's VBOs.
- Issues an RC_DRAW_VIEW command.
- Backend:
- The RC_DRAW_VIEW command wakes up the backend.
- It uses the Intermediate Representation as input and issues commands to the GPU using the VBOs.
The architecture of the renderer bears a striking similarity to LCC, the retargetable compiler that was used to generate the Quake3 Virtual Machine bytecode:
I initially thought the renderer design was influenced by LCC's design, but the renderer is actually built in two parts because it was meant to be multithreaded on SMP systems: the frontend would run on one core and the backend on another. Unfortunately, due to instability with certain drivers, the extra thread had to be disabled and both ends run on the same thread.
Genesis trivia: Archaeology can be done with code as well. If you look closely at the unrolled renderer code (frontend, backend) you can clearly see that the engine switches from C++ to C (from objects to static methods). This is due to the genesis of the code: the idTech4 renderer was written by John Carmack on top of the Quake3 engine (a C codebase) before he was proficient in C++. The renderer was later integrated into the idTech4 C++ codebase.
How much Quake is there in Doom3? Hard to tell, but it is funny to see that the main method in the Mac OS X version is:
- (void)quakeMain;
Frontend/Backend/GPU collaboration
Here is a drawing that illustrates the collaboration between the frontend, the backend and the GPU:
- The Frontend analyzes the world state and issues two things:
- An intermediate representation containing a list of each light contributing to the view. Each light contains a list of the entity surfaces interacting with it.
- Each light-entity interaction that is going to be used for this frame is also cached in an interaction table. Data is usually uploaded to a GPU VBO.
- The Backend takes the intermediate representation as input. It goes through each light in the list and makes OpenGL draw calls for each entity that interacts with the light. The draw calls obviously reference the VBOs and textures.
- The GPU receives the OpenGL commands and renders to the screen.
Doom3 Renderer Frontend
The frontend performs the hard part: Visible Surface Determination (VSD). The goal is to find every light/entity combination affecting the view.
Those combinations are called interactions. Once every interaction has been found, the frontend makes sure everything needed by the backend is uploaded to GPU RAM (it keeps track of everything via an "interaction table"). The last step is to generate an Intermediate Representation that will be read by the backend so it can generate OpenGL commands.
In the code this is how it looks:

idCommon::Frame
  idSession::UpdateScreen
    idSession::Draw
      idGame::Draw
        idPlayerView::RenderPlayerView
          idPlayerView::SingleView
            idRenderWorld::RenderScene
              build params
              ::R_RenderView(params)  // This is the frontend
              {
                  R_SetViewMatrix
                  R_SetupViewFrustum
                  R_SetupProjection

                  // Most of the beef is here.
                  static_cast<idRenderWorldLocal *>(parms->renderWorld)->FindViewLightsAndEntities()
                  {
                      PointInArea             // Walk the BSP and find the current area.
                      FlowViewThroughPortals  // Recursively pass portals to find lights and entities interacting with the view.
                  }

                  R_ConstrainViewFrustum  // Improve Z-buffer accuracy by moving the far plane as close as the farthest entity.
                  R_AddLightSurfaces      // Find entities that are not in a visible area but still cast a shadow (usually enemies).
                  R_AddModelSurfaces      // Instantiate animated models (for monsters).
                  R_RemoveUnecessaryViewLights
                  R_SortDrawSurfs         // A simple C qsort call. C++ std::sort would have been faster thanks to inlining.
                  R_GenerateSubViews
                  R_AddDrawViewCmd
              }
Note : The switch from C to C++ is obvious here.
It is always easier to understand with a drawing, so here is a level. Thanks to the portals placed by the designer, the engine sees four areas:
Upon loading the .proc, the engine also loads the .map containing all the light and moving entity definitions.
For each light, the engine has built a list of the areas impacted:
Light 1:
=========
- Area 0
- Area 1

Light 2:
=========
- Area 1
- Area 2
- Area 3
At runtime we now have a player position and monsters casting shadows. For scene correctness, all monsters and shadows must be found.
Here is the process:
- Find which area the player is in by walking the BSP tree in PointInArea.
- FlowViewThroughPortals: Starting from the current area, flood-fill into the other visible areas through the portal system, reshaping the view frustum each time a portal is crossed. This is beautifully explained in the Real-Time Rendering bible.
- We now have a list of every light contributing to the screen and most of the entities, which are stored in the interaction table:
Interaction table (Light/Entity):
=================================
Light 1 - Area 0
Light 1 - Area 1
Light 1 - Monster 1
Light 2 - Area 1
Light 2 - Monster 1
The interaction table is still incomplete: the Light 2 - Monster 2 interaction is missing, so the shadow cast by Monster 2 would be lost. R_AddLightSurfaces finds entities that are not in the view but still cast shadows by going through each light's area list.
Interaction table (Light/Entity):
=================================
Light 1 - Area 0
Light 1 - Area 1
Light 1 - Monster 1
Light 2 - Area 1
Light 2 - Monster 1
Light 2 - Monster 2
- R_AddModelSurfaces: All interactions have been found; it is now time to upload vertices and indices to the GPU's VBOs if they are not there already. Animated monster geometry is instantiated here as well (model AND shadow volume).
- All "intelligent" work has been done. Issue an RC_DRAW_VIEW command via R_AddDrawViewCmd that will trigger the backend to render to the screen.
Doom3 Renderer Backend
The backend is in charge of rendering the Intermediate Representation while accounting for the limitations of the GPU. Doom3 supported five GPU rendering paths:
- R10 (GeForce256)
- R20 (GeForce3)
- R200 (Radeon 8500)
- ARB (OpenGL 1.X)
- ARB2 (OpenGL 2.0)
As of 2012, only ARB2 is relevant to modern GPUs: Not only do standards provide portability, they also increase longevity.
Depending on the card's capabilities, idTech4 enabled bump-mapping (I wrote a tutorial about it a few years ago, using a Hellknight model) and specular-mapping, but all paths try their hardest to save as much fillrate as possible with:
- The OpenGL scissor test (specific to each light, generated by the frontend).
- Filling the Z-buffer as first step.
The backend unrolled code is as follows:

idRenderSystemLocal::EndFrame
  R_IssueRenderCommands
    RB_ExecuteBackEndCommands
      RB_DrawView
        RB_ShowOverdraw
        RB_STD_DrawView
        {
            RB_BeginDrawingView     // Clear the Z-buffer, set the projection matrix, etc.
            RB_DetermineLightScale
            RB_STD_FillDepthBuffer  // Fill the depth buffer and clear the color buffer to black.

            // Go through each light and draw a pass, accumulating the result in the framebuffer.
            _DrawInteractions
            {
                // Five GPU-specific paths:
                switch (renderer)
                {
                    R10  (GeForce256)
                    R20  (GeForce3)
                    R200 (Radeon 8500)
                    ARB  (OpenGL 1.X)
                    ARB2 (OpenGL 2.0)
                }
                // Disable the stencil shadow test.
                qglStencilFunc( GL_ALWAYS, 128, 255 );

                RB_STD_LightScale
            }

            // Draw any non-light-dependent shading passes (screens, neon, etc.).
            int processed = RB_STD_DrawShaderPasses( drawSurfs, numDrawSurfs );

            // Fog and blend lights.
            RB_STD_FogAllLights();

            // Now draw any post-processing effects using _currentRender.
            if ( processed < numDrawSurfs )
                RB_STD_DrawShaderPasses( drawSurfs+processed, numDrawSurfs-processed );
        }
In order to follow the backend steps, I took a famous scene from a Doom3 level and froze the engine at each step of the rendition:
Since Doom3 uses bump-mapping and specular mapping on top of the diffuse texture, rendering a surface can take up to three texture lookups. Since a pixel can potentially be impacted by 5-7 lights, it is not crazy to assume 21 texture lookups per pixel... not even accounting for overdraw. The first step of the backend is to reach zero overdraw: disable every shader, write only to the depth buffer and render all the geometry:
The depth buffer is now filled. From now on, depth writes are disabled and the depth test is enabled.
Rendering to the Z-buffer first may seem counter-productive, but it is actually extremely valuable for saving fillrate:
- It prevents running expensive shaders on non-visible surfaces.
- It prevents rendering non-visible shadows to the stencil buffer.
- Since surfaces are rendered in no particular order (neither back-to-front nor front-to-back), there would otherwise be a lot of overdraw. This step totally removes it.
Note that the color buffer is cleared to black: The Doom3 world is naturally pitch black since there is no "ambient" light. In order to be visible, a surface/polygon must interact with a light. This explains why Doom3 was so dark!
After this the engine is going to perform 11 passes (one for each light).
I broke down the rendering process. The next slideshow shows each individual light pass; you can move in time with the left and right arrows.
Now the details of what happens in the GPU framebuffer:
Before each light pass, if a shadow is cast by the light then the stencil test has to be enabled. I won't elaborate on the depth-fail/depth-pass controversy and the infamous move by Creative Labs. The released source code features the depth-pass algorithm, which is slower since it requires building better shadow volumes. Some people have managed to put the depth-fail algorithm back into the source, but be aware that this is only legal in Europe!
In order to save fillrate, the frontend generates a screen-space rectangle to be used as a scissor test by OpenGL. This avoids running shaders on pixels where the surface would have been pitch black anyway due to its distance from the light.
The stencil buffer just before light pass 8. Any non-black area will be lit while the rest will prevent writing to the framebuffer: The masking principle is clearly visible.
The stencil buffer just before light pass 7. The scissor set to save fillrate is clearly visible.
Interactive surfaces
The last step in the rendition is RB_STD_DrawShaderPasses: it renders all the surfaces that don't need light. Among them are the screens and the amazing interactive GUI surfaces, one of the parts of the engine John Carmack was the most proud of. I don't think this part of the engine ever got the respect it deserves. Back in 2004, an introduction cinematic used to be a video playing fullscreen; after the video played, the level would load and the engine would kick in... but not in Doom III:
Steps:
- The level loads.
- The cinematic starts playing.
- At 5m05s, the camera moves away.
- The video we just saw was a SCREEN IN THE GAME ENGINE!
I remember when I saw this for the first time, I thought it was a trick: that the video player would cut away, and that the designer had placed a texture on the screen and a camera position matching the last frame of the video. I was wrong: idTech4 can actually play videos in an interactive GUI surface element. For this it reused RoQ: the technology that Graeme Devine brought with him when he joined id Software.
Trivia :
The RoQ video used for the intro was impressive for 2005, and it was an audacious move to have it play on a screen within the game:
- It runs at 30 frames per second.
- Each frame is 512x512: quite a high resolution at the time.
- Each frame is generated in idCinematicLocal::ImageForTime on the CPU and uploaded on the fly to the GPU as an OpenGL texture.
But interactive surfaces can do much more than that, thanks to scripting and the ability to call native methods.
Some people got really interested and managed to get Doom 1 running in one!
Trivia: The interactive surface technology was also reused to build all the menus in Doom3 (settings, main screen, etc.).
So much more....
This page is only the tip of the iceberg; it is possible to go much deeper.
Recommended readings
If you are reading this and don't own a copy of Real-Time Rendering, you are depriving yourself of priceless information.