June 8, 2012

Doom3 Source Code Review: Renderer (Part 3 of 6)

The idTech4 renderer features three key innovations:


By far the most important is that idTech4 is a multi-pass renderer. The contribution of each light in the view is accumulated in the GPU framebuffer via additive blending. Doom 3 takes full advantage of the fact that color framebuffer registers saturate instead of wrapping around.



    CPU register (wrap around) : 
    ============================

      1111 1111
    + 0000 0100
      ---------
    = 0000 0011


    GPU register (saturate) :
    =========================

      1111 1111
    + 0000 0100
      ---------
    = 1111 1111
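
A minimal C++ sketch of the same difference; the clamp to 255 mimics what the GPU blender does on an 8-bit color channel:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    // CPU-style 8-bit addition: the result wraps around on overflow.
    static uint8_t addWrap(uint8_t a, uint8_t b) {
        return static_cast<uint8_t>(a + b);                 // 255 + 4 -> 3
    }

    // GPU-style blending on an 8-bit color channel: the result saturates.
    static uint8_t addSaturate(uint8_t a, uint8_t b) {
        return static_cast<uint8_t>(std::min(255, a + b));  // 255 + 4 -> 255
    }

    int main() {
        printf("wrap     : %d\n", addWrap(255, 4));         // prints 3
        printf("saturate : %d\n", addSaturate(255, 4));     // prints 255
        return 0;
    }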





I built a custom level to illustrate additive blending. The following screenshot shows three lights in a room, resulting in three passes, with the result of each pass accumulated in the framebuffer. Notice the white illumination at the center of the screen where all the lights blend together.



I modified the engine in order to isolate each light pass; the passes can be viewed using the left and right arrows:



I modified the engine further in order to see the framebuffer state AFTER each light pass. Use the left and right arrows to move through time.


Trivia: It is possible to take the result of each light pass, blend them manually in Photoshop (Linear Dodge to mimic OpenGL additive blending), and reach the exact same visual result.
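
In OpenGL terms, that additive accumulation is nothing exotic: after a depth-only pre-pass has filled the Z-buffer, the lit surfaces are drawn once per light with the blend function set to GL_ONE, GL_ONE. A minimal sketch of the idea (the types and helper functions are illustrative, not the engine's):

    #include <GL/gl.h>
    #include <vector>

    // Illustrative stand-ins for the engine's light and surface lists.
    struct Surface { /* VBO offsets, material, ... */ };
    struct Light   { std::vector<Surface> interactingSurfaces; };

    void drawSurface(const Surface&) { /* glDrawElements(...) on the cached VBO */ }

    // Assumes a current GL context and a depth buffer already filled by a
    // depth-only pre-pass, so only visible fragments pass the GL_EQUAL test.
    void accumulateLights(const std::vector<Light>& lights) {
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);   // dst = dst + src, clamped: this is the saturation
        glDepthFunc(GL_EQUAL);         // shade only fragments that survived the pre-pass
        glDepthMask(GL_FALSE);         // the depth buffer is read-only from now on

        for (const Light& light : lights) {               // one pass per light...
            for (const Surface& surf : light.interactingSurfaces) {
                drawSurface(surf);                        // ...adds its contribution
            }
        }
    }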

Additive blending, combined with support for shadows and bump mapping, resulted in an engine that can still produce very nice results even by 2012 standards:




Architecture

The renderer is not monolithic like in previous idTech engines but is instead broken down into two parts, called the Frontend and the Backend:






The architecture of the renderer bears a striking similarity to LCC, the retargetable compiler that was used to generate the Quake3 Virtual Machine bytecode:




I initially thought the renderer design was influenced by LCC's design, but the renderer is actually built in two parts because it was meant to be multi-threaded on SMP systems: the frontend would run on one core and the backend on another. Unfortunately, due to instability with certain drivers, the extra thread had to be disabled, and both ends now run on the same thread.
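
This split is also what made the threading plan possible: the frontend only appends self-contained commands to a buffer, and the backend only consumes them, so the two halves never share mutable state. A minimal single-threaded sketch of that hand-off (the types are simplified; only the RC_DRAW_VIEW command name comes from the engine):

    #include <vector>

    struct ViewDef { /* view matrices, lists of lights and surfaces... */ };

    enum CommandId { RC_NOP, RC_DRAW_VIEW, RC_SWAP_BUFFERS };

    struct Command {
        CommandId id;
        ViewDef*  viewDef;   // only used by RC_DRAW_VIEW
    };

    // Frontend: appends commands, never talks to OpenGL itself.
    void addDrawViewCmd(std::vector<Command>& frame, ViewDef* view) {
        frame.push_back({ RC_DRAW_VIEW, view });
    }

    // Backend: consumes the commands and issues the OpenGL calls.
    void executeCommands(const std::vector<Command>& frame) {
        for (const Command& cmd : frame) {
            switch (cmd.id) {
                case RC_DRAW_VIEW:    /* issue GL calls for cmd.viewDef */ break;
                case RC_SWAP_BUFFERS: /* swap the window buffers */        break;
                default:                                                   break;
            }
        }
    }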

Genesis trivia: Archaeology can be done with code as well: if you look closely at the unrolled renderer code (frontend, backend) you can clearly see that the engine switches from C++ to C (from objects to static methods):

This is due to the genesis of the code: the idTech4 renderer was written by John Carmack on top of the Quake3 engine (a C codebase) before he was proficient in C++. The renderer was later integrated into the idTech4 C++ codebase.

How much Quake is there in Doom3? Hard to tell, but it is funny to see that the main method in the Mac OS X version is:



   - (void)quakeMain;



Frontend/Backend/GPU collaboration

Here is a drawing that illustrates the collaboration between the frontend, the backend, and the GPU:


  1. The Frontend analyzes the world state and produces two things:
    • An intermediate representation (sketched below) containing a list of each light contributing to the view. Each light contains a list of the entity surfaces interacting with it.
    • Each light-entity interaction that is going to be used for this frame is also cached in an interaction table. The data is usually uploaded to a GPU VBO.
  2. The Backend takes the intermediate representation as input. It goes through each light in the list and makes OpenGL draw calls for each entity that interacts with the light. The draw calls obviously reference the VBOs and textures.
  3. The GPU receives the OpenGL commands and renders to the screen.
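
A rough sketch of what that intermediate representation boils down to: a per-frame view definition holding a list of lights, each pointing at the surfaces it interacts with (heavily simplified; the engine's own structures carry far more state):

    #include <vector>

    struct DrawSurf {                // one surface ready to be drawn
        int vboOffset;               // where its geometry lives in the GPU VBO
        int materialIndex;
    };

    struct ViewEntity {              // one model visible this frame
        float modelMatrix[16];
    };

    struct ViewLight {               // one light contributing to the view
        float                 lightOrigin[3];
        std::vector<DrawSurf> interactions;   // surfaces this light touches
    };

    struct ViewDef {                 // the whole intermediate representation
        float                   viewMatrix[16];
        float                   projectionMatrix[16];
        std::vector<ViewEntity> entities;
        std::vector<ViewLight>  lights;       // the backend walks this list
    };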


Doom3 Renderer Frontend

The frontend performs the hard part: Visible Surface Determination (VSD). The goal is to find every light/entity combination affecting the view. Those combinations are called interactions. Once every interaction has been found, the frontend makes sure everything needed by the backend is uploaded to GPU RAM (it keeps track of everything via an "interaction table"). The last step is to generate an intermediate representation that will be read by the backend so it can generate OpenGL commands.
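
A crude way to picture that interaction table is as a 2D cache indexed by (light, entity): once the interaction for a pair has been generated and uploaded, later frames can reuse it instead of rebuilding it. A simplified sketch assuming such a layout (the class and member names are made up for illustration):

    #include <cstddef>
    #include <vector>

    struct Interaction {
        // Cached data for one (light, entity) pair: lit surface lists,
        // shadow volume geometry, VBO handles...
    };

    // A light x entity grid: entry (l, e) stays null until the pair is first seen.
    class InteractionTable {
    public:
        InteractionTable(int numLights, int numEntities)
            : numEntities_(numEntities),
              table_(static_cast<std::size_t>(numLights) * numEntities, nullptr) {}

        Interaction*& at(int lightIndex, int entityIndex) {
            return table_[static_cast<std::size_t>(lightIndex) * numEntities_ + entityIndex];
        }

    private:
        int numEntities_;
        std::vector<Interaction*> table_;
    };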

In the code this is how it looks:

  - idCommon::Frame
   - idSession::UpdateScreen
     - idSession::Draw
       - idGame::Draw
         - idPlayerView::RenderPlayerView
           - idPlayerView::SingleView
             - idRenderWorld::RenderScene
                - build params
                - ::R_RenderView(params)    //This is the frontend
                  {
                      R_SetViewMatrix
                      R_SetupViewFrustum
                      R_SetupProjection
              
                      //Most of the beef is here.
                      static_cast<idRenderWorldLocal *>(parms->renderWorld)->FindViewLightsAndEntities()
                      {
                          PointInArea              //Walk the BSP and find the current Area
                          FlowViewThroughPortals   //Recursively pass portals to find lights and entities interacting with the view.
                      }
              
                      R_ConstrainViewFrustum     //Improve Z-buffer accuracy by moving the far plane as close as the farthest entity.
                      R_AddLightSurfaces         // Find entities that are not in a visible area but still casting a shadow (usually enemies)
                      R_AddModelSurfaces         // Instantiate animated models (for monsters)
                      R_RemoveUnecessaryViewLights
                      R_SortDrawSurfs            // A simple C qsort call. C++ sort would have been faster thanks to inlining.       
                      R_GenerateSubViews
                      R_AddDrawViewCmd 
                  }
              
             
             

Note: The switch from C++ to C is obvious here.


It is always easier to understand with a drawing, so here is a level. Thanks to the portals placed by the designer, the engine sees four areas:


Upon loading the .proc, the engine also loads the .map containing all the light and moving entity definitions. For each light, the engine builds a list of each area it impacts:



   Light 1 :
   =========

        - Area 0
        - Area 1

   Light 2 :
   =========

        - Area 1
        - Area 2
        - Area 3
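
A hypothetical layout for those per-light area lists, together with the kind of test the engine conceptually needs later on (names are illustrative):

    #include <algorithm>
    #include <vector>

    // Built once at load time: each light stores the indices of the areas
    // its volume touches.
    struct LightDef {
        int              index;
        std::vector<int> areas;       // e.g. Light 2 -> { 1, 2, 3 }
    };

    // A light matters for this frame if at least one of its areas is visible.
    bool lightTouchesView(const LightDef& light, const std::vector<int>& visibleAreas) {
        for (int area : light.areas) {
            if (std::find(visibleAreas.begin(), visibleAreas.end(), area) != visibleAreas.end()) {
                return true;
            }
        }
        return false;
    }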





At runtime we now have a player position and monsters casting shadows. For the scene to be correct, all monsters and shadows must be found.


Here is the process:
  1. Find which area the player is in by walking the BSP tree in PointInArea (a sketch of this walk appears after the list).
  2. FlowViewThroughPortals: Starting from the current area, flood-fill into the other visible areas using the portal system. Reshape the view frustum each time a portal is crossed: this is beautifully explained in the Real-Time Rendering bible:



    Now we have a list of every light contributing to the screen and most of the entities, which are stored in the interaction table:

    
    
       Interaction table (Light/Entity) :
       ==================================
    
           Light 1 - Area    0
           Light 1 - Area    1
           Light 1 - Monster 1
    
           Light 2 - Area    1
           Light 2 - Monster 1
    
    
        

    The interaction table is still incomplete: the Light 2 - Monster 2 interaction is missing, so the shadow cast by Monster 2 would be lost.

  3. R_AddLightSurfaces will find the entities that are not in the view but still cast shadows, by going through each light's area list.

    
    
       Interaction table (Light/Entity) :
       ==================================
    
           Light 1 - Area    0
           Light 1 - Area    1
           Light 1 - Monster 1
    
           Light 2 - Area    1
           Light 2 - Monster 1
           Light 2 - Monster 2
    
    

  4. R_AddModelSurfaces: All interactions have been found; it is now time to upload the vertices and indices to the GPU's VBOs if they are not there already. Animated monster geometry is instantiated here as well (model AND shadow volume).
  5. All the "intelligent" work has been done. An RC_DRAW_VIEW command is issued via R_AddDrawViewCmd; it will trigger the backend to render to the screen.


Doom3 Renderer Backend

The backend is in charge of rendering the intermediate representation while accounting for the limitations of the GPU: Doom3 supported five GPU rendering paths (ARB, NV10, NV20, R200, and ARB2):

As of 2012, only ARB2 is relevant to modern GPUs: not only do standards provide portability, they also increase longevity.
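
A minimal sketch of the kind of capability check that decides whether the ARB2 path is usable at all; the real engine performs a much more thorough probe at startup, this only looks for the two ARB program extensions that path relies on:

    #include <GL/gl.h>
    #include <cstring>

    // Assumes a current GL context (classic GL 1.x/2.x style extension query).
    bool supportsArb2Path() {
        const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
        return ext != nullptr
            && std::strstr(ext, "GL_ARB_vertex_program")   != nullptr
            && std::strstr(ext, "GL_ARB_fragment_program") != nullptr;
    }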

Depending on the card's capabilities, idTech4 enabled bump mapping (I wrote a tutorial about it a few years ago, using a Hellknight) and specular mapping, but all paths try their hardest to save as much fillrate as possible with: