Doom3 Source Code Review: Renderer (Part 3 of 6)
idTech4 renderer features three key innovations:
- "Unified Lighting and Shadows": The level faces and the entity faces go through the same pipeline and shaders.
- "Visible Surface Determination": A portal system allows VSD to be performed at runtime: No more PVS.
- "Multi-pass Rendering".
By far the most important is that idTech4 is a multi-pass renderer. The contribution of each light in the view is accumulated in the GPU framebuffer via additive blending. Doom 3 takes full advantage of the fact that color framebuffer registers saturate instead of wrapping around.
CPU register (wraps around):
============================
  1111 1111
+ 0000 0100
-----------
= 0000 0011

GPU register (saturates):
=========================
  1111 1111
+ 0000 0100
-----------
= 1111 1111
I built a custom level to illustrate additive blending. The following screenshot shows three lights in a room, resulting in three passes with the result of each pass accumulated in the framebuffer. Notice the white illumination at the center of the screen where all the lights blend together.
I modified the engine in order to isolate each light pass; they can be viewed using the left and right arrows:
I modified the engine further in order to see the framebuffer state AFTER each light pass. Use the left and right arrows to move in time.
Trivia: It is possible to take the result of each light pass, blend them manually in Photoshop (Linear Dodge, to mimic OpenGL additive blending) and reach the exact same visual result.
Additive blending, combined with support for shadows and bump-mapping, resulted in an engine that can still produce very nice results even by 2012 standards:
Architecture
The renderer is not monolithic like in previous idTech engines but rather broken down into two parts, called the Frontend and the Backend:
- Frontend:
- Analyzes the world database and determines what contributes to the view.
- Stores the result in an Intermediate Representation (def_view_t) and uploads/reuses cached geometry in the GPU's VBOs.
- Issues an RC_DRAW_VIEW command.
- Backend:
- The RC_DRAW_VIEW command wakes up the backend.
- It uses the Intermediate Representation as input and issues commands to the GPU using the VBOs.
The architecture of the renderer bears a striking similarity to LCC, the retargetable compiler that was used to generate the Quake3 Virtual Machine bytecode:
I initially thought the renderer design was influenced by LCC's design, but the renderer is actually built in two parts because it was meant to be multithreaded on SMP systems: the frontend would run on one core and the backend on another. Unfortunately, due to instability with certain drivers, the extra thread had to be disabled and both ends run on the same thread.
Genesis trivia: Archaeology can be done with code as well. If you look closely at the unrolled renderer code (frontend, backend) you can clearly see that the engine switches from C++ to C (from objects to static methods). This is due to the genesis of the code: the idTech4 renderer was written by John Carmack on top of the Quake3 engine (a C codebase) before he was proficient in C++. The renderer was later integrated into the idTech4 C++ codebase.
How much Quake is there in Doom3? Hard to tell, but it is funny to see that the main method in the Mac OS X version is:
- (void)quakeMain;
Frontend/Backend/GPU collaboration
Here is a drawing that illustrates the collaboration between the frontend, the backend and the GPU:
- The Frontend analyzes the world state and issues two things:
- An intermediate representation containing a list of each light contributing to the view. Each light contains a list of the entity surfaces interacting with it.
- Each light-entity interaction that is going to be used for this frame is also cached in an interaction table. Data is usually uploaded to a GPU VBO.
- The Backend takes the intermediate representation as input. It goes through each light in the list and makes OpenGL draw calls for each entity that interacts with the light. The draw calls obviously reference the VBOs and textures.
- The GPU receives the OpenGL commands and renders to the screen.
Doom3 Renderer Frontend
The frontend performs the hard part: Visible Surface Determination (VSD). The goal is to find every light/entity combination affecting the view.
Those combinations are called interactions. Once every interaction has been found, the frontend makes sure everything needed by the backend is uploaded to GPU RAM (it keeps track of everything via an "interaction table"). The last step is to generate an Intermediate Representation that will be read by the backend so it can generate OpenGL commands.
In the code this is how it looks:

idCommon::Frame
  idSession::UpdateScreen
    idSession::Draw
      idGame::Draw
        idPlayerView::RenderPlayerView
          idPlayerView::SingleView
            idRenderWorld::RenderScene
              build params
              ::R_RenderView(params)  // This is the frontend
              {
                  R_SetViewMatrix
                  R_SetupViewFrustum
                  R_SetupProjection

                  // Most of the beef is here.
                  static_cast<idRenderWorldLocal *>(parms->renderWorld)->FindViewLightsAndEntities()
                  {
                      PointInArea             // Walk the BSP and find the current area.
                      FlowViewThroughPortals  // Recursively pass portals to find lights and entities interacting with the view.
                  }

                  R_ConstrainViewFrustum  // Improve Z-buffer accuracy by moving the far plane as close as the farthest entity.
                  R_AddLightSurfaces      // Find entities that are not in a visible area but still cast a shadow (usually enemies).
                  R_AddModelSurfaces      // Instantiate animated models (for monsters).
                  R_RemoveUnecessaryViewLights
                  R_SortDrawSurfs         // A simple C qsort call. C++ std::sort would have been faster thanks to inlining.
                  R_GenerateSubViews
                  R_AddDrawViewCmd
              }
Note : The switch from C to C++ is obvious here.
It is always easier to understand with a drawing, so here is a level. Thanks to the portals placed by the designer, the engine sees four areas:
Upon loading the .proc, the engine also loads the .map containing all the light and moving entity definitions.
For each light, the engine has built a list of the areas impacted:
Light 1:
=========
- Area 0
- Area 1

Light 2:
=========
- Area 1
- Area 2
- Area 3
At runtime we now have a player position and monsters casting shadows. For scene correctness, all monsters and shadows must be found.
Here is the process:
- Find which area the player is in by walking the BSP tree in PointInArea.
- FlowViewThroughPortals: Starting from the current area, flood-fill into the other visible areas through the portal system, reshaping the view frustum each time a portal is crossed. This is beautifully explained in the Real-Time Rendering bible.
- We now have a list of every light contributing to the screen and most of the entities, which are stored in the interaction table:
Interaction table (Light/Entity):
=================================
Light 1 - Area 0
Light 1 - Area 1
Light 1 - Monster 1
Light 2 - Area 1
Light 2 - Monster 1
The interaction table is still incomplete: the Light 2 - Monster 2 interaction is missing, so the shadow cast by Monster 2 would be lost. R_AddLightSurfaces finds entities that are not in the view but still cast shadows by going through each light's area list.
Interaction table (Light/Entity):
=================================
Light 1 - Area 0
Light 1 - Area 1
Light 1 - Monster 1
Light 2 - Area 1
Light 2 - Monster 1
Light 2 - Monster 2
- R_AddModelSurfaces: All interactions have been found; it is now time to upload vertices and indices to the GPU's VBOs if they are not there already. Animated monster geometry is instantiated here as well (model AND shadow volume).
- All "intelligent" work has been done. Issue an RC_DRAW_VIEW command via R_AddDrawViewCmd that will trigger the backend to render to the screen.
Doom3 Renderer Backend
The backend is in charge of rendering the Intermediate Representation while accounting for the limitations of the GPU. Doom3 supported five GPU rendering paths:
- R10 (GeForce256)
- R20 (GeForce3)
- R200 (Radeon 8500)
- ARB (OpenGL 1.X)
- ARB2 (OpenGL 2.0)
As of 2012, only ARB2 is relevant to modern GPUs: Not only do standards provide portability, they also increase longevity.
Depending on the card's capabilities, idTech4 enabled bump-mapping (I wrote a tutorial about it a few years ago, using a Hellknight model) and specular-mapping, but all paths try their hardest to save as much fillrate as possible with:
- The OpenGL scissor test (specific to each light, generated by the frontend).
- Filling the Z-buffer as first step.
The backend unrolled code is as follows:

idRenderSystemLocal::EndFrame
  R_IssueRenderCommands
    RB_ExecuteBackEndCommands
      RB_DrawView
        RB_ShowOverdraw
        RB_STD_DrawView
        {
            RB_BeginDrawingView     // Clear the Z-buffer, set the projection matrix, etc.
            RB_DetermineLightScale
            RB_STD_FillDepthBuffer  // Fill the depth buffer and clear the color buffer to black.

            // Go through each light and draw a pass, accumulating the result in the framebuffer.
            _DrawInteractions
            {
                // Five GPU-specific paths:
                switch (renderer)
                {
                    R10  (GeForce256)
                    R20  (GeForce3)
                    R200 (Radeon 8500)
                    ARB  (OpenGL 1.X)
                    ARB2 (OpenGL 2.0)
                }
                // Disable the stencil shadow test.
                qglStencilFunc( GL_ALWAYS, 128, 255 );

                RB_STD_LightScale
            }

            // Draw any non-light-dependent shading passes (screens, neon, etc.).
            int processed = RB_STD_DrawShaderPasses( drawSurfs, numDrawSurfs );

            // Fog and blend lights.
            RB_STD_FogAllLights();

            // Now draw any post-processing effects using _currentRender.
            if ( processed < numDrawSurfs )
                RB_STD_DrawShaderPasses( drawSurfs+processed, numDrawSurfs-processed );
        }
In order to follow the backend steps, I took a famous scene from a Doom3 level and froze the engine at each step of the rendition:
Since Doom3 uses bump-mapping and specular mapping on top of the diffuse texture, rendering a surface can take up to three texture lookups. Since a pixel can potentially be impacted by 5-7 lights, it is not crazy to assume 21 texture lookups per pixel... not even accounting for overdraw. The first step of the backend is to reach zero overdraw: disable every shader, write only to the depth buffer and render all the geometry:
The depth buffer is now filled. From now on, depth writes are disabled and the depth test is enabled.
Rendering to the Z-buffer first may seem counter-productive, but it is actually extremely valuable for saving fillrate:
- It prevents running expensive shaders on non-visible surfaces.
- It prevents rendering non-visible shadows to the stencil buffer.
- Since surfaces are rendered in no particular order (neither back-to-front nor front-to-back), there would otherwise be a lot of overdraw. This step totally removes it.
Note that the color buffer is cleared to black: The Doom3 world is naturally pitch black since there is no "ambient" light. In order to be visible, a surface/polygon must interact with a light. This explains why Doom3 was so dark!
After this the engine is going to perform 11 passes (one for each light).
I broke down the rendering process. The next slideshow shows each individual light pass; you can move in time with the left and right arrows.
Now the details of what happens in the GPU framebuffer:
Before each light pass, if a shadow is cast by the light then the stencil test has to be enabled. I won't elaborate on the depth-fail/depth-pass controversy and the infamous move by Creative Labs. The released source code features the depth-pass algorithm, which is slower since it requires building better shadow volumes. Some people have managed to put the depth-fail algorithm back into the source, but be aware that this is only legal in Europe!
In order to save fillrate, the frontend generates a screen-space rectangle to be used as a scissor test by OpenGL. This avoids running shaders on pixels where the surface would have been pitch black anyway due to its distance from the light.
The stencil buffer just before light pass 8. Any non-black area will be lit while the rest will prevent writing to the framebuffer: The masking principle is clearly visible.
The stencil buffer just before light pass 7. The scissor set to save fillrate is clearly visible.
Interactive surfaces
The last step in the rendition is RB_STD_DrawShaderPasses: it renders all the surfaces that don't need light. Among them are the screens and the amazing interactive GUI surfaces, one of the parts of the engine John Carmack was the most proud of. I don't think this part of the engine ever got the respect it deserves. Back in 2004, an introduction cinematic used to be a video playing fullscreen; after the video played, the level would load and the engine would kick in... but not in Doom III:
Steps:
- The level loads.
- The cinematic starts playing.
- At 5m05s, the camera moves away.
- The video we just saw was a SCREEN IN THE GAME ENGINE!
I remember when I saw this for the first time, I thought it was a trick: that the video player would cut away, and that the designer had placed a texture on the screen and a camera position matching the last frame of the video. I was wrong: idTech4 can actually play videos in an interactive GUI surface element. For this it reused RoQ: the technology that Graeme Devine brought with him when he joined id Software.
Trivia :
The RoQ video used for the intro was impressive for 2005, and it was an audacious move to have it play on a screen within the game:
- It runs at 30 frames per second.
- Each frame is 512x512: quite a high resolution at the time.
- Each frame is generated in idCinematicLocal::ImageForTime on the CPU and uploaded on the fly to the GPU as an OpenGL texture.
But interactive surfaces can do much more than that, thanks to scripting and the ability to call native methods.
Some people got really interested and managed to get Doom 1 running in one!
Trivia: The interactive surface technology was also reused to build all the menus in Doom3 (settings, main screen, etc.).
So much more....
This page is only the tip of the iceberg; it is possible to go much deeper.
Recommended readings
If you are reading this and don't own a copy of Real-Time Rendering, you are depriving yourself of priceless information.