Fabien Sanglard's Website

January 13th, 2010

Doom engine code review

Before studying the iPhone version, it was important for me to understand how Doom engine WAS performing rendition back in 1993. After all the OpenGL port must reuse the same data from the WAD archive. Here are my notes about Doom 1993 renderer, maybe it will help someone to dive in.

January 13th, 2010 : Reddit annihilated my bandwidth (4 hours after publication). Back online.
February 8th, 2010 : Slashdotted, moving videos to YouTube.com. I can't keep up with 5000 visitors a day.
July 23th, 2011 : Two videos surfaced about doom development: A Visit to id Software and Doom: Post-Mortem 2011.

From designer screen to player screen

Maps were designed in 2D by a level designer using Doom editor (DoomED). LINEDEFS were describing closed sectors (SECTORS in the source code), the third dimension (height) was defined on a per sector basis. The first level of Doom E1M1 looks like this:

When the map was finished, it was sliced via Binary Space Partitioning. Recursively a LINEDEF was chosen and its plan extended as splitting plan. LINEDEF were hence cut into segments (SEGS) until only convex SubSectors (SSECTOR in the code) remained.

Trivia: Both DoomED and iBSP were programmed using....Objective-C on a NextStep workstation. Fifteen years later, the same language, running on almost the same operating system are running the game in the palm of our hand! I did a bit of web archeology and managed to find the original source code of idbsp, it's worth taking a look:

Following is an example of the first map being recursively split:

Recursion level 1

In blue a wall is selected and extended as splitting plan (red). Splitter were selected in order to balance the BSP tree but also to limit the number of SEGS generated. The green bounding boxes are used later to discard entire chunks of map.

Recursion level 2 (only right subspace)

In the end, SECTORS were spliced into convex sub-sectors (called SSECTORS) and LINEDEFS were sliced into segments (called SEGS):

The big picture of runtime

Here is what the main rendering method (R_RenderPlayerView) looks like:



	void R_RenderPlayerView (player_t* player)
	{	
		[..]

		R_RenderBSPNode (numnodes-1);
        
		R_DrawPlanes ();
    
		R_DrawMasked ();

	}

Four things happen:

R_RenderBSPNode : All subSectors in the map are sorted using the BSP tree. Big chunks are discarded via bounding box (green in the previous drawing).
R_RenderBSPNode : Visible SEGS are projected on screen via a lookup table and clipped via an occlusion array. Walls are drawn as column of pixels. The size of a column is determined by the distance from the player POV, the Y position of a column via the height relative to the played. The base and the top of the walls generate visplanes, a structure used to render the floor and ceiling (called flats).
R_DrawPlanes : Visplanes are converted from column of pixels to lines of pixels and rendered to screen.
R_DrawMasked : The "things" (enemies, objects and transparent walls) are rendered.

Binary Space Partition sorting

Two examples with E1M1 (Doom first map) and a BSP looking as follow:

//Coordinate system origin in lower left corner // Plane equation ax + by + c = 0 with // unit normal vector = (a,b) // Root plane (splitting map between zone A and B): normal = (-1,0) c = 3500 // A plane (splitting zone A between zone A1 and A2): normal = (1,0) c = -2500 // B plane (splitting zone B between zone B1 and B2): normal = (-0.24,0.94) c = -650 // Injecting any point coordinate (x,y) in a // plane equation gives the distance from that plane.

BSP walking always start at the root node, sorting both subspaces. Recursion follows on both node children.

Example 1 : Player (green dot) watching through the window from point p=(2300,1900):


  // Player position = ( 2300, 1900 )

  // R_RenderBSPNode run against AB splitter (-x + 3500 = 0):
  
     -2300 + 3500 = 1200 
  
     Result is positive: Closest subspace is in the FRONT of the splitting plane. (A is closer than B).
  
  
  
  
  
  // R_RenderBSPNode now run recursively against the two child of the root node: A1/A2 splitter and B1/B2 splitter. 
  
  // R_RenderBSPNode run against A1/A2 splitter (x - 2500 = 0):
  
     2300 - 2500 = -200
  
     Result is negative so the closest subspace is in the BACK of the splitting plane. (A1 is closer than A2).
  
  
  // R_RenderBSPNode run against B1/B2 splitter (-0.24x +0.97y - 650 = 0):
  
     -0.24 * 2300 + 0.97 * 1900- 650 = 641
  
     Result is positive so the closest subspace is in the FRONT of the splitting plane. (B1 is closer than B2).
  
 
  
   
     Result: Sorted zones, from near to far:
 
     { A1, A2, B1, B2 }

Example 2 : Player (green dot) watching from the secret balcony a point p=(5040, 2400 ):


  // Player position = ( 5040, 2400 )

  // R_RenderBSPNode run against AB splitter (-x + 3500 = 0):
  
      -5040 + 3500 = -1540 
  
      Result is negative: Closest subspace is in the BACK of the splitting plane. (B is closer than A).
  
  
  
  
  
  // R_RenderBSPNode now recursively run against the two child of the root node: A1/A2 splitter and B1/B2 splitter. 
   
  // R_RenderBSPNode run against B1/B2 splitter (-0.24x +0.97y - 650 = 0):
  
     -0.24 * 5040 + 0.97 * 2400 - 650 = 468
  
     Result is positive so the closest subspace is in the FRONT of the splitting plane. (B1 is closer than B2).
  
 
  // R_RenderBSPNode run against A1/A2 splitter (x - 2500 = 0):
  
    5040 - 2500 = 2540
  
    Result is positive so the closest subspace is in the FRONT of the splitting plane. (A2 is closer than A1).
   
  	
  	
    Result: Sorted zones, from near to far:
 
    { B1, B2, A2, A1 }

BSP allowed SEGS sorting from anywhere in the map at a consistent speed, regardless of the player's location. At the cost of one cross-product per plane. Entire part of the map were also discarded via bounding box testing.

Note : It is not immediately apparent but BSP sorting all SEGS around the player, even the ones he/she is not looking at, frustrum culling is essential when using BSP.

Walls

With the BSP sorting walls (SEGS) near to far, the closest 256 walls were rendered. Every SEGS's two vertices were converted to two angles (relative to the player's position).

Note : In 1993, only the very high-end 486DX machines had a FPU (floating point unit) hence Doom engine was doing all angles calculation via Binary Angular Measurement (BAMs), relying on int only, float is rarely used. For the same reason, sub integer precision is achieved via fixed_t a 16.16 binary fixed point (more about this here and here).

Once converted to angle screen space X coordinate are retrieved via a lookup table (viewangletox). Because BAMs were int, angles were first scaled down from 32 bits to 13 bits via a 19bits right shift in order to fit the 8k lookup table.
The wall is then clipped against an occlusion array (solidsegs, some articles about Doom engine mention a linked list but it does not look like it). After clipping, space remaining was interpolated and drawn as column of pixels: The height and Y coordinate of the column of pixel were based respectively on the SEGS's sector height and its distance from player POV.

Note about surface culling: Backface culling was done via angle2-angle1 > 180 . Only walls within the Field of View were actually rendered.

Note : Not all walls were made of an unique texture, walls could have a lower texture, a upper texture and a middle texture ( that could be transparent or semi-transparent). This was handy to simulate windows as it can be seen in the next video: the "window" is actually a sector with higher floor and no middle texture.

Trivia : Because walls were rendered as vertical columns, wall textures were stored in memory rotated 90 degrees to the left. This trick allowed to take full advantage of the CPU precaching capability: To read a wall texel from the RAM also prepopulated the CPU cache with eight adjacent texels on each side. Hence the subsequent read was already in the RAM cache and a substantial read latency reduction was achieved. You can read "The Art of Assembly Language programming" (3.2.4 Cache Memory) for more information about precaching and memory alignment.

Flats (Ceiling and Floor) or the infamous visplanes

While drawing column of walls, top and bottom screen space coordinates were used to generate "visplanes": An area in screen space (not necessarily continuous horizontally). Here is a visplane_t as declared in Doom engine.



	//
	// Now what is a visplane, anyway?
	// 


	typedef struct
	{
		fixed_t		height;
		int		picnum;
		int		lightlevel;

		int		minx;
		int		maxx;
  
		byte		top[SCREENWIDTH];
		byte		bottom[SCREENWIDTH];
	
	} visplane_t;

The first part of the structure hold information about the "material", (height picnum lightlevel). The four last members defines the screenspace zone covered.

If two subSectors shared the same material (height, texture and illumination level) Doom engine tried to merge them together but because of the visplane_t structure limitation it was not always possible.

For the entire width of the screen, a visplane can store the location of a column of pixel (because visplanes are deduced from the walls projection on screen, they are created as column of pixels).

Here are the starting screen's three main visplanes:

The green one is particularly interesting as it illustrates visplane_t ability to store discontinued areas (but only horizontally). As long as the column is continuous, visplane can store it. This limitation shows in the engine: some subSectors can be merged and be rendered via only one visplane but if something come between vertically they cannot be merged.

Here is a screenshot illustrating visplane fragmentation and the video associated.

Trivia : Visplanes hardcoded limit (MAXVISPLANES 128) was a plague for modders as the game would crash and go back to DOS. Two issues could arise:

"R_FindPlane: no more visplanes" : The total number of different visplanes materials (height, texture and illumination level) is over 128.
R_DrawPlanes: visplane overflow (%i) : Visplanes fragmentation is important and number of visplanes is over 128.

Why limit to 128 ? Two stages in the renderer pipepline requested to search in the list of visplanes (via R_FindPlane), this was done via linear search and it was probably too expensive beyond 128. Lee Killough later lifted this limitation, replacing linear seach with a chained hash table implementation.

Things and transparent walls.

When all solid/" middle texture transparent" walls and ceiling/floors surfaces were rendered, only remained the "things", regrouping enemies, barrel, ammos and semi-transparent walls. Those are rendered far to near but are not projected into screen space using the wall's lookup table, it's all done with 16.16 binary fixed point calculations.

Note : This video illustrate some of the worse case scenario where some pixels are written three times.

Profiling

Loading "Chocolate Doom" in Mac OS X's Instrument allowed to do some profiling:

It seems the port is pretty fidele to Vanilla Doom: Most time is spent drawing walls (R_DrawColumn), ceiling/floor(R_DrawSpan) and things (R_DrawMaskedColumn ). Besides drawing I noticed the high cost of wall interpolation (R_RenderSegLoop) and visplane conversion from columns to lines of pixels (R_MakeSpans). Finally comes monsters IA (R_MobjThinker) and BSP traversal (R_RenderBSPNode)

With an inverted call tree, we can see that most of the work is indeed done in the BSP traversal, wall rendition and visplanes generation: R_RenderBSPNode (second column is for percentage of time).

All together

Finally, a video of the legendary first screen generation where you can see in order:

Walls, near to far, as column of pixels
Flats, near to far, as lines of pixels.
Things, far to near.

Trivia

Since Doom was developed on a NeXTSTEP system using a flat virtual memory model, id Software decided against using EMS or XMS like most games of the time. Instead they used DOS/4G: a memory extender that allowed a software to access protected mode RAM on a real mode operating systen (DOS).

The NexT workstation was so powerful that it was able to run the editor, the game and a debugger. When the game was stable enough the code was sent of the network to a PC where it was compiled for DOS/x86 by a Watcom compiler. Thanks to DOS/4G the code ran with the same memory model on PC and NeXT.
A nice video that is a good complement to the book "Masters of Doom":
Many details from John Romero on rome.ro and planetromero.com.