February 14th, 2013

Duke Nukem 3D: BUILD ENGINE INTERNALS (PART 2 OF 4) >>

Build powered Duke Nukem 3D and many other successful games such as Shadow Warrior and Blood . Upon release on January 29th, 1996 it obliterated Doom engine with innovative features:

The crown would be claimed back by highend Pentiums running Quake in June 1996 ... but for years Build delivered high value, freedom to designers and most important: Speed on the most common PCs of the time.

Many thanks to Ken Silverman for proof-reading this article: His patience and diligent replies to my emails were instrumental.

Key concept: The Portal system.

Most 3D engines partitioned their map via Binary Space Partition or an Octree. Doom for example preprocessed each map via a time consuming method (up to 30 minutes) resulting in a BSP Tree allowing :

This speed gain was a trade-off : Walls could not move. Build removed this limitation by not preprocessing its maps and relying on a Portal system:

In this map, the game designer drew 5 sectors (left) and connected them together by marking walls as portal (right).

The resulting world database of Build is ridiculously simple: One array for sectors and one array for walls.

  

  Sectors         (5 entries):                         Walls  (29 entries)   :
  ============================                 ==========================================
  0| startWall:  0 numWalls: 6                  0|  Point=[x,y], nextsector = -1    // Wall in sector 0
  1| startWall:  6 numWalls: 8                 ..|                                  // Wall in sector 0
  2| startWall: 14 numWalls: 4                 ..|                                  // Wall in sector 0
  3| startWall: 18 numWalls: 3                  3|  Point=[x,y], nextsector =  1    // Portal from sector 0 to Sector 1
  4| startWall: 21 numWalls: 8                 ..|                                  // Wall for sector 0
  ============================                 ..|                                  // Wall for sector 0
                                               ..|
                    Sector 1 first wall   >>    6|  Point=[x,y], nextsector = -1
                                                7|  Point=[x,y], nextsector = -1
                                                8|  Point=[x,y], nextsector = -1   
                                                9|  Point=[x,y], nextsector =  2    //Portal from Sector 1 to Sector 2
                                               10|  Point=[x,y], nextsector = -1   
                                               11|  Point=[x,y], nextsector =  0    //Portal from Sector 1 to Sector 0
                                               12|  Point=[x,y], nextsector = -1
                    Sector 1  last  wall  >>   13|  Point=[x,y], nextsector = -1
                                               ..|
                                               28|  Point=[x,y], nextsector = -1
                                               ===========================================



Another misconception about Build is that it is not a raycaster: The vertices are projected first in player space and then a column/distance from POV are generated.

Runtime Overview

High level summary of a frame rendition :

  1. The Game module provides the Engine module with the sector where rendition should start (usually the player sector but it may be a mirror sector).
  2. The Engine module navigates the portal system and visit interesting sectors. For each sector visited:
    • Group walls in sets called "bunches". Store those in a stack.
    • Determine which sprites in that sector are visible. Store those in a stack.
  3. Consume bunches in a near to far order: Render solid walls and portals.
  4. Stop the rendition: Let the Game module update the visible sprites.
  5. Render all sprites and transparent walls in a far to near order.
  6. Swap buffers.

Here are each steps in the code :


   //  1. Each time an entity is moved, its current sector has to be updated.
   updatesector(int x, int y, int* lastKnownSectorID)
   
   displayrooms()
   {
      // Render solid walls, ceilings and floors. Also populate a list of visible sprites (but DO NOT RENDER THEM).
      drawrooms(int startingSectorID) 
      {
          // Clear "gotsector" variable, the "visited sectors" bit array.
          clearbufbyte(&gotsector[0],(long)((numsectors+7)>>3),0L);
          
          // Reset umost and dmost array (occlusion tracker arrays).
          
          // 2. Visit sectors and portal: Build a list of "bunch". 
          scansector(startingSectorID)
          {
             // Visit all connected sectors and populate a BUNCH array.
             // Determine which sprites are visible and store a reference in tsprite, spritesortcnt++
          }
          
          //At this point, numbunches is set and bunches have been generated.
          
          // Iterate on the bunch stack. This is a (O)n*n operation since the algorith seach for the closest bunch each time.
          while ((numbunches > 0) && (numhits > 0))
          {
               //Find closest bunch via a (o) n*n  method
               for(i=1;i>numbunches;i++)
               {
                  //Hard to read code in here :( :( 
                  bunchfront test
               }
              
               //DRAW a bunch of wall identified by bunchID (closest)
               drawalls(closest);
          }
      }
   
      // 3. Stop rendition and run the game module so it can update ONLY the visible sprites.
      animatesprites() 
   
      // 4. Render partially transparent walls such as grid, windows and visible sprites (players, items).
      drawmasks()
      {
          while ((spritesortcnt > 0) && (maskwallcnt > 0))
          {
              drawsprite
                 or
              drawmaskwall
          }
      }
   }
   
   // GAME Module code. Draw 2D elements (HUD, hand with weapon)
   displayrest();
   
   // 5. Swap buffers
   nextpage() 
   
   


Trivia : If you study the code, here is the fully unrolled loop that I used as a map.

Trivia : Why is the swapping buffer method called nextpage(). Back in the 90's, the joy of VGA/VESA programming meant doing double buffering manually: Two portions of the video RAM were reserved and alternatively used. Each portion was called a "page" One portion was used by the VGA CRT Module while the other was updated by the engine. Swapping buffer was about setting the CRT to use the "next page" by changing the base address. You can read a lot about that in the Chapters 23 of Michael Abrash's Black Book of Graphic Programming: Bones and sinew.
Nowaday SDL alleviates this burden with a simple video mode flag SDL_DOUBLEBUF but the method name remain as an artefact of the past.


1. Where to start rendering ?

No BSP means it is not possible to take a point p(x,y) and navigate tree nodes until we reach a leaf sector. In Build the current sector of a player has to be tracked after each position update via updatesector(int newX, int newY, int* lastKnownSectorID). The Game module calls this method of the Engine module very often.

A naive implementation of updatesector would have scanned linearly each sectors and check each time if p(x,y) is inside the sector S. But updatesector is optimized with behavior pattern:

  1. By supplying the lastKnownSectorID, the algorithm assume the player hasn't moved much and start checking sector lastKnownSectorID.
  2. If #1 fails, the algorithm checks neighboring sectors of lastKnownSectorID using portals.
  3. Finally in a worse case scenario it checks all sectors with a linear search.


In the left map, the player last know sector is sector id 1: Depending how much the player has moved, updatesector will test in order:

  1. inside(x,y,1) (Entity did not move enough to leave the sector).
  2. inside(x,y,0) (Entity moved a little to a neighbor sector).
    inside(x,y,2)
  3. inside(x,y,0) (Entity moved a lot: every sectors in the game may potentially be scanned).
    inside(x,y,1)
    inside(x,y,2)
    inside(x,y,3)
    inside(x,y,4)
The worse case scenario can be costy. But most of the time the player/projectiles haven't moved much and the operation is fast.


Inside details

Inside is a noteworthy method for two reasons:

  • It can only use integers.
  • It must run against sectors that can be concave.

I detail this method because it perfectly illustrate how Build works: With good old cross-product and XORs.

Fixed point era and the ubiquitous cross-product

Since most of the PC of 90s did not have a Floating Point unit (386SX, 386DX and 486SX): Build exclusively uses integers.

The example is a Wall with points A and B at its extremitis: The goal is to determine if the point is on the right or on the left.


In Chapter 61 of Michael Abrash's Black Book of Programming: Frame of Reference this is a single matter of performing a dot product and a comparaison.



  bool inFrontOfWall(float plan[4], float point[3])
  {
       float dot = dotProduct(plan,point);

       return dot < plan[3];
  }




But in a world with no FP Unit this is done with a cross-product :


  bool inFrontOfWall(int wall[2][2], int point[2])
  {
       int pointVector[2], wallVector[2]  ;

       pointVector[0] = point[0] - wall[0][0];  //Green vector
       pointVector[1] = point[1] - wall[0][1];

       wallVector[0] = wall[1][0] - wall[0][0]; //Red vector
       wallVector[1] = wall[1][1] - wall[0][1];

       // crossProduct only compute the Z component: we just want the Z sign.
       return 0 < crossProduct(wallVector,wallVector);
  }


Trivia : If you search for "float" in the source code of Build you will not get a single hit.
Trivia : The float type usage was democratized by Quake since it was targeted for Pentium and their Floating Point Unit.

Inside a concave polygon

Now that we have seen how a cross-product can be used to classify a point with regard to a wall, we can take a closer look at inside.

An example with a concave polygon and two points: Point 1 and Point 2.
  • Point 1 is considered outside.
  • Point 2 is considered inside.

The "modern day" algorithm for point-in-polygon (PIP) is to cast a ray from the left and check how many edges are crossed. An odd number means the point is inside, an even number means the point is outside.


Build uses a variation of this algorithm: It counts the number of edges on each sides and combine the results with XORs :


Trivia : Doom engine had to do through the same kind of gymastic in R_PointOnSide. Quake used planes and Floating Point operations in Mod_PointInLeaf.

Trivia : If you found inside difficult to read, try to go through the Chocolate Duke Nukem Version : It features comments.

2. Portal and Opaque Walls

The starting sector being provided to Build by the Game Module, rendition start with opaque walls in drawrooms: Two steps connected by a stack.


  • A preprocessing step flood the portal system (starting from startingSectorID) and store walls in a stack: scansector().
  • The stack is made of "bunch" elements.
  • The stack elements are consumed by the renderer method: drawwalls().

What is a bunch anyway ?

A bunch is a set of walls that are considered "Potentially Visible". Those walls all belong to the same sector and are continuously (connected by a point) facing the player.

Most of the walls in the stack will be discard, only a few will end up rendered to the screen.

Note : The "wall proxy" is an integer referencing a wall in the list of "Potentially Visible" walls. The pvWalls array contains a reference to the wall in the world database plus its coordinates rotated/translated into player space and screen space.

Note : The datastructure is actually more complicated: Only the first wall in the bunch is stored on a stack. The rest is in an array used as an id linked list: This is done so bunches can be moved up or down very fast in the stack.

Trivia : The flooding process uses an array to mark visited "sectors". This array has to be cleaned up before each frame. It does not use the framenumber trick in order to determine if a sector has been visited for the current frame.

Trivia : Doom engine used quantification to convert angles to screen column. Build uses cos/sin matrix to convert worlspace vertice to playerspace.



The portal flooding heuristic is: Any portal facing the player and within the 90 degres FOV will be flooded. This part is hard to understand but it is interesting because it shows how the developers tried to save cycles everywhere:



    //Cross product -> Z component
    tempint = x1*y2-x2*y1;

    // Using cross product, determine if the portal is facing us or not.
    // If it is: Add it to the stack and bump the stack counter.
    if (((uint32_t)tempint+262144) < 524288) {
        //(x2-x1)*(x2-x1)+(y2-y1)*(y2-y1) is the squared length of the wall
        if (mulscale5(tempint,tempint) <= (x2-x1)*(x2-x1)+(y2-y1)*(y2-y1))
              sectorsToVisit[numSectorsToVisit++] = nextsectnum;
    }


  

The first test check if mulscale5 would overflow.
The second test check if the portal is within the 90 degres FOV frustrum.

Bunch generation

Walls inside a sector are grouped into "bunches". Here is a drawing to help understand the idea :

In the previous drawing we can see that three sectors have generated four bunches:

  • Sector 1 generated one bunch containing one wall.
  • Sector 2 generated one bunch containing three walls.
  • Sector 3 generated two bunches, both containing two walls.

Why are walls even grouped into bunches ? Because Build did not have any data structure to allow fast sorting. It extracts the nearest bunch via a (O²) process which would have been very costy if it had been done on a per wall basis. The cost is much lower on a per wall set basis.


Bunch consumption

Once the bunches stack is populated the engine will draw them near to far. The engine select the first bunch that is no occluded by an other bunch (there is always at least one bunch that satisfies this condition ):



    /* Almost works, but not quite :( */
    closest = 0; 
    tempbuf[closest] = 1; 

    for(i=1; i < numbunches; i++){
         if ((j = bunchfront(i,closest)) < 0) 
             continue;

         tempbuf[i] = 1;

         if (j == 0){
            tempbuf[closest] = 1;
            closest = i;
         }
    }



Note : Despite the name of the variable, the bunch selected is not necessarly the closest.

Explanation of the selection by Ken Silverman :


Given 2 bunches, first see if they overlap in screen-x coordinates. If they do not, then there is no overlapping violation and move on to the next pair of bunches. To compare bunches, find the first wall on each of the bunches that overlaps in screen-x coordinates. Then it becomes a wall to wall test.

The wall to wall sorting algorithm is: Find the wall that can be extended to infinity without intersecting the other wall. (NOTE: if they both intersect, then it is an invalid sector and you should expect a drawing glitch!) The points of the other (non-extended) wall must both be on the same side of the extended wall. If that happens to be the same side as the player viewpoint, then the other wall is in front, otherwise it is behind.



bunchfront is fast, complicated and not perfect so Build double checks before sending the result to the renderer. That makes the code ugly but the result is O(n) instead of O(n²).

Each bunch selected is sent to the renderer drawalls(closest): This part of the code will draw as much wall/ceiling/floor as possible.

Wall/Ceiling/Floor rendition

The key to understand this part is that everything is rendered vertically: Walls, floor and ceilings.

At the core of the renderer are two occlusion arrays. Together they keep track of the occlusion upper and lower bounds of each column of pixels on screen.:


   //The maximum software renderer resolution is 1600x1200
   #define MAXXDIM 1600

   //FCS: (up-most pixel on column x that can still be drawn to)
   short umost[MAXXDIM+1];

   //FCS: (down-most pixel on column x that can still be drawn to)
   short dmost[MAXXDIM+1];

Notes : The engine tends to rely on arrays of primitive types instead of array of struct.

The engine writes vertical span of pixels starting from the upper or lower bound. Bounds values progress toward each other. When they encounter the column of pixels is entirely occluded an a counter is decreased:


Note : Most of the time the occulusion array is only partially updated: Portals leaves "holes".



Each wall in the bunch is projected in screenspace and then :

  • If it is a solid wall:
    • Render ceiling if visible.
    • Render floor if visible.
    • Render wall if visible
    • Mark the entire column as occluded in the occlusion array.
  • If it is a portal:
    • Render ceiling if visible.
    • Render floor if visible.
    • Peek into the next sector:
      • If the next sector ceiling is lower than current sector ceiling: Draw a "DOWN" step partial wall.
      • If the next sector floor is higher than current sector floor: Draw an "UP" step partial wall.
    • Update the occlusion array according to what has been drawn.





Stop condition : :This loop will go on until all bunches have been consumed or all pixels columns are marked as completed.
It is much easier to understand with an example breaking down a scene such as the familiar auditorium :



The maps shows the portals in red and the solid walls in white:



The first bunch three walls are projected in screenspace: Only the bottom parts made it to the screen:



The engine can hence render the floor vertically:



The engine next peek into the next sector of each three walls: Since they are not -1 those walls are portals. By looking at the floor height difference the engine figures out it has to draw an "UP" step for each of them:



And that is all what is rendered with the first bunch. Now the second bunch is projected in screen space:




Again, only the lower part of the wall made it which means the engine can render the floor:



Peeking in the next sector for the longest portal allows to draw a "STEP UP". However, the second portal in the bunch leads to a lower sector: No STEP is drawn.



The process repeats itself. Here is a video that shows the full scene:

Theatre Inside:

Upon drawing the door portal, Build drew a step up and a step down with two different textures. How is this possible with only one picnum ??! :

This is possible because the wall structure has a "picnum", an "overpicnum" for 1-way or masked walls, and a flag that allows the lower texture index to be stolen from the opposite sector's wall. It was a hack to keep the size of the sector structure small.


Note : The Quicktime format is used because the players allows to grab the cursor and move back and forth with instant feedback. I found it very useful to study the engine and far superior to any other format/hosting solution since going backward requires buffering.

Two other scenes compiled into a video:

Street :
0:00 to 0:08 : The sidewalk lower line portal is used to draw a vertical piece of floor.

At 0:08 the engine lookup the sector floor level after the sidewalk. Since it is going up the portal wall is drawn: The sidewalk is rendered.

0:18 to 0:30 : A bunch made of the two sidewalks on the left is used to render a huge piece of floor.

Theatre Outside:
Interesting scene featuring a window.

This last video shows a window. Here are some details about how it is done :

The map :



The first bunch features four walls, once projected in screenspace they allow drawing of ceiling and floor:


The left wall in the bunch is a portal. Upon inspection it turns our the next sector floor is at the same height and the next sector ceiling is higher: No steps walls have to be rendererd here. The second wall is an opaque wall (white in the map). It result in a full column of pixels draw:



The third wall is a portal. Peeking at the next sector height shows that is is lower so a "DOWN STEP WALL" has to be rendered:



The same peeking process goes for the floor. Since it is higher, an "UP STEP WALL" is drawn:



Finally the last wall is processed. It is not a portal so full columns of pixels are drown.




Trivia : Since walls are draws vertically, Build stores textures rotated 90 degres in RAM. This improves the cacheline hit rate tremendously.

3. Pause

At this point the visible solid walls have made it to the offscreen buffer. The engine stops and let the Game Module runs so it can animate the visible sprites. Those sprites are exposed in the following array:



    #define MAXSPRITESONSCREEN 1024
    extern long spritesortcnt;
    extern spritetype tsprite[MAXSPRITESONSCREEN];

  

4. Sprite rendering

Once the Game Module is done animating the visibles sprites, Build draws then: Far to Near in drawmasks().

  1. The distance of each sprite to the POV is computed.
  2. The array of sprites is sorted via Shell Sort.
  3. The engine consumes alternatively from two lists (one for sprites and one for transparent walls).
  4. Once one list has been exhausted the engine tries to minimize branching and switch to a loop that only render one kind: Sprites or walls.

Profiling

Running Duke Nukem 3D with "Instruments" gave an idea of were CPU cycles are consumed :

Main method :

No surprise: Most of the time is spent rendering opaque walls and waiting for an opportunity to flip the buffer (vsync enabled).


displayrooms method :

93% time is spent drawing the solid walls. 6% is dedicated to sprites/transparent walls rendition.


drawrooms method :

Despite its complexity the Visible Surface Determination only accounts for 0.1% of the solid walls rendition step.


drawalls method :

The *scan functions are the interface between the engine and the ASM routines:

  • wallscan():: walls
  • ceilscan(): unsloped ceilings
  • florscan(): unsloped floors
  • parascan(): parallaxing skies (uses :wallscan(): inside)
  • grouscan(): sloped ceilings and floors - slower

Comments by Ken Silverman :


ceilscan() and florscan() are complicated by the fact that they convert a list of vertical spans to a list of horizontal spans. This is what all those "while" loops in there are for. The reason I do this is because it is much faster to texture-map unsloped ceilings and floors in the horizontal direction. This is a critical optimization in the engine that you seem to have overlooked. I have seen similar while loops in the Doom source code as well.




Ken sent an evaldraw script, span_v2h.kc that shows how ceilscan() and florscan() convert a list of vertical spans to a list of horizontal spans.:





displayrest method :

displayrest is from the Game Module. The major cost is once again drawing (the weapon). The status bar is not drawn each frame, it is only patched when the ammo counter is updated.


VGA and VESA Trivia

Most people ran Build in 320x240 thanks to VGA's X-Mode. But the engine also supported Super-VGA resolution thanks to the VESA standard. The Vanilla source code is packed with VESA code that allowed portable resolution detection and rendition.

The nostalgics can read about VESA Programming in this good tutorial. Today all what remains in the code is method names such as getvalidvesamodes().

Legacy

Build was very hard to read. I listed all the things making it difficult in the next page:

Duke Nukem 3D and Build engine: Source code issues and legacy

Recommended Readings


Add a comment



Name Homepage
E-mail
(Will not appear online)
Comment



Comments (50)


#1 - Justin Meiners - 02/15/2013 - 00:19
Awesome Thank You!
#2 - Gustavo - 02/15/2013 - 00:44
Check out archive.org for the forum: http://web.archive.org/web/20080222020457/http://www.jonof.id.au/forum
#3 - Lucas - 02/15/2013 - 00:49
Unless I'm mistaken, JonoF's forum is backed up in the web archive here: http://web.archive.org/web/20120619233705/http://www.jonof.id.au/forum
#4 - Jan Zwiener - 02/15/2013 - 02:03
Another very interesting article, thank you Fabien!
#5 - Ben. - 02/15/2013 - 03:08
Thanks for this. I really appreciate people having a closer look on something and sharing their insights.

It's a pitty that they could not just refurbish the graphics, add some new maps and release a new Duke. What they released was a "let's get things done" game and called it "Duke Nukem Forever".

Maybe there will be a "Duke Nukem Community Edition" with all the cool gameplay...

Thanks for your work!
#6 - Pierre Riteau - 02/15/2013 - 04:07
Direct link to Ken's posts on the forum: http://web.archive.org/web/20120318021908/http://www.jonof.id.au/forum/index.php?action=profile;u=12;sa=showPosts
#7 - allanaes - 02/15/2013 - 05:18
"...SDL seems to have some palette issues with Vista/Windows 7..."
same problem when running Starcraft or Diablo II, the easiest solution is to kill explorer.exe before run the game (use taskmanager or create a batch command :)
#8 - Jan Michalowsky - 02/15/2013 - 05:23
Thanks for the very good review
#9 - Pedantic Panda - 02/15/2013 - 05:25
"Slopped floors" sounds like an interesting technical challenge... I assume you mean sloped.
#10 - niyaro - 02/15/2013 - 07:52
That's gonna be awesome read, thank you! Coinsidently just couple days ago I've downloaded duke3d atomic edition and finished first episode :)

May I ask you to write an in depth article on how you actually *read* code? Where do you start, how much notes you make, maybe you document the code you read, and so on. In my experience, reading code is an essential skill for any programmer, but there seem to be absolutely no resources explaining how to do it properly. I've tried reading quake3 code (three times) and tried reading eAthena code (open-source server engine for MMORPG Ragnarok Online) but after couple days I've got lost in detailes and in the end had no clue about how thing works. You are on the other hand extremely skilful at the task and your insights are very valuable for an open community.
#11 - Matt - 02/15/2013 - 07:54
These reviews are very interesting and well done. thanks.
#12 - Gustavo De Micheli - 02/15/2013 - 09:27
Great reading as always Fabien. Why would Silverman put all code in one monolith file?
I laugh about your note about variables/files with numbers, I totally agree with you. I always remaind my coworkers of that (after some year developing seriously and reading books like Clean Code made ponder the value of good names).

So, what's the next step? do you have any engine you want to review next?
It's a shame that any of the Unreal games doesn't release their source codes.
#13 - Ohcmonakismet - 02/15/2013 - 10:03
Can you please not use JavaScript for images?

I can't read the article via Instapaper app, and the page is otherwise unreadable on mobile (and I don't have time to read at my desk)
#14 - Luc Trudeau - 02/15/2013 - 10:40
Wow, what a great review! Very educational.

I've always wondered, what software do you use to make your diagrams?
#15 - Hugo - 02/15/2013 - 10:42
Very nice work Hail to the King Baby !
#16 - Arthur Langereis - 02/15/2013 - 11:37
Your Mac OS X binary uses @rpath relative paths (such as @rpath/SDL.framework/Versions/A/SDL) that do not resolve to the frameworks inside the App Bundle (at least for me, OS X 10.8.3pre). I'm assuming you have SDL in another Library/Frameworks dir somewhere, but for people who don't it will not work right now, use otool to change the install name in the bin to use @executable_path/../Frameworks/SDL.framework/… for both.

Excellent post btw, nice deep analysis of the source and render mechanics. I remember looking at ENGINE.C when it was first released and indeed did not feel encouraged to continue investigating :) But I respect that he got a post-DOOM engine working smoothly on such feeble hardware. It took some sacrifices, especially then. Compilers now can turn a ton of types, lambdas and whatnot into a tight loop but back then you really had to write high-level assembly to get stuff fast. Props to Ken and you!
#17 - Daniel Platz - 02/15/2013 - 12:21
Hi Fabien,
As always a great read!
Once I saw it over at HackerNews while at work, I knew I had to make a short day :-)
One thing: Could you elaborate on why using cross products makes a difference to using a dot product if you have fixed-point/integer math?
Also, two small typos in the related source code snippet:
* If I am not color-blind, the one vector is blue and not green. ;-)
* The return-statement should compare against the 2D zero-vector not a scalar, right?

Cheers,
Daniel
#18 - John O'Grady - 02/15/2013 - 13:07
A Valentines day well spent. Very interesting read about a game I loved.
#19 - Ken Causey - 02/15/2013 - 13:54
s/fidele/faithful/ (I think)
#20 - Will Lam - 02/15/2013 - 18:11
So many awesome memories of this game - really appreciate your code review as it allows to to try to appreciate the game in a different way.
#21 - Allen - 02/16/2013 - 06:33
Hi, thanks for the interesting read. But one thing struck me as *very* annoying on reading - your blog loads graphics only when the user scrolls to the part where they are embedded. So upon scrolling it takes 1-2s before I can see the actual image and have to stare at a white space instead. Please think about removing that 'feature'. Thank you!
#22 - Hugh O'Brien - 02/16/2013 - 16:20
Seems the Internet Archive has you covered for those lost forum posts;

http://web.archive.org/web/*/http://www.jonof.id.au/forum/
#23 - Woody - 02/17/2013 - 05:54
Super interesting. Thank you very much!

But why the hell would you use quicktime videos... :D
#24 - Steve - 02/17/2013 - 12:20
cannot read the videos
Chrome give me an alrt that is due to a plugin named : quick time

use html video tag or flash for video other solution are lame !
#25 - Javier Villa - 02/18/2013 - 16:46
I really enjoy these reviews, glad to see Duke Nukem starring here, after all the awesome Dooms and Quakes it makes the coverage of the clasic FPSs pretty comprehensive.

Thanks!
#26 - David Bundgaard - 02/19/2013 - 04:19
Hello Fabien,

I wonder what tools you use to make your diagrams.


I appreciate to read your comments and notes on different games, its very nice of you to take the time and read and share it with all of us.


Have a good life


//David
#27 - Kam - 02/19/2013 - 11:50
Is there any chance you could alter this page so that it doesn't require Javascript to load the images, please?

Thanks.
#28 - Stephen - 02/19/2013 - 13:22
Great read about those legacy gaming engines.

I also enjoyed, very much, your article's page presentation.
Nicely done and pleasant to view.
I had no problem viewing any of the images or QT videos.
#29 - Arkitekt - 02/19/2013 - 13:44
Sweet. Uber good work sir.
#30 - Dexter - 02/19/2013 - 14:21
Awesome reading, thank you for your great review!
#31 - Evan B - 02/19/2013 - 15:09
I wonder if archive.org has it in the wayback machine?
#32 - Jonathan Bayer - 02/19/2013 - 16:32
Thanks for a great article. I tried installing the Mac binary, but it requires OSX 10.7, I have 10.6.8 (snow leopard)
#33 - pleegwat - 02/19/2013 - 16:33
You remark that functions are very badly commented, but you don't mention an obvious reason: Terminal size.

Depending on the editor, you have at most 24 lines of 80 columns on a VGA screen. The inside function you mention is 20 lines long - pretty close to that limit. The desire to keep logic sections within the size of the screen is much more important than comments for other people years down the line to enjoy.


Also, regarding globals: While I'm not sure about 486 specifically (I'm too young), a global array of a primitive type (1, 2, or 4 bytes per element) is very fast to access. I believe the address of the element could be calculated as part of the memory fetch instruction.
An array of structs, whether as a global or on the stack, would probably require 2 instructions to calculate the memory address, and a third for the actual fetch.
Aditionally, the size of the stack would have been tightly constrained.
#34 - Tym - 02/19/2013 - 17:40
If you liked duke nukem 3d, check out Shadow Warrior... Lo Wang come for you, snake coward! Loved that game, it was also developed on top of the 'Build' engine
#35 - Taamalus - 02/19/2013 - 17:41
This was a great read.
Any plans to mention the active Source Port EDuke32?
http://www.eduke32.com/
OpenGL is very much improved, there is now true Room over Room capabilties and much more.
The team has an very active forum
http://forums.duke4.net/
Best
Taamalus
#36 - Mark - 02/19/2013 - 18:22
I don't agree that it's hard to understand, but I also used to write 3D renderers back in those days so it makes sense to me.

* Putting as much code into a single file as possible was a good way to help the primitive compilers do some optimisations & inlining
* Arrays of scalars instead of structs are simply much faster to iterate through given limited CPU caches
* The example math method is straightforward bitmask operations on the sign bits
* Ahh, hand-coded assembly. I'll admit it takes the right kind of masochist to read it :)
* Nothing magic about those numbers! Just bitmasks.
* IDEs have no excuse to be slow these days. DJGPP etc handled huge files just fine back then!

I think you mean 120Hz, not 120 ticks per frame (that would be far too much IRQ overhead). But the PIT native frequency was 1193180Hz. You set a divisor value from 0-65535 which determines the frequency of IRQ0. You can't get 120Hz exactly (more like 120.002Hz) so it's unusual to quantize rendering / physics to this as you get graphics "tearing". What did they use this timer for?
#37 - Adam Baxter - 02/19/2013 - 19:32
http://web.archive.org/web/20120509103449/http://www.jonof.id.au/forum seems to work.
#38 - mbidule - 02/20/2013 - 11:06
Amazing work. I really enjoyed reading it !

Thanks for sharing all this knowledge.
#39 - kd - 02/20/2013 - 11:39
quicktime lol. why not just embed a youtube video. i havent seen a quicktime video for like 10 years
#40 - Ivo - 02/20/2013 - 12:41
I am guessing the binary was complied for 64 bit windows?

I get C:\Duke3D\ChocolateDuke3D.exe is not a valid Win32 application when tying to start.

When a get a bit more time I will try and get this complied locally, both on Win XP and Linux(Raspberry Pi)
#41 - Enze - 02/21/2013 - 06:52
#22, #23:
The Videos are in the folder fd.fabiensanglard.net/duke3d/movies/
the 3 *.m4v-files
VLC plays them just fine
#42 - Edgar Acosta - 02/21/2013 - 10:29
I want to learn how to program VideoGames, how can i start?. i am a software engineer but with experience in embedded and client server systems, data bases. How can i move to Video Games>? also i don't have experience in computer graphics...How to start? any Suggestion? what books do i have to read?.
#43 - Daniel "MontyOnTheRun" Monteiro - 02/21/2013 - 11:53
I really enjoy your articles! Keep on going!

Maybe, if you're running out of code bases to review, you should turn to Aleph One. Its kind of "Chocolate Marathon" ( http://en.wikipedia.org/wiki/Marathon_2:_Durandal ).

Its surprising that most of my pet game engines of the past have similar algorithms (in fact, the "current sector" tracking is exactly the same) to Build.

Greetings from Rio ;-)
#44 - Daniel "MontyOnTheRun" Monteiro - 02/22/2013 - 22:15
One more comment (possibly worth asking Ken) is:

"The renderer does not rely on recursive function (the way Doom did when walking the BSP). Instead a stack and a loop is used (for sector flooding)" (from your raw notes)

There is no recursion based on machine stack, but rather on software stack. Maybe this is because (if my memory serves me well) Ken prototyped the engine in QuickBASIC and (from my faded memory again) it didn't have recursion until 4.0;

Or perhaps not...maybe was stack size thing from those old dark (and very fun) days?

QuickBasic was a fine "IDE" for some quick hacking. I grew making lousy 3D games in it myself. My biggest inspiration at the time was Ken, but I wasn't aware he used the very same tool I was;
#45 - Tigrou - 02/27/2013 - 12:01
Hi Fabien,

Very interesting review as usual. I already took a look at code some years ago and discover most inner workings but some things remains mysterious, your review cleared them out.

I expected to see an explanation on how mirrors works (why they require an additional empty room with special texture). Still unclear to me.

One interesting trivia : the way updatesector() works caused some well know glitch/cheat during the game, that was used a lot in multiplayer (warp glitch) : if you make player run very fast in a place were sectors are very small (eg : stairs), updatesector get lost and start scanning all sectors. If there is two (or more) overlapping sectors where player is, game sometimes choose the wrong sector (depending their order), and it will teleport player there.
see video : http://www.youtube.com/watch?v=VUOhZIdCSiI
#46 - Fei Yuan - 03/03/2013 - 07:43
This might sound stupid, but after downloading the mac osx release of Chocolate Duke Nukem 3D, where do I place my DUKE3D.GRP file???
#47 - fabien sanglard - 03/03/2013 - 17:07
In the same directory as the Bundle :P !
#48 - Bruce - 03/14/2013 - 18:07
Very interesting to see what's behind this engine.
Ken was a true whiz kid - imagine a threesome, hopefully limited to engine concerns, between Carmack, Abrash and Silverman in the 90ies... the arch software rasterizer could have been born ^^

Thanks for your excellent review and bringing the code to life in classic fashion!
#49 - Fei Yuan - 03/22/2013 - 12:22
Why is there no support for Duke Nukem 3D v1.4?

I want to play the Atomic Edition but I can't patch to 1.5, please make Chocolate Duke Nukem 3D work for v1.4!
#50 - FistMarine - 03/23/2013 - 03:31
When I try to start a game by selecting any episode, the game crashes/freezes. Any solutions?
Note that the intro demos work fine.

 

@2013