Doom iPhone code review
I took some time away from programming something I hope will become a really good shmup and read the source code of Doom for iPhone. I was very interested in finding out how a pixel-oriented engine made the transition to OpenGL. Here are my notes; as usual I got a bit carried away with the drawings.
Downloads
Overall design
Renderer: How Doom was working
Renderer: How DoomGL is working
Network system
Sound system
Comments
Recommended readings
Note: There is plenty of cool stuff to learn from John Carmack's Progress Report and Release Note.
And if this article stresses you out too much, check out a copy of Fluid 2, my relaxing application for iPhone.
Feb 8th, 2010 : Slashdotted pretty hard, need to buy more bandwidth :/ !
Feb 9th, 2010 : Can't keep up with 5000 daily visitors, switching videos to YouTube for now (at least they are 480) :/ !
Oct 29th, 2010 : Seems John Carmack liked this review too.
Downloads
Source code and binaries are available here:
Overall design
An iPhone application never really controls the device; it is only granted runtime (which is understandable: an iPhone/iTouch needs to be able to receive calls or play music). Hence most applications run in two threads, neither of which is controlled by your code:
- The OpenGL thread running on the GPU, where draw commands are buffered until rasterization is triggered.
- The main thread running on the CPU and owned by an NSRunLoop object.
At startup, the application's environment is initialized and NSRunLoop calls applicationDidFinishLaunching. This is where you get to run something, for 5 seconds, after which the function is interrupted. During this time period, you can register your code to receive notifications of touch screen events via function pointers.
In order to refresh the screen at a regular interval, we need to create new events via an NSTimer object that will be bound to NSRunLoop. NSTimer defines marks in time; NSRunLoop will check "regularly" whether a mark has been passed and will call your function pointer. Most developers set up NSTimer to call a custom method, hostFrame, at a "wished" frequency of 30 Hz.
There are several problems with such a design:
- Because an NSRunLoop object can be busy, refreshing accuracy is only 100ms :( !
- "Marks" are set in advance: if the hostGameFrame method overruns an NSTimer mark, the next trigger won't happen as soon as possible but on the next scheduled mark (see "Timer Sources" in Apple's documentation on NSRunLoop).
Quote:
You can configure timers to generate events only once or repeatedly. A repeating timer reschedules itself automatically based on the scheduled firing time, not the actual firing time. For example, if a timer is scheduled to fire at a particular time and every 5 seconds after that, the scheduled firing time will always fall on the original 5 second time intervals, even if the actual firing time gets delayed.
If this happens, the CPU ends up being idle until the next NSTimer time mark is reached but, more importantly, a frame is skipped.
Even a single frame taking too long to process can have a disastrous impact on the framerate.
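To put numbers on the cost of an overrun, here is a minimal sketch of the fixed-schedule behaviour described above. It is a simplified model, not Apple's implementation: it assumes a frame always starts exactly on a mark and that the next callback happens on the first mark at or after the end of the frame. The names marks_skipped and idle_ms are made up for this illustration.

```c
#include <assert.h>

/* Number of scheduled marks that pass while one frame is still running.
 * Model: the frame starts on a mark; marks fall every period_ms. */
int marks_skipped(int frame_ms, int period_ms) {
    assert(frame_ms >= 0 && period_ms > 0);
    if (frame_ms <= period_ms) {
        return 0;  /* finished in time for the very next mark */
    }
    /* Periods needed to cover the frame, minus the mark we end up using. */
    return (frame_ms + period_ms - 1) / period_ms - 1;
}

/* Time the CPU sits idle between the end of the frame and the next
 * usable mark. */
int idle_ms(int frame_ms, int period_ms) {
    int periods_used = marks_skipped(frame_ms, period_ms) + 1;
    return periods_used * period_ms - frame_ms;
}
```

With marks every 33 ms, a 40 ms frame misses one mark and then leaves the CPU idle for 26 ms before the next one, which is exactly the waste the threaded design below tries to avoid.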
Doom for iPhone tries never to miss an NSTimer mark by running in three threads:
- Main Thread (very short body, highest priority, never misses a mark and triggers the Game Thread to host a frame via a semaphore).
- Game Thread.
- OpenGL Thread.
Critical sections of the code (mainly user inputs) are protected via a Unix mutex. The rendering loop runs as fast as it can but is starved via a semaphore on sem_wait. iPhoneAsyncTic increases the counter via sem_post.
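The semaphore pacing can be sketched with plain POSIX threads. This is an illustration of the pattern only, not the game's code: pacer_t, game_thread, async_tic and run_paced_frames are hypothetical names, the "frame" is just a counter increment, and sem_init is used for brevity (Apple platforms generally do not implement unnamed semaphores, so a named semaphore from sem_open would be needed there).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical state shared by the two threads of this sketch. */
typedef struct {
    sem_t ticks;          /* timer marks posted but not yet consumed */
    int   frames_wanted;  /* demo-only stop condition */
    int   frames_run;     /* frames produced by the game thread */
} pacer_t;

/* Game thread: runs one frame per tick and sleeps in sem_wait otherwise. */
static void *game_thread(void *arg) {
    pacer_t *p = (pacer_t *)arg;
    while (p->frames_run < p->frames_wanted) {
        while (sem_wait(&p->ticks) == -1 && errno == EINTR) {
            /* interrupted by a signal: retry */
        }
        p->frames_run++;  /* stand-in for the real per-frame work */
    }
    return NULL;
}

/* Stand-in for the timer callback: very short, it only posts a tick. */
static void async_tic(pacer_t *p) {
    sem_post(&p->ticks);
}

/* Posts `ticks` marks and returns how many frames the game thread ran,
 * or -1 if the semaphore or the thread could not be created. */
int run_paced_frames(int ticks) {
    pacer_t   p;
    pthread_t thread;

    assert(ticks >= 0);
    p.frames_wanted = ticks;
    p.frames_run = 0;
    if (sem_init(&p.ticks, 0, 0) != 0) return -1;
    if (pthread_create(&thread, NULL, game_thread, &p) != 0) {
        sem_destroy(&p.ticks);
        return -1;
    }
    for (int i = 0; i < ticks; i++) {
        async_tic(&p);  /* the "main thread" never blocks here */
    }
    pthread_join(thread, NULL);
    sem_destroy(&p.ticks);
    return p.frames_run;
}
```

Because the semaphore is a counter, ticks posted while a frame is still running are not lost: the game thread simply consumes them as soon as it is free.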
This design actually shares a lot of similarities with the technique called "triple buffering", whose goal is to totally decouple the GPU from the display's refresh rate. Here the goal is to decouple the CPU from the NSTimer ticks:
- Shorten the runtime of the mainThread to never miss an NSTimer mark.
- The CPU is never idle, the high priority mainThread will interrupt the GameThread from time to time but that's it.
Note: The rendering context is grabbed by the Game Thread at startup ([EAGLContext setCurrentContext:context]) without usage of an EAGLSharegroup, effectively sharing the context across two threads. It's a bad thing but it seems to be working fine anyway.
I was surprised not to find any usage of CADisplayLink, an object that links a method call with a screen refresh and that showed HUGE performance gains (at least according to my experiments). But it is available on 3.0 firmware only, so this design will allow more people to play the game, which makes a lot of sense commercially speaking.
Renderer: How Doom was working.
Entry was so large it is now a full article: Here
Renderer: How Doom iPhone is working.
Just like Wolfenstein 3D, Doom was rendering a screen frame pixel by pixel. The only way to do this on iPhone with an acceptable framerate would be to use the CoreSurface/CoreSurface.h framework. But it is unfortunately restricted and using it would prevent distribution on the AppStore.
The only solution is to use OpenGL but this comes with a few challenges:
- Doom was faking 3D with a 2D map. OpenGL needs real 3D vertices.
- More than 3D vertices, OpenGL needs data to be sent as triangles (among other things because they are easy to rasterize). But Doom sectors were made of arbitrary forms.
- Doom 1993 perspective was also faked, it was actually closer to an orthogonal projection than a perspective projection.
- Doom was using VGA palette indexing to perform special effects (red for damage, silver for invulnerable...).
Early attempts to port Doom to OpenGL built new WADs (Doom archive format). They exploited the WAD format's capability to store pretty much anything (the original Doom WAD contained graphics, sounds and maps via different lump types) to create a new type of entry and store the 3D data organized as triangles. This is not the approach in Doom on iPhone: the world is "unified" in 3D primitives at the beginning of each level.
Building the third dimension
If you know the Doom engine or if you read the previous paragraph, you remember there are three types of elements:
- For the walls (made of lines called SEGS) it is fairly "easy" because all walls are vertical: Use the sector's heights to generate the third dimension, two triangles then form a rectangle wall.
- Floors and ceilings are much harder as sectors were not convex (only subsectors were). Each sector in the level is preprocessed via Silicon Graphics' libtess ( read more here ).
- Things are not preprocessed but rendered as sprite impostors generated on the fly.
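The wall case is simple enough to sketch. The snippet below is an illustration only, not the engine's code: vec3_t, wall_quad_t and wall_to_triangles are hypothetical names, and it assumes Doom's map convention where x and y are the 2D map coordinates and z is the height.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3_t;

/* Two triangles (6 vertices) forming one vertical wall rectangle. */
typedef struct { vec3_t v[6]; } wall_quad_t;

/* Extrudes the 2D line (x1,y1)-(x2,y2) between a floor height and a
 * ceiling height taken from the sector. */
wall_quad_t wall_to_triangles(float x1, float y1, float x2, float y2,
                              float floor_h, float ceil_h) {
    assert(ceil_h >= floor_h);

    vec3_t bottom_left  = { x1, y1, floor_h };
    vec3_t bottom_right = { x2, y2, floor_h };
    vec3_t top_right    = { x2, y2, ceil_h  };
    vec3_t top_left     = { x1, y1, ceil_h  };

    wall_quad_t q;
    /* First triangle: lower-right half of the rectangle. */
    q.v[0] = bottom_left;
    q.v[1] = bottom_right;
    q.v[2] = top_right;
    /* Second triangle: upper-left half, sharing the diagonal. */
    q.v[3] = bottom_left;
    q.v[4] = top_right;
    q.v[5] = top_left;
    return q;
}
```

A wall running from (0,0) to (64,0) in a sector with floor 0 and ceiling 128 yields six vertices that can be submitted directly as GL_TRIANGLES.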
Only the flat tessellation is a bit hard to get; here is the processing of the main room sector in E1M1 (the space with a blue floor at the very beginning of Doom).
Note in the animation the way the set is drawn as a "fan" (although it is done via the GL_TRIANGLES primitive).
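To make the "fan drawn via GL_TRIANGLES" remark concrete, here is a small sketch of how a fan of n vertices expands into independent triangles. It only illustrates the index pattern; fan_to_triangles is a hypothetical helper, not a function from the engine or from libtess.

```c
#include <assert.h>

/* Expands a triangle fan of n vertices (vertex 0 is the pivot) into a
 * GL_TRIANGLES style index list. `out` must have room for 3 * (n - 2)
 * entries. Returns the number of indices written. */
int fan_to_triangles(int n, unsigned short *out) {
    assert(out != 0);
    if (n < 3) {
        return 0;  /* not enough vertices for a single triangle */
    }
    int count = 0;
    for (int i = 1; i + 1 < n; i++) {
        out[count++] = 0;                        /* shared pivot vertex */
        out[count++] = (unsigned short)i;        /* current rim vertex  */
        out[count++] = (unsigned short)(i + 1);  /* next rim vertex     */
    }
    return count;
}
```

A five-vertex fan becomes three triangles: (0,1,2), (0,2,3) and (0,3,4). Every triangle repeats the pivot, which is why the animation looks like a fan even though the primitive type is GL_TRIANGLES.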
Note: This video does not account for the deferred rendition process of the PowerVR chips. Fillrate consumption and overdraw are actually minimal.
Trivia : 3D unification is done in gld_PreprocessLevel and is quite nice to read. Unfortunately some maps had errors and workarounds had to be hardcoded. Here in gld_PreprocessSectors with E3M8:
// JDC: E3M8 has a map error that has a couple lines that should
// be part of sector 1 instead orphaned off in sector 2. I could
// let the non-closed sector carving routine handle this, but it
// would result in some pixel cracks. Instead, I merge the lines
// to where they should have been.
// This is probably not the right solution, because there are
// probably a bunch of other cases in the >100 Id maps.
extern int gameepisode, gamemap;
if ( gameepisode == 3 && gamemap == 8 ) {
    void IR_MergeSectors( int fromSector, int intoSector );
    IR_MergeSectors( 2, 1 );
}
The big picture
Once the world is 3D consistent, each frame is rendered via a hybrid CPU/GPU process:
- Generate and upload OpenGL's GL_PROJECTION and GL_MODELVIEW matrices.
- Perform extra view transformation and read back the GL_MODELVIEW matrix from the GPU so pre-calculations can be performed.
- Use the BSP to walk the world near to far. Nothing is rendered at this point, only visibility edicts are generated and stored in gld_drawinfo.
  - For each wall, use the matrix that was read back to precalculate the X screen space coordinate where OpenGL will render the wall and maintain an occlusion array occlusion[MAX_SCREENWIDTH+2]; nothing is rendered at this point.
  - If a subSector is visible, mark its parent sector as "to be rendered".
  - If a sector contains a thing, generate an impostor to render it.
- For the three groups of items, perform a quicksort based on the textureId.
- Render all walls, sectors and things. One batch per texture.
- Switch OpenGL to 2D and perform a post effect to fake palette effects if necessary, draw player sprites, etc.
All of this takes place in IR_RenderPlayerView.
void IR_RenderPlayerView (player_t* player) {
    [..]

    // clean occlusion array
    memset( occlusion, 0, sizeof( occlusion ) );

    // Reset the fake palette.
    gld_SetPalette(-1);

    // To make it easier to accurately mimic the GL model to screen transformation,
    // this is set up so that the projection transformation is also done in the
    // modelview matrix, leaving the projection matrix as an identity. This means
    // that things done in eye space, like lighting and fog, won't work, but
    // we don't need them.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    infinitePerspective(64.0f, 320.0f/200.0f, 5.0f/100.0f);

    IR_RenderBSPNode( numnodes-1 );

    NewDrawScene(player);

    // Perform fake palette effect
    gld_EndDrawScene();
}
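The occlusion array cleared at the top of this function can be pictured as one flag per screen column. The sketch below is my own simplified guess at that idea, not the engine's data structure: occlusion_t, SKETCH_WIDTH and the three helpers are hypothetical, and the real code may store something richer than a single flag.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_WIDTH 480  /* hypothetical screen width in pixels */

typedef struct {
    unsigned char solid[SKETCH_WIDTH];  /* 1 = column covered by a solid wall */
} occlusion_t;

static int clamp_col(int x) {
    if (x < 0) return 0;
    if (x > SKETCH_WIDTH) return SKETCH_WIDTH;
    return x;
}

/* Start of frame: nothing is occluded yet. */
void occlusion_clear(occlusion_t *o) {
    assert(o != NULL);
    memset(o->solid, 0, sizeof o->solid);
}

/* Returns 1 when every column in [x0, x1) is already covered, meaning a
 * wall projected onto that span cannot be seen. */
int occlusion_is_hidden(const occlusion_t *o, int x0, int x1) {
    assert(o != NULL);
    x0 = clamp_col(x0);
    x1 = clamp_col(x1);
    for (int x = x0; x < x1; x++) {
        if (!o->solid[x]) return 0;
    }
    return 1;
}

/* Called for walls that cannot be seen past (the "occluder" case). */
void occlusion_fill(occlusion_t *o, int x0, int x1) {
    assert(o != NULL);
    x0 = clamp_col(x0);
    x1 = clamp_col(x1);
    for (int x = x0; x < x1; x++) {
        o->solid[x] = 1;
    }
}
```

Walking the BSP near to far, each solid wall fills its columns; any later wall whose precalculated X span is entirely filled can be skipped before anything is sent to OpenGL.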
Trivia :
- Palette: Because Doom classic was using a VGA indexed palette to perform some special effects, this feature had to be replicated via a post effect in gld_EndDrawScene.
- Scene processing and drawing are now two distinct phases (IR_RenderBSPNode and NewDrawScene), the way things started to be done in Quake.
- Matrices are handled in an unusual fashion:
  - Matrices are built, uploaded to OpenGL, manipulated via the drivers and then read back from OpenGL. The usual way is to build the final matrix and upload it.
  - The perspective projection is stored in GL_MODELVIEW, leaving GL_PROJECTION as the identity matrix.
- The BSP structure allows walking the world near to far. But just before drawing, triangles are sorted via quicksort on the textureID. This reduces the number of texture switches and the number of draw calls (by increasing the batch size).
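The sort-then-batch step can be sketched with the standard library's qsort. This is an illustration under assumptions: draw_item_t and sort_and_count_batches are hypothetical names, and a "batch" here is simply a run of consecutive items sharing a texture id.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for something queued for drawing. */
typedef struct {
    int texture_id;    /* sort key */
    int first_vertex;  /* where its geometry starts in a vertex array */
} draw_item_t;

static int by_texture(const void *a, const void *b) {
    const draw_item_t *x = (const draw_item_t *)a;
    const draw_item_t *y = (const draw_item_t *)b;
    /* Branch-free three-way compare; avoids overflow of a subtraction. */
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Sorts items by texture and returns how many texture binds (batches)
 * are needed to draw them all. */
int sort_and_count_batches(draw_item_t *items, int n) {
    assert(n >= 0);
    if (n == 0) {
        return 0;
    }
    qsort(items, (size_t)n, sizeof items[0], by_texture);

    int batches = 1;
    for (int i = 1; i < n; i++) {
        if (items[i].texture_id != items[i - 1].texture_id) {
            batches++;  /* texture changes: a new bind and draw call */
        }
    }
    return batches;
}
```

Five items using textures 3, 1, 3, 2, 1 would need five binds in BSP order but only three once sorted, at the price of losing the near-to-far ordering (which the depth buffer makes unnecessary).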
Next: Video illustrating the drawing order. Distance is not relevant anymore: triangles are drawn in batches of the same texture. This is possible because alpha testing is activated.
Contrary to Doom93, partly transparent walls are not drawn at the end with the "things" because of the GPU's ability to use Alpha testing and Depth testing. The same POV was used as a video in the Classic Doom article. Again the priority of texture ID over distance is obvious.
Trivia :
Using alpha testing on a tile-based deferred renderer (TBDR) such as the iPhone's is highly inefficient because it introduces uncertainty in the GPU pipeline. But it seems it gets the job done for Doom, so why not?
Networking
Not much to add to John Carmack's Release Note. Doom's original IPX system was first converted to UDP broadcast but packet drop was bad over WIFI (among other things). There is now a server to combine commands from clients and send updates to each device.
Sound system
The music system is now using Core Audio Services with MP3 decompressed on dedicated hardware (as opposed to the OGG format decompressed on the CPU in Wolfenstein iPhone). Sound effects are WAV, processed via OpenAL on the CPU.
Comments
The Doom iPhone codebase is really really nice to read thanks to a lot of comments from JDC. Sections of the code that had to be rewritten (iphone_*.c) are HEAVILY documented.
Example from iphone_render.c (there are almost more comments than code):
// If a segment in this subsector is not fully occluded, mark
// the line that it is a part of as needing to be drawn. Because
// we are using a depth buffer, we can draw complete line segments
// instead of just segments.
for ( int i = 0 ; i < sub->numlines ; i++ ) {
    seg_t *seg = &segs[sub->firstline+i];
    line_t *line = seg->linedef;

    // Determine if it will completely occlude farther objects.
    // Given that changing sector heights is much less common than
    // traversing lines during every render, it would be marginally better if
    // lines had an "occluder" flag on them that was updated as sectors
    // moved, but it hardly matters.
    boolean occluder;
    if ( seg->backsector == NULL ||
         seg->backsector->floorheight >= seg->backsector->ceilingheight ||
         seg->backsector->floorheight >= seg->frontsector->ceilingheight ||
         seg->backsector->ceilingheight <= seg->frontsector->floorheight ) {
        // this segment can't be seen past, so fill in the occlusion table
        occluder = true;
    } else {
        // If the line has already been made visible and we don't need to
        // update the occlusion buffer, we don't need to do anything else here.
        // This happens when a line is split into multiple segs, and also
        // when the line is reached from the backsector. In the backsector
        // case, it would be back-face culled, but this test throws it out
        // without having to transform and clip the ends.
        if ( line->validcount == validcount ) {
            continue;
        }
        // check to see if the seg won't draw any walls at all
        // we won't fill in the occlusion table for this
        occluder = false;
    }
}
Faking VGA palette
Doom 93 was taking advantage of VGA's DAC color table (colors indexed on a palette) to change all colors on the screen when picking up an item (brighter), taking damage (red) or while invulnerable (silver).
But OpenGL doesn't work with palettes (actually you have access to GL_OES_compressed_paletted_texture on iPhone 3GS, but the platform is still marginal and, even worse, colors are expanded (converted to RGBA), making their use counter-productive!). The effect is replicated via a three-stage process:
- If any effect during D_Display wants to change the palette, it just modifies the global variable palette (damage palette=10, newItem palette=8).
- The palette change is detected in PRBoom's V_SetPalette, which calls gld_SetPalette to set the color of the QUAD to blend over.
- Finally, gld_EndDrawScene checks if a blend is required and performs it by drawing a semi-transparent QUAD on top of the entire screen.
Note: While a shift toward a single color can be achieved via "normal" blending (glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)), invulnerability is a bit trickier to fake: glBlendFunc(GL_ONE_MINUS_DST_COLOR, GL_ZERO) is used instead to perform a negate.
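The two blend modes are easier to compare once written out per color channel. OpenGL computes result = src * srcFactor + dst * dstFactor; the sketch below applies that formula on the CPU for a single channel in the [0,1] range. The function names are mine, and this is a model of the arithmetic, not code from the game.

```c
#include <assert.h>

/* glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA):
 * pulls the framebuffer value toward the quad's color. */
float blend_tint(float src, float dst, float src_alpha) {
    assert(src_alpha >= 0.0f && src_alpha <= 1.0f);
    return src * src_alpha + dst * (1.0f - src_alpha);
}

/* glBlendFunc(GL_ONE_MINUS_DST_COLOR, GL_ZERO):
 * the quad's color is scaled by the inverse of what is already on
 * screen, and the old framebuffer value is dropped. */
float blend_negate(float src, float dst) {
    return src * (1.0f - dst) + dst * 0.0f;
}
```

With a white quad (src = 1.0), blend_negate returns 1 - dst, which is the negative of the image: a dark pixel at 0.25 becomes 0.75. With a red quad at 50% alpha, blend_tint moves every pixel halfway toward red, which is the damage effect.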
The effect on iPhone (on the right) is similar but not as nice as what could be done with a palette (on the left).
Spinning wheel
Something new compared to the PC version is the spinning wheel, prompting you to wait at the beginning of each level:
This happens while textures are uploaded to OpenGL, which can take a significant amount of time (up to 2s). This is done after the first frame has been drawn so the player has something to look at. Precaching reduces frame skipping later in the level when new textures become visible; it is done in gld_Precache().
Profiling.
Attaching Instruments to Doom (running on an iPhone, not the Xcode simulator) provided some cool data regarding the cost of every operation:
- Main Thread: as seen previously, the brain of the application. Mainly here to catch NSTimer marks.
- Game Thread: actually performing the work.
- AURemoteIO Thread: Core Audio's unit, interface with the iPhone's HAL. Read plenty of cool things here.
- RunWeb Thread: usually active only when embedding a browser in the app. No idea why it's here.
As expected, most of the time allocated to the process is spent in GameThread, preparing the next frame via iPhoneFrame(). Note the sound system is not consuming CPU as it is processed on a dedicated chip (only sound effects are processed on the CPU).
Focusing on the iPhoneFrame method:
This was a 120s session, which explains why the texture uploading (gld_Precache) cost is so huge.
Focusing on the iPhoneDrawScreen method:
A lot of time is spent in SwapBuffersAndTouches; this method clears the user input and instructs OpenGL to flip the framebuffer:
After more tracing, it seems the mutex is not slowing down the touches swapping (2 to 3 microseconds on average) while swapping the framebuffer via [context presentRenderbuffer:GL_RENDERBUFFER_OES] runs around 1-2 ms with jumps to 12000 to 24000 ms. iPhones are supposed to use triple buffering so I was unable to find an explanation for this :/ !
Doom 2
Does this new engine work well with the Doom 2 WAD archive? Yes, absolutely. Just put doom2.wad in the base directory, remove doom.wad, deploy and play !!
Double Shotgun
Even the new monsters (here a "mancubus") are working fine:
I was expecting the new monsters to crash the engine but I realized Doom iPhone is based on PRBoom, itself based on UltimateDoom. All the monsters' behavior function pointers are valid.
Compiling on 3.0 firmware and above.
I attempted to compile with iPhone SDK 3.0 but gcc raised an unusual error, "crosses initialization of":
322 OSStatus BackgroundTrackMgr::SetupQueue(BG_FileInfo *inFileInfo) {
323     UInt32 size = 0;
324     OSStatus result = AudioQueueNewOutput(&inFileInfo->mFileFormat, QueueCallback, this, CFRunLoopGetMain()
325
326     AssertNoError("Error creating queue", end);
327     // channel layout
328     OSStatus err = AudioFileGetPropertyInfo(inFileInfo->mAFID, kAudioFilePropertyChannelLayout, &size,NULL);
329     if (err == noErr && size > 0) {
330         AudioChannelLayout *acl = (AudioChannelLayout *)malloc(size);
331         result = AudioFileGetProperty(inFileInfo->mAFID, kAudioFilePropertyChannelLayout, &size, acl);
332         AssertNoError("Error getting channel layout from file", end);
333         result = AudioQueueSetProperty(mQueue, kAudioQueueProperty_ChannelLayout, acl, size);
334         free(acl);
335         AssertNoError("Error setting channel layout on queue", end);
336     }
337
338     // volume
339     result = SetVolume(mVolume);
340
341 end:
342     return result;
343 }
/Users/[..]/iphone/BackgroundMusic.cpp:341: error: jump to label 'end'
/Users/[..]/iphone/BackgroundMusic.cpp:326: error: from here
/Users/[..]/BackgroundMusic.cpp:328: error: crosses initialization of 'OSStatus err'
Jumping into the scope of an automatic variable past its initialization may bypass constructor calls and is hence forbidden in C++. The solution was to replace the macro with its corresponding definition and remove the goto.
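For reference, there is another common way to keep a single cleanup label without tripping this rule: declare every local before the first jump, so that no goto crosses an initialization. This is not the fix applied here, just a sketch of the alternative; fake_step and setup_queue_sketch are hypothetical stand-ins for the Audio Queue calls, written in a C subset that a C++ compiler accepts as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an API call that may fail. */
static int fake_step(int should_fail) {
    return should_fail ? -1 : 0;
}

int setup_queue_sketch(int fail_first, int fail_second) {
    /* All locals are declared and initialized before the first goto,
     * so no jump to `end` can skip an initialization. */
    int   result = 0;
    int   err    = 0;
    char *layout = NULL;

    assert(fail_first == 0 || fail_first == 1);

    result = fake_step(fail_first);
    if (result != 0) goto end;

    layout = (char *)malloc(16);
    if (layout == NULL) { result = -2; goto end; }
    memset(layout, 0, 16);

    err = fake_step(fail_second);
    if (err != 0) { result = err; goto end; }

end:
    free(layout);  /* free(NULL) is a no-op, so every path is safe */
    return result;
}
```

The trade-off is that declarations drift away from their first use, which is why removing the goto altogether is often the cleaner choice in C++ code.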
Recommended readings
A trip Down The PipeLine : Oldie But Goodie (and a pure gem that helped me to understand the homogeneous coordinate system and perspective projection).