February 1st, 2010

Doom iPhone code review

I took some time away from programming something I hope will become a really good shmup and read the source code of Doom for iPhone. I was very interested in finding out how a pixel-oriented engine made the transition to OpenGL. Here are my notes; as usual, I got a bit carried away with the drawings.

Downloads
Overall design
Renderer: How Doom was working
Renderer: How DoomGL is working
Network system
Sound system
Comments
Recommended readings


Note: There is plenty of cool stuff to learn from John Carmack's Progress Report and Release Note.

And if this article stresses you out too much, check out a copy of Fluid 2, my relaxing application for iPhone.

Feb 8th, 2010 : Slashdotted pretty hard, need to buy more bandwidth :/ !
Feb 9th, 2010 : Can't keep up with 5000 daily visitors, switching videos to YouTube for now (at least they are 480) :/ !
Oct 29th, 2010 : Seems John Carmack liked this review too.


Downloads

Source code and binaries are available here:

  


Overall design

An iPhone application never really controls the device; it is only granted runtime (which is understandable: an iPhone or iPod touch needs to be able to receive calls or play music). Hence most applications run in two threads, neither of which is controlled by your code:

At startup, the application's environment is initialized and NSRunLoop calls applicationDidFinishLaunching. This is where you get to run something, for 5 seconds, after which the function is interrupted. During this time period, you can register your code to receive notifications of touch screen events via function pointers.

In order to refresh the screen at a regular interval, we need to create new events via an NSTimer object that is bound to NSRunLoop. NSTimer defines marks in time; NSRunLoop checks "regularly" whether a mark has been passed and calls your function pointer. Most developers set up NSTimer to call a custom method, hostFrame, at a "wished" frequency of 30 Hz.




There are plenty of problems with such a design:


Quote:


	You can configure timers to generate events only once or repeatedly. A repeating timer reschedules
	itself automatically based on the scheduled firing time, not the actual firing time. For example,
	if a timer is scheduled to fire at a particular time and every 5 seconds after that, the scheduled
	firing time will always fall on the original 5 second time intervals, even if the actual firing
	time gets delayed.






If this happens, the CPU ends up being idle until the next NSTimer time mark is reached but, more importantly, a frame is skipped.

Even a single frame taking too long to process can have a disastrous impact on the framerate:
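
To make the cost of a late frame concrete, here is a minimal sketch in plain C (not code from Doom iPhone). It assumes integer millisecond timings, marks at 0, interval, 2*interval and so on, and a frame that can only start on a mark. The function name count_skipped_marks is made up for this illustration.

```c
#include <stddef.h>

/* Simulates a repeating timer whose marks fall at 0, interval, 2*interval, ...
 * A frame starts on a mark and runs for cost_ms[i]; the next frame can only
 * start on the first mark at or after the moment the previous one finished.
 * Returns the number of marks that passed without a frame starting on them.
 * Assumption: timings are whole milliseconds, interval_ms > 0. */
int count_skipped_marks(const int *cost_ms, size_t frames, int interval_ms)
{
    int skipped = 0;
    int start = 0;                      /* the first frame starts on mark 0 */

    for (size_t i = 0; i < frames; i++) {
        int finish = start + cost_ms[i];
        /* Round the finish time up to the next mark. */
        int next = ((finish + interval_ms - 1) / interval_ms) * interval_ms;
        if (next == start)              /* zero-cost frame: wait a full interval */
            next = start + interval_ms;
        /* Every mark strictly between start and next was missed. */
        skipped += (next - start) / interval_ms - 1;
        start = next;
    }
    return skipped;
}
```

With a 33 ms interval (roughly 30 Hz), frame costs of 20, 40 and 20 ms give one skipped mark: the 40 ms frame overshoots its mark by only 7 ms, yet the CPU then idles for 26 ms until the following one.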



Doom for iPhone tries never to miss an NSTimer mark by running in three threads:

Critical sections of the code (mainly user inputs) are protected via a Unix mutex. The rendering loop runs as fast as it can but is throttled by a semaphore: it blocks on sem_wait, and iPhoneAsyncTic increases the counter via sem_post.
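
The mutex and semaphore interplay can be sketched with plain POSIX threads. This is an illustration only, not the game's code: the thread functions, TIC_COUNT and pending_touches are hypothetical names, and the sketch uses an unnamed semaphore (sem_init), which as far as I know is not implemented on Apple platforms, where a named semaphore from sem_open would be needed instead.

```c
#include <pthread.h>
#include <semaphore.h>

#define TIC_COUNT 5                   /* hypothetical: number of tics to simulate */

static sem_t tic_sem;                 /* counts tics not yet consumed */
static pthread_mutex_t input_mutex = PTHREAD_MUTEX_INITIALIZER;
static int pending_touches = 0;       /* hypothetical shared input state */
static int frames_run = 0;

/* Stands in for the timer callback: record input, then allow one frame. */
static void *async_tic_thread(void *unused)
{
    (void)unused;
    for (int i = 0; i < TIC_COUNT; i++) {
        pthread_mutex_lock(&input_mutex);
        pending_touches++;
        pthread_mutex_unlock(&input_mutex);
        sem_post(&tic_sem);           /* increase the counter */
    }
    return NULL;
}

/* Stands in for the game thread: one frame per available tic. */
static void *game_thread(void *unused)
{
    (void)unused;
    for (int i = 0; i < TIC_COUNT; i++) {
        sem_wait(&tic_sem);           /* blocks while the counter is zero */
        pthread_mutex_lock(&input_mutex);
        pending_touches = 0;          /* consume the input */
        pthread_mutex_unlock(&input_mutex);
        frames_run++;
    }
    return NULL;
}

/* Runs both threads to completion; returns the frame count, or -1 on error. */
int run_tic_demo(void)
{
    pthread_t producer, consumer;

    frames_run = 0;
    if (sem_init(&tic_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&consumer, NULL, game_thread, NULL) != 0)
        return -1;
    if (pthread_create(&producer, NULL, async_tic_thread, NULL) != 0)
        return -1;
    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    sem_destroy(&tic_sem);
    return frames_run;
}
```

Because the semaphore counts, a tic posted while the game thread is still busy is not lost: the next sem_wait returns immediately and the frame runs late instead of being skipped.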



This design actually shares a lot of similarities with the technique called "Triple buffering", whose goal is to totally decouple the GPU from the display's refresh rate. Here the goal is to decouple the CPU from the NSTimer ticks:




Note: The rendering context is grabbed by the Game Thread at startup ([EAGLContext setCurrentContext:context]) without using an EAGLSharegroup, effectively sharing the context across two threads. It's a bad thing but it seems to be working fine anyway.

I was surprised not to find any usage of CADisplayLink, an object that links a method call to a screen refresh and showed HUGE performance gains (at least according to my experiments). But it is available on 3.0 firmware only, so this design will allow more people to play the game, which makes a lot of sense commercially speaking.

Renderer: How Doom was working.

Entry was so large it is now a full article: Here

Renderer: How Doom iPhone is working.

Just like Wolfenstein 3D, Doom rendered each frame pixel by pixel. The only way to do this on iPhone with an acceptable framerate would be to use the CoreSurface framework (CoreSurface/CoreSurface.h). But it is unfortunately restricted, and using it would prevent distribution on the App Store.

The only solution is to use OpenGL but this comes with a few challenges:


Early attempts to port Doom to OpenGL built new WADs (the Doom archive format). They exploited the WAD format's ability to store pretty much anything (the original Doom WAD contained graphics, sounds and maps via different lump types) to create a new type of entry and store the 3D data organized as triangles. This is not the approach in Doom on iPhone: the world is "unified" in 3D primitives at the beginning of each level.
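
For readers unfamiliar with the container, here is a minimal sketch of reading a WAD header: a 4-byte identification ("IWAD" or "PWAD") followed by two little-endian 32-bit integers, the lump count and the offset of the lump directory. The field names follow the original Doom source; parse_wad_header and read_le32 are helper names invented for this example.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    char    identification[5];  /* "IWAD" or "PWAD", NUL-terminated here */
    int32_t numlumps;           /* number of entries in the directory */
    int32_t infotableofs;       /* file offset of the directory */
} wad_header_t;

/* Reads a little-endian 32-bit integer regardless of host endianness. */
static int32_t read_le32(const unsigned char *p)
{
    return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Parses the 12-byte header at the start of buf.
 * Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int parse_wad_header(const unsigned char *buf, size_t len, wad_header_t *out)
{
    if (len < 12)
        return -1;
    if (memcmp(buf, "IWAD", 4) != 0 && memcmp(buf, "PWAD", 4) != 0)
        return -1;
    memcpy(out->identification, buf, 4);
    out->identification[4] = '\0';
    out->numlumps = read_le32(buf + 4);
    out->infotableofs = read_le32(buf + 8);
    return 0;
}
```

Each directory entry then gives a file position, a size and an 8-character name, which is why a WAD can carry arbitrary extra lump types.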


Building the third dimension

If you know the Doom engine or if you read the previous paragraph, you remember there are three types:



Only the flat tessellation is a bit hard to get; here is the processing of the main room sector in E1M1 (the space with a blue floor at the very beginning of Doom).



Note in the animation the way the set is drawn as a "fan" (although it is done via the GL_TRIANGLES primitive).
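
Drawing a "fan" with GL_TRIANGLES simply means repeating the first vertex in every triangle. The sketch below shows that index pattern for a convex polygon only; real sectors can be concave or contain holes, which is where the actual tessellation work lies. The function name fan_to_triangles is made up for this illustration.

```c
#include <stddef.h>

/* Writes the vertex indices of a triangle fan over a convex polygon with
 * vertex_count vertices, as independent triangles (GL_TRIANGLES layout).
 * out must have room for 3 * (vertex_count - 2) indices.
 * Returns the number of indices written, or 0 for fewer than 3 vertices. */
size_t fan_to_triangles(size_t vertex_count, unsigned short *out)
{
    size_t n = 0;

    if (vertex_count < 3)
        return 0;
    for (size_t i = 1; i + 1 < vertex_count; i++) {
        out[n++] = 0;                       /* every triangle shares vertex 0 */
        out[n++] = (unsigned short)i;
        out[n++] = (unsigned short)(i + 1);
    }
    return n;
}
```

A pentagon (5 vertices) yields 9 indices: (0,1,2), (0,2,3), (0,3,4).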


Note: This video does not account for the deferred rendering process of the PowerVR chips. Fillrate consumption and overdraw are actually minimal.


Trivia: 3D unification is done in gld_PreprocessLevel and is quite nice to read. Unfortunately some maps had errors and workarounds had to be hardcoded. Here in gld_PreprocessSectors with E3M8:


	// JDC: E3M8 has a map error that has a couple lines that should
	// be part of sector 1 instead orphaned off in sector 2.  I could
	// let the non-closed sector carving routine handle this, but it
	// would result in some pixel cracks.  Instead, I merge the lines
	// to where they should have been.
	// This is probably not the right solution, because there are
	// probably a bunch of other cases in the >100 Id maps.

	extern int gameepisode, gamemap;

	if ( gameepisode == 3 && gamemap == 8 ) 
	{

		void IR_MergeSectors( int fromSector, int intoSector );
		IR_MergeSectors( 2, 1 );

	}




The big picture

Once the world is 3D consistent, each frame is rendered via a hybrid CPU/GPU process:



All of this takes place in IR_RenderPlayerView.



	void IR_RenderPlayerView (player_t* player) 
	{
		[..]

		// clean occlusion array
		memset( occlusion, 0, sizeof( occlusion ) );

		// Reset the fake palette.
		gld_SetPalette(-1);

		// To make it easier to accurately mimic the GL model to screen transformation,
		// this is set up so that the projection transformation is also done in the
		// modelview matrix, leaving the projection matrix as an identity.  This means
		// that things done in eye space, like lighting and fog, won't work, but
		// we don't need them.
		glMatrixMode(GL_PROJECTION);
		glLoadIdentity();	
		glMatrixMode(GL_MODELVIEW);
		glLoadIdentity();
		infinitePerspective(64.0f, 320.0f/200.0f, 5.0f/100.0f);

		IR_RenderBSPNode( numnodes-1 );
	
		NewDrawScene(player);

		// Perform fake palette effect
		gld_EndDrawScene();	

	}



Trivia :



Next: Video illustrating the drawing order. Distance is not relevant anymore: triangles are drawn in batches of the same texture. This is possible because alpha testing is activated.




Contrary to Doom93, partly transparent walls are not drawn at the end with the "things", thanks to the GPU's ability to use alpha testing and depth testing. The same POV was used for a video in the Classic Doom article. Again, the priority of texture ID over distance is obvious.
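
The idea of batching by texture rather than by distance can be sketched as a sort followed by a count of texture changes. This is an illustration under assumed names (draw_item_t, sort_and_count_binds), not the renderer's actual data structures.

```c
#include <stdlib.h>

/* Hypothetical draw item: only the texture matters for ordering. */
typedef struct {
    int   texture_id;
    float distance;     /* kept for illustration; ignored by the sort */
} draw_item_t;

static int by_texture(const void *a, const void *b)
{
    int x = ((const draw_item_t *)a)->texture_id;
    int y = ((const draw_item_t *)b)->texture_id;
    return (x > y) - (x < y);
}

/* Sorts items by texture id and returns how many texture binds a renderer
 * would need if it rebinds only when the texture changes. */
int sort_and_count_binds(draw_item_t *items, size_t count)
{
    int binds = 0;

    qsort(items, count, sizeof *items, by_texture);
    for (size_t i = 0; i < count; i++) {
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            binds++;
    }
    return binds;
}
```

Five walls using textures 7, 3, 7, 3, 5 need three binds once sorted, instead of five when drawn in their original order.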



Trivia :

Using alpha testing on a tile-based deferred renderer (TBDR) such as the iPhone's is highly inefficient because it introduces uncertainty in the GPU pipeline. But it seems to get the job done for Doom, so why not?

Networking

Not much to add to John Carmack's Release Note. Doom's original IPX system was first converted to UDP broadcast, but packet drop was bad over WiFi (among other things). There is now a server that combines commands from clients and sends updates to each device.

Sound system

The music system now uses Core Audio Services with MP3 decompressed on dedicated hardware (as opposed to the OGG format decompressed on the CPU in Wolfenstein iPhone). Sound effects are WAV, processed via OpenAL on the CPU.

Comments

The Doom iPhone codebase is really, really nice to read thanks to a lot of comments from JDC. Sections of the code that had to be rewritten (iphone_*.c) are HEAVILY documented.

Example from iphone_render.c (there are almost more comments than code):



	// If a segment in this subsector is not fully occluded, mark
	// the line that it is a part of as needing to be drawn.  Because
	// we are using a depth buffer, we can draw complete line segments
	// instead of just segments.

	for ( int i = 0 ; i < sub->numlines ; i++ ) {
		seg_t *seg = &segs[sub->firstline+i];
		
		line_t *line = seg->linedef;

		// Determine if it will completely occlude farther objects.
		// Given that changing sector heights is much less common than
		// traversing lines during every render, it would be marginally better if
		// lines had an "occluder" flag on them that was updated as sectors
		// moved, but it hardly matters.

		boolean	occluder;
		if ( seg->backsector == NULL || 
			seg->backsector->floorheight >= seg->backsector->ceilingheight ||
			seg->backsector->floorheight >= seg->frontsector->ceilingheight ||
			seg->backsector->ceilingheight <= seg->frontsector->floorheight ) 
		{
			// this segment can't be seen past, so fill in the occlusion table
			occluder = true;

		} else {
			// If the line has already been made visible and we don't need to
			// update the occlusion buffer, we don't need to do anything else here.
			// This happens when a line is split into multiple segs, and also
			// when the line is reached from the backsector.  In the backsector
			// case, it would be back-face culled, but this test throws it out
			// without having to transform and clip the ends.

			if ( line->validcount == validcount ) {
				continue;
			}
			
			// check to see if the seg won't draw any walls at all
			
			// we won't fill in the occlusion table for this
			occluder = false;
		}
	}


Faking VGA palette

Doom 93 took advantage of VGA's DAC color table (colors indexed on a palette) to change all colors on the screen when picking up an item (brighter), taking damage (red) or while invulnerable (silver).

But OpenGL doesn't work with palettes. (You actually have access to GL_OES_compressed_paletted_texture on iPhone 3GS, but the platform is still marginal and, even worse, colors are expanded (converted to RGBA), making their use counter-productive!) The effect is replicated via a three-stage process:


Note: While a shift toward a single color can be achieved via "normal" blending (glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)), invulnerability is a bit trickier to fake: glBlendFunc(GL_ONE_MINUS_DST_COLOR, GL_ZERO) is used instead to perform a negate.

The effect on iPhone (on the right) is similar but not as nice as what could be done with a palette (on the left).
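
The arithmetic behind the two blend functions mentioned in the note can be written out per color channel. This is a sketch of the standard OpenGL blend equation (result = source * source factor + destination * destination factor) with values in [0, 1]; the function names are made up, and nothing here calls OpenGL.

```c
/* glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA):
 * the source color is mixed over the framebuffer according to its alpha. */
float blend_alpha(float src, float src_alpha, float dst)
{
    return src * src_alpha + dst * (1.0f - src_alpha);
}

/* glBlendFunc(GL_ONE_MINUS_DST_COLOR, GL_ZERO):
 * the source is scaled by (1 - dst) and the destination is discarded.
 * Drawing a white source (src = 1) therefore leaves 1 - dst: a negate. */
float blend_negate(float src, float dst)
{
    return src * (1.0f - dst) + dst * 0.0f;
}
```

For example, a full-intensity red channel at 50% alpha over a framebuffer value of 0.5 gives 0.75 (a shift toward red), while a white quad drawn with the second function turns a framebuffer value of 0.25 into 0.75.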


Spinning wheel

Something new compared to the PC version is the spinning wheel, prompting you to wait at the beginning of each level:



This happens while textures are uploaded to OpenGL, which can take a significant amount of time (up to 2s). This is done after the first frame has been drawn so the player has something to look at. Precaching reduces frame skipping later in the level when new textures become visible; it is done in gld_Precache().

Profiling.

Attaching Instruments to Doom (running on an iPhone, not the Xcode simulator) provided some cool data regarding the cost of every operation:



As expected, most of the time allocated to the process is spent in GameThread, preparing the next frame via iPhoneFrame(). Note that the sound system is not consuming CPU as it is processed on a dedicated chip (only sound effects are processed on the CPU).

Focusing on iPhoneFrame method:



This was a 120s session, which explains why the cost of texture uploading (gld_Precache) is so huge.

Focusing on iPhoneDrawScreen method:



A lot of time is spent in SwapBuffersAndTouches. This method clears the user input and instructs OpenGL to flip the framebuffer.
After more tracing, it seems the mutex is not slowing down the touches swapping (2-3 microseconds on average), while swapping the framebuffer via [context presentRenderbuffer:GL_RENDERBUFFER_OES] runs around 1-2 ms with jumps to 12000 to 24000 ms. iPhones are supposed to use triple buffering, so I was unable to find an explanation for this :/ !

Doom 2

Does this new engine work well with the Doom2 WAD archive? Yes, absolutely. Just put doom2.wad in the base directory, remove doom.wad, deploy and play!!



Double Shotgun



Even the new monsters (here a "mancubus") are working fine:



I was expecting the new monsters to crash the engine, but I realized Doom iPhone is based on PrBoom, itself based on Ultimate Doom. All the monsters' behavior function pointers are valid.


Compiling on 3.0 firmware and above.

An attempt to compile with iPhone SDK 3.0 failed, with gcc raising an unusual error, "crosses initialization of":


	322	OSStatus BackgroundTrackMgr::SetupQueue(BG_FileInfo *inFileInfo) {
	323	UInt32 size = 0;
	324	OSStatus result = AudioQueueNewOutput(&inFileInfo->mFileFormat, QueueCallback, this, CFRunLoopGetMain() 
	325	
	326	AssertNoError("Error creating queue", end);
	327	// channel layout
	328	OSStatus err = AudioFileGetPropertyInfo(inFileInfo->mAFID, kAudioFilePropertyChannelLayout, &size,NULL);
	329	if (err == noErr && size > 0) {
	330		AudioChannelLayout *acl = (AudioChannelLayout *)malloc(size);
	331		result = AudioFileGetProperty(inFileInfo->mAFID, kAudioFilePropertyChannelLayout, &size, acl);
	332		AssertNoError("Error getting channel layout from file", end);
	333		result = AudioQueueSetProperty(mQueue, kAudioQueueProperty_ChannelLayout, acl, size);
	334		free(acl);
	335		AssertNoError("Error setting channel layout on queue", end);
	336	}
	337
	338	// volume
	339	result = SetVolume(mVolume);
	340	
	341	end:
	342	return result;
	343	}




	/Users/[..]/iphone/BackgroundMusic.cpp:341: error: jump to label 'end'
	/Users/[..]/iphone/BackgroundMusic.cpp:326: error:   from here
	/Users/[..]/BackgroundMusic.cpp:328: error:   crosses initialization of 'OSStatus err'



Jumping past the initialization of an automatic variable into its scope may bypass constructor calls and is hence forbidden in C++. The solution was to replace the macro with its expansion and remove the goto.
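
For comparison, here is a sketch of a different workaround (not the one used above): keep the goto-based cleanup but declare every local before the first jump, without an initializer, so that no jump crosses an initialization. The code is plain C that should also be accepted by a C++ compiler; create_queue and query_layout are hypothetical stand-ins for the audio calls.

```c
/* Hypothetical stand-ins for the audio calls; they only return a status. */
static int create_queue(int fail) { return fail ? -1 : 0; }
static int query_layout(void)     { return 0; }

/* Returns 0 on success, or the first non-zero status encountered. */
int setup_queue(int fail_early)
{
    int result;
    int err;    /* declared up front, assigned later: no jump crosses an initialization */

    result = create_queue(fail_early);
    if (result != 0)
        goto end;

    err = query_layout();
    if (err == 0) {
        /* optional channel layout setup would go here */
        result = 0;
    }

end:
    return result;
}
```

Calling setup_queue(1) takes the early jump and returns -1, while setup_queue(0) runs through to the label and returns 0.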


Recommended readings

A trip Down The PipeLine: Oldie but goodie (and a pure gem that helped me understand the homogeneous coordinate system and perspective projection).


Fabien Sanglard @2010