Doom3 Source Code Review: Profiling (Part 4 of 6)
Xcode comes with a great profiling tool: Instruments. I used it in sampling mode during a play session (leaving out game loading and level GPU pre-caching altogether):
The high-level view shows the three threads running in the process:
- Main thread, where game logic and rendering occur.
- Auxiliary thread, where inputs are collected and sound effects are mixed.
- Music thread (consuming 8% of resources), created by CoreAudio and calling idAudioHardwareOSX at regular intervals (note: sound effects are done with OpenAL but do not run in their own thread).
The Doom 3 main thread runs... QuakeMain! Amusingly, the team that ported Quake 3 to Mac OS X must have reused some old code. Inside, the time is split as follows:
- 65% dedicated to graphics rendering.
- 25% dedicated to game logic: this is surprisingly high for an id Software game.
The game logic runs in the gamex86.dll space (or game.dylib on Mac OS X):
The game logic accounts for 25% of the main thread time, which is unusually high. Two reasons:
- A.I.: the virtual machine runs and lets entities think. All of the bytecode is interpreted, and the scripting language seems to have been overused.
- The physics engine is more complex (LCP solvers) and hence more demanding than in previous games. It runs on each object and includes ragdoll and interaction solving.
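To give a feel for why interpreted bytecode is comparatively expensive, here is a minimal sketch of a switch-dispatched interpreter loop. It is a toy stack machine written for illustration only: the opcodes, the Instr struct and RunToyVM are hypothetical and do not come from the Doom 3 script VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy stack machine (not Doom 3's instruction set).
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op  op;
    int operand; // only meaningful for Op::Push
};

// Runs the program and returns the value on top of the stack at Halt
// (or at the end of the program if no Halt is present).
// Every instruction pays for a fetch, a branch on the opcode and some
// stack traffic: overhead that native code would not have.
int RunToyVM(const std::vector<Instr>& program) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < program.size(); ++pc) {
        const Instr& in = program[pc];
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.operand);
            break;
        case Op::Add: {
            assert(stack.size() >= 2);
            const int b = stack.back(); stack.pop_back();
            const int a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
            break;
        }
        case Op::Mul: {
            assert(stack.size() >= 2);
            const int b = stack.back(); stack.pop_back();
            const int a = stack.back(); stack.pop_back();
            stack.push_back(a * b);
            break;
        }
        case Op::Halt:
            assert(!stack.empty());
            return stack.back();
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

Feeding it the program Push 2, Push 3, Add, Push 4, Mul, Halt evaluates (2 + 3) * 4. Even in this tiny example each arithmetic operation costs a dispatch plus several memory accesses, which hints at how the cost adds up when many entities run scripts every frame.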
As previously described, the renderer is made of two parts:
- Frontend (idSessionLocal::Draw), accounting for 43.9% of the rendering time. Note that Draw is a pretty poor name, since the frontend does not perform a single draw call to OpenGL!
- Backend (idRenderSystemLocal::EndFrame), accounting for 55.9% of the rendering time.
The load distribution is fairly even, which is not that surprising since:
- The frontend performs a lot of calculations for Visual Surface Determination.
- The frontend also performs model animation and shadow silhouette finding.
- The frontend uploads vertices to the GPU.
- The backend spends a lot of time setting up shader parameters and communicating with the GPU (e.g. submitting triangle indices or per-vertex normal matrices for bump mapping).
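The frontend/backend split described above can be sketched as a producer and a consumer of a command list. This is a simplified illustration under my own assumptions, not the engine's code: Surface, DrawCommand, BuildDrawCommands and ExecuteDrawCommands are hypothetical names, and the "draw call" is simulated by counting instead of calling OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified scene data; none of these types come from Doom 3.
struct Surface {
    int  id;
    bool visible;    // stand-in for the result of visibility determination
    int  numIndices;
};

struct DrawCommand {
    int surfaceId;
    int numIndices;
};

// "Frontend": decides what is visible and records commands.
// It never issues a draw call itself.
std::vector<DrawCommand> BuildDrawCommands(const std::vector<Surface>& scene) {
    std::vector<DrawCommand> commands;
    for (const Surface& s : scene) {
        if (s.visible) {
            commands.push_back({s.id, s.numIndices});
        }
    }
    return commands;
}

struct BackendStats {
    std::size_t drawCalls = 0;
    std::size_t indicesSubmitted = 0;
};

// "Backend": walks the command list and issues one (simulated) draw call
// per command. A real backend would set shader parameters and call the
// graphics API here.
BackendStats ExecuteDrawCommands(const std::vector<DrawCommand>& commands) {
    BackendStats stats;
    for (const DrawCommand& c : commands) {
        ++stats.drawCalls;
        stats.indicesSubmitted += static_cast<std::size_t>(c.numIndices);
    }
    return stats;
}
```

Keeping the two halves separated by a plain data structure means the visibility work and the GPU submission can be profiled independently, which is what makes the two percentages above directly comparable.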
No surprise here: most of the frontend time (91%) is spent uploading data to the GPU in VBOs (R_AddModelSurfaces). A little time (4%) shows up when going through areas, trying to find all interactions (R_AddLightSurfaces). A minimal amount (2.9%) is spent in Visual Surface Determination: traversing the BSP and running the portal system.
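To put these nested figures in perspective, they can be multiplied down the call tree. The sketch below assumes each percentage is relative to its parent in the Instruments tree (main thread, then rendering, then frontend), which is my reading of the numbers and is not stated explicitly; AbsoluteShare is a hypothetical helper.

```cpp
#include <cassert>
#include <initializer_list>

// Multiplies nested fractions (each one relative to its parent) into a
// fraction of the root. Assumes every value is in the range [0, 1].
double AbsoluteShare(std::initializer_list<double> nestedFractions) {
    double share = 1.0;
    for (double f : nestedFractions) {
        assert(f >= 0.0 && f <= 1.0);
        share *= f;
    }
    return share;
}
```

Under that assumption, AbsoluteShare({0.65, 0.439, 0.91}) gives 0.65 × 0.439 × 0.91 ≈ 0.26: roughly a quarter of the main thread time would go to R_AddModelSurfaces alone, about as much as the whole game logic.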
The backend obviously triggers a buffer swap (GLimp_SwapBuffers) and spends some time (10%) synchronizing with the screen, since the game was running in a double-buffered environment.
5% is the cost of totally avoiding overdraw, with a first pass that populates the Z-buffer.
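Why a depth-first pass removes overdraw can be shown with a small CPU-side simulation of a single pixel. This is not OpenGL code and not the engine's implementation: both function names are hypothetical, and the second pass assumes an "equal" depth test, so that only the fragment that ends up visible is shaded.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One pixel covered by several fragments, given by depth in submission order.
// Returns how many fragments run the expensive shading when drawing directly
// with a "less than" depth test.
std::size_t ShadedWithoutPrepass(const std::vector<float>& fragmentDepths) {
    float zbuffer = std::numeric_limits<float>::infinity();
    std::size_t shaded = 0;
    for (float z : fragmentDepths) {
        if (z < zbuffer) { // passes the depth test, so it gets shaded
            zbuffer = z;
            ++shaded;
        }
    }
    return shaded;
}

// Same pixel, but with a cheap depth-only pass first.
std::size_t ShadedWithPrepass(const std::vector<float>& fragmentDepths) {
    // Pass 1: depth only, no shading. Stores the nearest depth.
    float zbuffer = std::numeric_limits<float>::infinity();
    for (float z : fragmentDepths) {
        if (z < zbuffer) {
            zbuffer = z;
        }
    }
    // Pass 2: shade only fragments whose depth equals the stored depth.
    std::size_t shaded = 0;
    for (float z : fragmentDepths) {
        if (z == zbuffer) {
            ++shaded;
        }
    }
    return shaded;
}
```

With fragments submitted at depths 0.9, 0.5, 0.7 and 0.2, the direct approach shades three of them (each new nearest fragment overwrites the previous one), while the two-pass approach shades only the final visible one. The geometry is rasterized twice, which is the price the 5% figure reflects.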
If you feel like loading the Instruments trace and exploring it yourself, here is the profile file.