December 23, 2011

"Another World" Code Review

I spent two weeks reading and reverse engineering further the source code of Another World ("Out Of This World" in North America). I based my work on Gregory Montoir's "binary to C++" initial reverse engineering from the DOS executable.

I was amazed to discover an elegant system based on a virtual machine interpreting bytecode in realtime and generating fullscreen vectorial cinematic in order to produce one of the best game of all time.

All this shipping on a 1.44MB floppy disk and running within 600KB of RAM: Not bad for 1991 ! As usual I cleaned up my notes, it may save a few hours to someone.


But...What source code ?!

The source code of "Another World' was never officially released nor leaked. Some people were so passionate about this groundbreaking game that they reverse engineered the DOS executable.

This was possible partly because the binary was small (20KB). Why so small ? Because ANOTHER.EXE was not the game itself but just a virtual machine:

The bytecode performs all the game logic with its own opcodes but uses syscalls for "heavy" stuff like drawing, playing music, sound and managing assets.

To implement only the virtual machine for the target OS reduced the effort and the game was broadly ported to more than a dozen platforms:

Every time only the virtual machine had to be compiled to the target OS: The bytecode remained the same !


Architecture

The executable is 20KB. It can be summarized as:



We can see four modules:

Trivia : The palette memory segment actually contains several palettes, used for nice fading effects.

Upon startup, the executable sets the virtual machine's thread 0 program counter with 0x00 and start interpreting. Everything is commanded by the bytecode after that.


Rendition explained

In the previous drawing we see three framebuffers. Two because Another World implements Double Buffering" in software....and a third one as a clever optimization:

The third framebuffer is used to compose the background of a scene only once and then reuse it frame after frame with a simple memcpy:



In this video the legendary first level screen of Another World has been slowed down so we can actually see things being drawn. Everything is drawn with polygons and pixigons. Overdraw is very substantial but since this is only generated once it is not so bad.

Trivia : This famous background is made of 981 polygons.

In order to visualize the big picture I have slowed down and rendered the three framebuffer + what is seen on screen:


We can see very clearly:



If you want to analyze more : Full video.


Another World Virtual Machine

Eric Chahi's webpage explains a lot about how the machine is structured.

In the code on github you can see how every opcode have been implemented. All of them are pretty easy to understand except for the renditions ones. The trick is that the polygon segment source where the vertices should be read is embedded with the opcode id.

Finally a few screenshots from the vm bytecode editor (Called "script editor" by Eric Chahi):




You can see how the label have been lost: setvec 21 nag1 sets the thread 21 instruction counter at "nag1" label offset. In the bytecode we can only see a hardcoded offset.



Opcode cases


In the following drawings we can see the virtual machine calling an opcode that is actually a system call to the resource manager in order to load the four memory segments. This happens typically at the beginning of a game part (The entire game is made of 10 game parts):



In the next drawing the opcode is also a systemcall to the renderer asking to draw and fetch vertices. The rendition opcode are a bit more complex because they contains where to read the vertices from. To set the target framebuffer is an independent opcode altogether:


Note: Whether the render should read vertices from the cinematic polygon segment or the animation segment is encoded with the opcodeId.


Resource Management

Resources are identified by an unique integer id. Upon startup the resource manager opens MEMLIST.BIN and get records as follow:



           
            typedef struct memEntry_s
            {

            	int bankId;
            	int offset;
            	int size;
            	int unpackedSize;

            } memEntry_t;



			

When the vm requests a resourceId, the resource manager:


A few stats about the compression:

			
			
  Total # resources: 146
  Compressed       : 120
  Uncompressed     :  28
  Note: 82% of resources are compressed.


  Total size (uncompressed) : 1820901 bytes.
  Total size (compressed)   : 1236519 bytes.
  Note: Overall compression gain is : 32%.


  Total RT_SOUND          unpacked size:  699868 (38% of total unpacked size) packedSize  585052 (47% of floppy space) gain:(16%)
  Total RT_MUSIC          unpacked size:   33344 ( 2% of total unpacked size) packedSize    3540 ( 0% of floppy space) gain:(89%)
  Total RT_POLY_ANIM      unpacked size:  384000 (21% of total unpacked size) packedSize  106676 ( 9% of floppy space) gain:(72%)
  Total RT_PALETTE        unpacked size:   18432 ( 1% of total unpacked size) packedSize   11032 ( 1% of floppy space) gain:(40%)
  Total RT_BYTECODE       unpacked size:  203546 (11% of total unpacked size) packedSize  135948 (11% of floppy space) gain:(33%)
  Total RT_POLY_CINEMATIC unpacked size:  365960 (20% of total unpacked size) packedSize  291008 (24% of floppy space) gain:(20%)
  Note: Damn you sound compression rate!

  Total bank files:              148
  Total RT_SOUND          files: 103
  Total RT_MUSIC          files:   3
  Total RT_POLY_ANIM      files:  12
  Total RT_PALETTE        files:   9
  Total RT_BYTECODE       files:   9
  Total RT_POLY_CINEMATIC files:   9


			

I did not spent time reverse engineering the compression algorithm...the fact that sound doesn't compress very well leads me to believe it is entropy sensitive...so maybe a variation of huffman ?


Out of 146 resources: 120 are compressed:


Trivia : The introduction (resource 0x1C), 3 minutes long weights only 57,510 bytes once compressed.


Memory Management

Like all games from the 90s no memory is allocated during gameplay. Upon startup the game engine grabs 600KB of memory ( anybody remember DOS 640KB conventional memory here ?). Those 600KB are used as a stack allocator:


Free memory: The memory manager has the capability to unallocate one step back OR free the entire memory. In practice the entire memory is freed at the end of each 10 game parts.

Trivia : Originally the entire 600KB was storing bytecode and vertices. But after two years of generating the backgrounds with polygons/pixigons the game was still far from being done. In order to speed up the development speed Eric Chahi decided to integrate a hack in his beautiful architecture (at a performance cost): The resource manager can load background bitmap from the floppy disk to the background buffer (void copyToBackgroundBuffer(const uint8 *src);). Hence 32KB (320x200/2) are reserved at the end of the conventional memory.

Trivia : This hack was exploited for the release of Another World for Windows XP in 2005. All background were hand drawn and loaded directly from hard-drive without using the renderer and its pixigons:



Purist corner

If you are a purist and really want to play the original way, Another World works like a charm in DosBOX:



Or you can run the windows XP version. I recommend to get the Collector's edition since it feature a lot of additional informations, among them the techical notes from Eric Chahi:



One more thing

I worked on the code a lot, making it simpler to understand. You can see an example of how much clearer it is now.

Before:



   void Logic::runScripts() {                                                                                                
      for (int i = 0; i < 0x40; ++i) {                                                                                  
        if (_scriptPaused[0][i] == 0) {                                                                           
	     uint16 n = _scriptSlotsPos[0][i];                                                                 
	     if (n != 0xFFFF) {                                                                                
	         _scriptPtr.pc = _res->_segCode + n;                                                       
	         _stackPtr = 0;                                                                            
	         _scriptHalted = false;                                                                    
	         debug(DBG_LOGIC, "Logic::runScripts() i=0x%02X n=0x%02X *p=0x%02X", i, n, *_scriptPtr.pc);
	         executeScript();                                                                          
	         _scriptSlotsPos[0][i] = _scriptPtr.pc - _res->_segCode;                                   
	         debug(DBG_LOGIC, "Logic::runScripts() i=0x%02X pos=0x%X", i, _scriptSlotsPos[0][i]);      
	         if (_stub->_pi.quit) {                                                                    
	            break;                                                                                					
	         }                                                                                       
	     }                                                                                                 					
	    }                                                                                                         					                                                                                                             
	  }                                                                                                                 						                                                                                                              
	}                                                                                                                         			
			
			   

After:



  void VirtualMachine::hostFrame() {                                                                       
                                                                                                         
	// Run the Virtual Machine for every active threads (one vm frame).                                     
	// Inactive threads are marked with a thread instruction pointer set to 0xFFFF (VM_INACTIVE_THREAD).    
	// A thread must feature a break opcode so the interpreter can move to the next thread.                 
                                                                                                         
	for (int threadId = 0; threadId < VM_NUM_THREADS; threadId++) {                                         
                                                                                                         
		if (!vmIsChannelActive[CURR_STATE][threadId])                                                           
			continue;                                                                                             
		                                                                                                       
		uint16 pcOffset = threadsData[PC_OFFSET][threadId];                                                    
                                                                                                         
		if (pcOffset != VM_INACTIVE_THREAD) {                                                                  
                                                                                                         
			// Set the script pointer to the right location.                                                      
			// script pc is used in executeThread in order                                                        
			// to get the next opcode.                                                                            
			_scriptPtr.pc = res->segBytecode + pcOffset;                                                          
			_stackPtr = 0;                                                                                        
                                                                                                         
			gotoNextThread = false;                                                                               
			debug(DBG_VM, "VirtualMachine::hostFrame() i=0x%02X n=0x%02X *p=0x%02X", threadId, n, *_scriptPtr.pc);
			executeThread();                                                                                      
                                                                                                         
			//Since .pc is going to be modified by this next loop iteration, we need to save it.                  
			threadsData[PC_OFFSET][threadId] = _scriptPtr.pc - res->segBytecode;                                  
                                                                                                         
			debug(DBG_VM, "VirtualMachine::hostFrame() i=0x%02X pos=0x%X", threadId, threadsData[0][threadId]);
			   
			if (sys->input.quit) {                                                                                
				break;                                                                                               
			}                                                                                                     
		}                                                                                                      
		                                                                                                       
	}                                                                                                       
  }    
  
                                                                                                      


I used:


Here is the "human readable" source code :) ! Happy hacking.


Edit (video presentation)

Jeff Somers submitted a link to a fantastic video from GDC Vault in which Eric Chahi talks about the Genesis of Another World. Thanks a lot Jeff :) !

 

@