Fabien Sanglard's Website

June 8, 2012

Doom3 Source Code Review: Scripting VM (Part 5 of 6)

From idTech1 to idTech3 the only thing that completely changed every time was the scripting system:

idTech1: QuakeC running in a Virtual Machine.
idTech2: C compiled to an x86 shared library (no virtual machine).
idTech3: C compiled to bytecode with LCC, running in the QVM (Quake Virtual Machine). On x86 the bytecode was converted to native instructions at load time.

idTech4 is no exception; once again everything is different:

The scripting is done via an object-oriented language similar to C++.
The language is fairly limited (no typedef, five basic types).
It is always interpreted via a virtual machine: there is no JIT conversion to native instructions like in idTech3 (John Carmack elaborated on this during our Q&A).

A good introduction is to read the Doom3 Scripting SDK notes.

Architecture

Here is the big picture:

Compilation : At load time the idCompiler is fed one predetermined .script file. A series of #include directives results in a script stack that contains the text of every script and the source code of every function. It is scanned by an idLexer that generates basic tokens. Tokens enter the idParser and one giant bytecode is generated and stored in the idProgram singleton: this constitutes the Virtual Machine RAM and contains both the .text and .data VM segments.

Virtual Machine : At runtime the engine allocates real CPU time to each idThread (one after another) until the end of the linked list is reached. Each idThread contains an idInterpreter that saves the state of the Virtual CPU. Unless the interpreter goes wild and runs for more than 5,000,000 instructions, it will not be pre-empted by the CPU: this is cooperative multitasking.
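The scheduling described above can be sketched in a few lines. This is a toy model under my own assumptions, not id's code: MiniThread, RunFrame and the yieldEvery field are hypothetical names, and the "script" is reduced to a countdown of instructions. The only figure taken from the text is the idea of an instruction budget (5,000,000 in the engine) that stops a runaway script.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for an idThread/idInterpreter pair: the "script"
// is just a countdown that yields voluntarily every `yieldEvery` steps.
struct MiniThread {
    int remaining;   // instructions left before the script ends
    int yieldEvery;  // the script gives control back this often (must be > 0)
    int executed;    // total instructions run so far

    MiniThread(int total, int yieldInterval)
        : remaining(total), yieldEvery(yieldInterval), executed(0) {
        assert(yieldInterval > 0);
    }

    // Runs until the script yields, finishes, or exhausts the budget.
    // Returns true when the thread is finished and can be removed.
    bool Execute(int budget) {
        int ranThisSlice = 0;
        while (remaining > 0) {
            --remaining;
            ++executed;
            ++ranThisSlice;
            if (ranThisSlice >= budget) break;      // runaway-script guard
            if (ranThisSlice == yieldEvery) break;  // voluntary yield
        }
        return remaining == 0;
    }
};

// One engine frame: give each thread a turn, in list order, and drop the
// ones that have finished. Returns how many threads are still alive.
std::size_t RunFrame(std::list<MiniThread>& threads, int budget) {
    for (std::list<MiniThread>::iterator it = threads.begin(); it != threads.end();) {
        if (it->Execute(budget)) it = threads.erase(it);
        else ++it;
    }
    return threads.size();
}
```

Calling RunFrame once per game frame gives every thread one turn. Nothing ever interrupts a thread from the outside: the budget check lives inside Execute itself, which is what makes the scheme cooperative rather than pre-emptive.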

Compiler

The compilation pipeline is similar to what you would find in any compiler, such as Google's V8 or Clang, except that there is no preprocessor. Hence tasks such as comment skipping, macros and directives (#include, #if) have to be handled in the lexer and the parser.

Since the idLexer is reused all across the engine to parse every text asset (maps, entities, camera paths), it is very primitive. As an example, it only returns five types of tokens:

TT_STRING
TT_LITERAL
TT_NUMBER
TT_NAME
TT_PUNCTUATION

So the parser actually has to perform much more than in a "standard" compiler pipeline.
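To make the point concrete, here is a minimal sketch of a lexer restricted to those five categories. It is not idLexer: Tokenize and Token are hypothetical, and the meaning I give each category (double-quoted text for TT_STRING, single-quoted text for TT_LITERAL) is my reading of the names. Escapes, comments and multi-character punctuation are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// The five token categories named above; the comments are my reading of them.
enum TokenType {
    TT_STRING,       // "double-quoted text"
    TT_LITERAL,      // 'single-quoted text'
    TT_NUMBER,       // 42, 3.14
    TT_NAME,         // identifiers AND keywords: the lexer does not distinguish
    TT_PUNCTUATION   // everything else: ( ) { } ; + ...
};

struct Token {
    TokenType type;
    std::string text;
};

static bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool IsNameChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical tokenizer. An unterminated quote runs to the end of the input.
std::vector<Token> Tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        const std::size_t start = i;
        Token tok;
        if (c == '"' || c == '\'') {
            ++i;
            while (i < src.size() && src[i] != c) ++i;
            tok.type = (c == '"') ? TT_STRING : TT_LITERAL;
            tok.text = src.substr(start + 1, i - start - 1);
            if (i < src.size()) ++i;  // skip the closing quote
        } else if (IsDigit(c)) {
            while (i < src.size() && (IsDigit(src[i]) || src[i] == '.')) ++i;
            tok.type = TT_NUMBER;
            tok.text = src.substr(start, i - start);
        } else if (IsNameChar(c)) {
            while (i < src.size() && IsNameChar(src[i])) ++i;
            tok.type = TT_NAME;
            tok.text = src.substr(start, i - start);
        } else {
            ++i;
            tok.type = TT_PUNCTUATION;
            tok.text = std::string(1, c);
        }
        out.push_back(tok);
    }
    return out;
}
```

Feeding it `float x = 3.5;` returns `float` as a plain TT_NAME, exactly like `x`. Deciding that one is a type keyword and the other a variable is left entirely to the parser.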

At startup the idCompiler loads the first script, script/doom_main.script. A series of #include directives then builds a stack of scripts that are combined into one giant script.
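The include mechanism can be pictured with the sketch below. It is only an illustration under assumptions of mine: the files live in an in-memory map instead of on disk, Expand and ScriptFiles are hypothetical names, and skipping a file that is already on the stack is my own way of avoiding cycles, not something I verified in the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": file name -> file contents.
typedef std::map<std::string, std::string> ScriptFiles;

// Expands `#include "name"` lines depth-first into one combined source.
// `stack` holds the files currently being expanded, so a file that
// includes itself (directly or not) is skipped instead of recursing forever.
void Expand(const ScriptFiles& files, const std::string& name,
            std::vector<std::string>& stack, std::string& out) {
    for (std::size_t i = 0; i < stack.size(); ++i)
        if (stack[i] == name) return;          // already being expanded
    ScriptFiles::const_iterator it = files.find(name);
    if (it == files.end()) return;             // missing file: ignored in this sketch

    stack.push_back(name);
    std::istringstream lines(it->second);
    std::string line;
    const std::string directive = "#include \"";
    while (std::getline(lines, line)) {
        if (line.compare(0, directive.size(), directive) == 0) {
            // Text between the quotes is the name of the file to pull in.
            const std::size_t end = line.find('"', directive.size());
            Expand(files, line.substr(directive.size(), end - directive.size()),
                   stack, out);
        } else {
            out += line;
            out += '\n';
        }
    }
    stack.pop_back();
}
```

Starting from the root file with an empty stack yields a single string holding every function of every included script, which is the "giant script" that the lexer then scans.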

The Parser seems to be a standard recursive descent, top-down parser. The scripting language grammar seems to be LL(1), requiring no backtracking (even though the Lexer has the capability to "unread" up to one token). If you ever had the chance to read the Dragon Book you will not be lost... otherwise this is a good reason to get started ;) !
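For readers who have not met the technique, here is what recursive descent with a single "unread" token looks like on a deliberately tiny grammar. This is not the Doom3 script grammar: MiniParser and the arithmetic grammar in the comment are hypothetical, chosen only because they fit in a few lines. Each grammar rule becomes one function, and one token of lookahead is always enough to decide what to do next.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical recursive descent evaluator for:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class MiniParser {
public:
    explicit MiniParser(const std::string& src) : src_(src), pos_(0), hasUnread_(false) {}

    int ParseExpr() {
        int value = ParseTerm();
        for (;;) {
            Token t = ReadToken();
            if (IsPunct(t, '+')) value += ParseTerm();
            else if (IsPunct(t, '-')) value -= ParseTerm();
            else { UnreadToken(); return value; }  // not ours: push it back
        }
    }

private:
    struct Token { bool isNumber; char punct; int value; };  // punct '\0' = end of input

    static bool IsPunct(const Token& t, char c) { return !t.isNumber && t.punct == c; }

    int ParseTerm() {
        int value = ParseFactor();
        for (;;) {
            Token t = ReadToken();
            if (IsPunct(t, '*')) value *= ParseFactor();
            else if (IsPunct(t, '/')) {
                int divisor = ParseFactor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else { UnreadToken(); return value; }
        }
    }

    int ParseFactor() {
        Token t = ReadToken();
        if (t.isNumber) return t.value;
        if (IsPunct(t, '(')) {
            int value = ParseExpr();
            if (!IsPunct(ReadToken(), ')')) throw std::runtime_error("expected ')'");
            return value;
        }
        throw std::runtime_error("expected a number or '('");
    }

    // Returns the pushed-back token if there is one, else lexes the next one.
    Token ReadToken() {
        if (hasUnread_) { hasUnread_ = false; return last_; }
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        last_.isNumber = false; last_.punct = '\0'; last_.value = 0;
        if (pos_ >= src_.size()) return last_;
        if (std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            last_.isNumber = true;
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                last_.value = last_.value * 10 + (src_[pos_++] - '0');
        } else {
            last_.punct = src_[pos_++];
        }
        return last_;
    }

    // Only one token can be pushed back: that is all an LL(1) grammar needs.
    void UnreadToken() { hasUnread_ = true; }

    std::string src_;
    std::size_t pos_;
    Token last_;
    bool hasUnread_;
};
```

MiniParser("2 + 3 * (4 - 1)").ParseExpr() walks the input once, left to right, and never revisits a token other than the single one it may push back. The sketch does not reject trailing garbage after a valid expression; a real parser would.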

Interpreter

At runtime, events trigger the creation of idThreads, which are not Operating System threads but Virtual Machine threads. They are given some runtime by the CPU. Each idThread has an idInterpreter that keeps track of the Instruction Pointer and the two stacks (one for the data/parameters and one to keep track of the function calls).

Execution occurs in idInterpreter::Execute until the interpreter relinquishes control of the Virtual Machine: this is cooperative multitasking.


  idThread::Execute
   bool idInterpreter::Execute(void)
   {
       doneProcessing = false;
       while( !doneProcessing && !threadDying ) 
       {
           instructionPointer++;
       
           st = &gameLocal.program.GetStatement( instructionPointer );
           
           //op is an unsigned short, so the VM can encode up to 65,536 distinct opcodes
           switch( st->op ) {
                   .
                   .
                   .
           }
       }    
   }
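To see how such a loop behaves with real opcodes, here is a toy interpreter built on the same skeleton. Everything in it is an assumption of mine made for illustration: the six opcodes, the Statement layout and the MiniInterpreter name are hypothetical and do not come from the Doom3 source. What it keeps from the description above is the pre-incremented instruction pointer, the two stacks, and the two flags that end the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode set, far smaller than the real one.
enum Op { OP_PUSH, OP_ADD, OP_CALL, OP_RETURN, OP_YIELD, OP_HALT };

struct Statement {
    unsigned short op;
    int arg;  // immediate value for OP_PUSH, target index for OP_CALL
};

struct MiniInterpreter {
    const std::vector<Statement>& program;
    int instructionPointer;      // pre-incremented, so it starts at -1
    std::vector<int> dataStack;  // operands and results
    std::vector<int> callStack;  // return addresses
    bool doneProcessing;
    bool threadDying;

    explicit MiniInterpreter(const std::vector<Statement>& p)
        : program(p), instructionPointer(-1), doneProcessing(false), threadDying(false) {}

    // Runs until the script yields or ends; returns true once the thread is dead.
    bool Execute() {
        doneProcessing = false;
        while (!doneProcessing && !threadDying) {
            instructionPointer++;
            assert(instructionPointer >= 0 &&
                   static_cast<std::size_t>(instructionPointer) < program.size());
            const Statement* st = &program[instructionPointer];
            switch (st->op) {
            case OP_PUSH:
                dataStack.push_back(st->arg);
                break;
            case OP_ADD: {
                assert(dataStack.size() >= 2);
                int b = dataStack.back(); dataStack.pop_back();
                int a = dataStack.back(); dataStack.pop_back();
                dataStack.push_back(a + b);
                break;
            }
            case OP_CALL:
                callStack.push_back(instructionPointer);  // resume after this statement
                instructionPointer = st->arg - 1;         // -1: the loop pre-increments
                break;
            case OP_RETURN:
                assert(!callStack.empty());
                instructionPointer = callStack.back();
                callStack.pop_back();
                break;
            case OP_YIELD:
                doneProcessing = true;  // hand control back to the scheduler
                break;
            case OP_HALT:
            default:
                threadDying = true;
                break;
            }
        }
        return threadDying;
    }
};
```

Because the instruction pointer and both stacks live in the interpreter object, a script that hits OP_YIELD can be resumed later by calling Execute again: it picks up at the statement following the yield with its data intact.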

Once the idInterpreter relinquishes control, the next idThread::Execute method is called, until no more threads need execution time. The overall architecture reminded me a lot of the Another World VM design.

Trivia : The bytecode is never converted to x86 instructions since it was not meant to be heavily used. But in the end too much was done via scripting, and Doom3 would probably have benefited immensely from a JIT x86 converter just like the one Quake3 had.
