March 30th, 2014

Git Source Code Review: Architecture

Git has an unusual code structure that gives flexibility both to developers (commands can be written in C, as shell scripts and, in theory, in any language they want) and to users (several calling conventions).

Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF

Git Architecture

As seen in the history page, git commands can be written either in C or as shell scripts (in theory, new commands can be added in any language). On top of that, they can be called with two different syntaxes from the command line: through the git executable (git add) or directly in dashed form (git-add).

How does this translate in terms of code?

Here is a drawing showing how the pieces of code are compiled and how they interact together:

The dispatcher

Since the git executable is the most used part in the community, I spent more time looking at how this part works. It behaves as follows:

  1. Search if the command is a builtin: git.c features an array commands[] with function pointers to all builtin entry points. The lookup is a simple linear search: a binary search or a hash table is not worth it for about 100 items.
  2. If found, branch to the command and then return.
  3. If not found, look for the command as an external dashed shell script.
  4. If found, fork, exec and return.
  5. If still nothing found, try to find a suggestion and present it to user.
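
The steps above can be sketched as a small self-contained toy. This is not git's code: the table toy_commands, the two toy builtins and the "toy-" prefix for external commands are all names I made up for illustration, and step 5 is reduced to an error message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, reduced version of a builtin table entry. */
struct toy_cmd {
    const char *name;
    int (*fn)(int argc, const char **argv);
};

/* Toy builtin: returns how many arguments follow the command name. */
static int toy_count(int argc, const char **argv)
{
    (void)argv;
    return argc - 1;
}

/* Toy builtin: prints a fixed string. */
static int toy_version(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    puts("toy version 0.1");
    return 0;
}

static const struct toy_cmd toy_commands[] = {
    { "count", toy_count },
    { "version", toy_version },
};

/* Step 1: linear search through the builtin table. */
static const struct toy_cmd *find_builtin(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(toy_commands) / sizeof(toy_commands[0]); i++)
        if (strcmp(toy_commands[i].name, name) == 0)
            return &toy_commands[i];
    return NULL;
}

/*
 * Steps 3 and 4: fork, then exec "toy-<command>" looked up in PATH.
 * Returns the child's exit status, 127 if it could not be executed,
 * or -1 on internal error.
 */
static int run_dashed_external(int argc, const char **argv)
{
    char name[256];
    const char **child_argv;
    pid_t pid;
    int i, status;

    snprintf(name, sizeof(name), "toy-%s", argv[0]);
    child_argv = calloc((size_t)argc + 1, sizeof(*child_argv));
    if (!child_argv)
        return -1;
    child_argv[0] = name;
    for (i = 1; i < argc; i++)
        child_argv[i] = argv[i];
    child_argv[argc] = NULL;

    pid = fork();
    if (pid == 0) {
        execvp(name, (char *const *)child_argv);
        _exit(127);             /* exec failed: command not found */
    }
    free(child_argv);
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* The whole dispatch; argv[0] is the command name, e.g. "count". */
static int dispatch(int argc, const char **argv)
{
    const struct toy_cmd *p = find_builtin(argv[0]);
    int status;

    if (p)
        return p->fn(argc, argv);       /* step 2 */
    status = run_dashed_external(argc, argv);
    if (status == 127)                  /* step 5 would suggest a fix here */
        fprintf(stderr, "toy: '%s' is not a toy command\n", argv[0]);
    return status;
}
```

Calling dispatch with {"count", "a", "b"} goes through the table, while an unknown name ends up in the fork/exec path and comes back with status 127.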

Here is the flattened code:

   int main(int argc, char **av){



         run_argv(&argc, &argv){

            /* See if it's a builtin */
            handle_builtin(*argcp, *argv);
               run_builtin(struct cmd_struct *p, int argc,...)
                  status = p->fn(argc, argv, prefix);     /* Just branch to the command function pointer */

            /* .. then try the external ones */
                           fork                   // 1. fork
                                 execvp           // 2. exec

         if (errno != ENOENT)

         help_unknown_cmd(cmd);     // Could not find the command: maybe a typo? Let's find a suggestion.


      return 1;
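
To give an idea of what "finding a suggestion" can look like, here is a minimal sketch based on a plain Levenshtein edit distance. It is an assumption on my part that this is a fair stand-in: I did not detail what help_unknown_cmd does, and the names edit_distance and suggest_command, as well as the 64-character limit, are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Classic Levenshtein distance with two rows of dynamic programming.
 * Returns -1 if b is too long for the fixed-size rows.
 */
static int edit_distance(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), i, j;
    int prev[64], cur[64];

    if (lb >= 64)
        return -1;
    for (j = 0; j <= lb; j++)
        prev[j] = (int)j;               /* distance from "" to b[0..j) */
    for (i = 1; i <= la; i++) {
        cur[0] = (int)i;                /* distance from a[0..i) to "" */
        for (j = 1; j <= lb; j++) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int best = sub < del ? sub : del;

            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof(int));
    }
    return prev[lb];
}

/*
 * Returns the known command closest to what was typed, or NULL if
 * nothing is within max_distance edits.
 */
static const char *suggest_command(const char *typed,
                                   const char *const *known, size_t n,
                                   int max_distance)
{
    const char *best = NULL;
    int best_d = max_distance + 1;
    size_t i;

    for (i = 0; i < n; i++) {
        int d = edit_distance(typed, known[i]);

        if (d >= 0 && d < best_d) {
            best_d = d;
            best = known[i];
        }
    }
    return best;
}
```

With the known commands {"add", "commit", "status"} and a limit of 2 edits, typing "stauts" is matched to "status", while something unrelated gets no suggestion at all.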



Command Structure

Each command must have an entry point with the following signature: int cmd_foo(int argc, const char **argv, const char *prefix). Commands use common APIs and modules as much as possible, but they also pass pointers to local functions and tables in order to obtain specific behavior.

Example: git-add is contained in the translation unit builtin/add.c. It uses common code (parse_options from parse-options.c) to turn the command-line flags into lexemes, but the parsing behavior is defined by local tables passed by pointer (the option array builtin_add_options and the usage strings builtin_add_usage, both from builtin/add.c):

    int cmd_add(int argc, const char **argv, const char *prefix)
    {
        // The add command uses the common function parse_options to parse its options...
        // but passes the local tables builtin_add_options and builtin_add_usage to define a specific behavior.

        argc = parse_options(argc, argv, prefix, builtin_add_options, builtin_add_usage, PARSE_OPT_KEEP_ARGV0);

        // Global variables (but local to the translation unit) have been populated.

        if (patch_interactive)  ...
        if (add_interactive) ...
        if (edit_interactive) ...

        write_cache(newfd, active_cache, active_nr);
    }
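
The split between shared parsing code and a per-command table can be reproduced in a few lines. What follows is only a sketch of the idea, not the parse-options.c API: struct toy_option, toy_parse_options and the toy_add_* names are hypothetical, only long "--name" flags are handled, and unlike the call above the command name in argv[0] is dropped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much reduced description of one option. */
struct toy_option {
    const char *long_name;      /* "patch" for --patch */
    int *value;                 /* set to 1 when the flag is seen */
    const char *help;
};

/*
 * Shared code: walks argv, sets the flags described in opts and moves
 * the remaining arguments to the front of argv.
 * Returns the number of remaining arguments, or -1 on an unknown flag.
 */
static int toy_parse_options(int argc, const char **argv,
                             const struct toy_option *opts,
                             const char *usage)
{
    int i, out = 0;

    for (i = 1; i < argc; i++) {        /* argv[0] is the command name */
        const struct toy_option *o;

        if (strncmp(argv[i], "--", 2) != 0) {
            argv[out++] = argv[i];      /* not a flag: keep it */
            continue;
        }
        for (o = opts; o->long_name; o++)
            if (strcmp(argv[i] + 2, o->long_name) == 0)
                break;
        if (!o->long_name) {
            fprintf(stderr, "usage: %s\n", usage);
            return -1;
        }
        *o->value = 1;
    }
    argv[out] = NULL;
    return out;
}

/* Command-specific part: variables local to the translation unit... */
static int patch_interactive, add_interactive;

/* ...and the table that tells the shared parser how to fill them. */
static const struct toy_option toy_add_options[] = {
    { "patch", &patch_interactive, "select hunks interactively" },
    { "interactive", &add_interactive, "interactive picking" },
    { NULL, NULL, NULL }
};
static const char toy_add_usage[] = "toy add [--patch] [--interactive] <path>...";

static int toy_cmd_add(int argc, const char **argv)
{
    argc = toy_parse_options(argc, argv, toy_add_options, toy_add_usage);
    if (argc < 0)
        return 129;
    /* The flags are now populated; argv holds only the paths. */
    return 0;
}
```

The parser knows nothing about "add": a second command would only have to provide its own table and usage string.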


The idea is better presented with a drawing:

Commands inter-communications

Builtin commands call each other by building only the parameter array and then calling the builtin C function directly. See how git-merge calls git-reset --merge:

    int cmd_merge(int argc, const char **argv, const char *prefix)
    {
        ...
        int nargc = 2;     /* "reset" and "--merge": the NULL terminator is not counted */
        const char *nargv[] = {"reset", "--merge", NULL};

        /* Invoke 'git reset --merge' */
        ret = cmd_reset(nargc, nargv, prefix);
        ...
    }
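
The same pattern in a form that compiles on its own, as a sketch: toy_cmd_merge and toy_cmd_reset are made-up stand-ins, and using "--abort" as the trigger for the call is my assumption for the example.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for a builtin entry point: only understands "--merge". */
static int toy_cmd_reset(int argc, const char **argv, const char *prefix)
{
    (void)prefix;
    if (argc == 2 && strcmp(argv[1], "--merge") == 0) {
        puts("reset: --merge");
        return 0;
    }
    fprintf(stderr, "usage: reset --merge\n");
    return 129;
}

/* Another builtin that reuses the first one without any fork/exec. */
static int toy_cmd_merge(int argc, const char **argv, const char *prefix)
{
    if (argc == 2 && strcmp(argv[1], "--abort") == 0) {
        /* Build the argument vector exactly as a shell would have. */
        const char *nargv[] = {"reset", "--merge", NULL};
        int nargc = 2;

        return toy_cmd_reset(nargc, nargv, prefix);
    }
    puts("merge: nothing to do in this sketch");
    return 0;
}
```

The callee cannot tell whether it was reached from the dispatcher or from another builtin, since both hand it the same (argc, argv, prefix) triple.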



Alternatively, a builtin command can use the Command API (from run-command.c), which will do a fork/exec.
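
For reference, here is roughly what a fork/exec with captured output involves on a POSIX system. This sketch does not use or reproduce the functions of run-command.c: run_and_capture is a hypothetical helper written only to show the mechanism.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs argv[0] (looked up in PATH) and copies up to cap - 1 bytes of
 * its standard output into buf, NUL-terminated.
 * Returns the child's exit status, 127 if exec failed, -1 on error.
 */
static int run_and_capture(const char *const *argv, char *buf, size_t cap)
{
    int fds[2], status;
    size_t used = 0;
    ssize_t n;
    char rest[256];
    pid_t pid;

    if (cap == 0 || pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: plug the write end of the pipe into stdout, then exec. */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], (char *const *)argv);
        _exit(127);
    }

    /* Parent: read what the child prints. */
    close(fds[1]);
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    /* Drain anything that did not fit so the child never blocks. */
    while (read(fds[0], rest, sizeof(rest)) > 0)
        ;
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with calling cmd_reset directly, this costs a new process but works for any executable, including commands written as shell scripts.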

External shell script commands just call other commands by building the full string (command + parameters). Here is an example where a shell script calls git-diff-index.


Git algorithm: diff.



Fabien Sanglard @2014