Git Source Code Review: Architecture
Git has an unusual code structure that gives flexibility both to developers (commands can be written in C, in shell script and, in theory, in any language they want)
and to users (commands can be invoked with two calling conventions).
Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF
Git Architecture
As seen on the history page, git commands can be written either in C or in shell script (in theory, new commands can be added in any language). On top of that, they can be called with two different syntaxes from the command line:
- Using the git executable: git add newfile
- Using the command directly with the dashed syntax: git-add newfile
How does this translate in terms of code ?
- Shell scripts are authored as usual, but every C command must expose an entry point with the same signature within its translation unit.
- Commands implemented in C are compiled once but linked twice: once into the git executable and once into their own executable.
- The git executable acts as a dispatcher for builtin commands and also as a launcher for shell scripts via fork/exec.
- Shell scripts with the .sh extension are copied, marked as executable and renamed without the extension so that they behave like binary executables.
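To make the two calling conventions concrete, here is a minimal sketch in C of how a single program could work out which command was requested, whether it was started as git add or as git-add. The helper names command_from_argv0 and resolve_command are hypothetical and are not taken from Git's source; the sketch only assumes that the dashed form carries the command name after a "git-" prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's code): derive the command name from the
 * way the program was invoked.
 *   "/usr/libexec/git-core/git-add" -> "add"  (dashed form)
 *   "git"                            -> NULL   (the name is in argv[1])
 */
const char *command_from_argv0(const char *argv0)
{
    assert(argv0 != NULL);

    /* Keep only the file name, dropping any directory part. */
    const char *slash = strrchr(argv0, '/');
    const char *base = slash ? slash + 1 : argv0;

    /* A dashed executable carries the command after the "git-" prefix. */
    if (strncmp(base, "git-", 4) == 0 && base[4] != '\0')
        return base + 4;

    return NULL;
}

/* Pick the command name for either calling convention. */
const char *resolve_command(int argc, const char **argv)
{
    const char *cmd = command_from_argv0(argv[0]);

    if (cmd)
        return cmd;                     /* git-add newfile */
    return argc > 1 ? argv[1] : NULL;   /* git add newfile */
}
```

With this sketch, both {"git", "add", "newfile"} and {"/usr/libexec/git-core/git-add", "newfile"} resolve to "add", which is what lets one entry point serve both syntaxes.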
Here is a drawing showing how the pieces of code are compiled and how they interact together:
The dispatcher
Since the git executable is the part most used by the community, I spent more time looking at how it works. It behaves as follows:
- Search if the command is a builtin: git.c features an array commands[] with function pointers to all builtin entry points. Lookup is done with a simple linear search: a binary search or a hash table is not worth it for 100 items.
- If found, branch to the command and then return.
- If not found, look for the command as an external dashed shell script.
- If found, fork, exec and return.
- If still nothing is found, try to find a suggestion and present it to the user.
Here is the flattened code:
    int main(int argc, char **av){
        handle_options();
        setup_path();
        while(1){
            run_argv(&argc, &argv){
                /* See if it's a builtin */
                handle_builtin(*argcp, *argv);
                    run_builtin(struct cmd_struct *p, int argc,...)
                        status = p->fn(argc, argv, prefix); /* Just branch to the command function pointer */
                /* .. then try the external ones */
                execv_dashed_external(*argv);
                    run_command_v_opt
                        prepare_run_command_v_opt
                        run_command
                            start_command
                                fork                  // 1. fork
                                    execv_shell_cmd
                                        sane_execvp
                                            execvp    // 2. exec
                            finish_command
            }
            if (errno != ENOENT)
                break;
            help_unknown_cmd(cmd); // Could not find the command: maybe a typo ? Let's find a suggestion.
        }
        return 1;
    }
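The builtin lookup step can be reproduced in a few lines. The following is a simplified sketch, not Git's actual git.c: the names cmd_struct, commands and fn follow the flattened code above, while find_builtin, run_builtin_by_name and the two stand-in command bodies are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the entry point every builtin command exposes. */
typedef int (*cmd_fn)(int argc, const char **argv, const char *prefix);

struct cmd_struct {
    const char *cmd;   /* name typed by the user, e.g. "add" */
    cmd_fn fn;         /* entry point to branch to */
};

/* Stand-in command bodies (hypothetical): they only report something testable. */
int cmd_add(int argc, const char **argv, const char *prefix)
{
    (void)argv; (void)prefix;
    return argc - 1;   /* number of paths given after the command name */
}

int cmd_reset(int argc, const char **argv, const char *prefix)
{
    (void)argc; (void)argv; (void)prefix;
    return 0;
}

static const struct cmd_struct commands[] = {
    { "add",   cmd_add },
    { "reset", cmd_reset },
};

/* Linear search: good enough for a table of roughly a hundred entries. */
const struct cmd_struct *find_builtin(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
        if (strcmp(commands[i].cmd, name) == 0)
            return &commands[i];
    return NULL;
}

/*
 * Returns the command's status, or -1 when argv[0] is not a builtin,
 * which leaves the caller free to try an external program next.
 */
int run_builtin_by_name(int argc, const char **argv)
{
    const struct cmd_struct *p = find_builtin(argv[0]);

    if (!p)
        return -1;
    return p->fn(argc, argv, NULL);   /* just branch to the function pointer */
}
```

Adding a builtin in this sketch means adding one row to the table; the dispatcher itself never changes.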
Command Structure
Each command must have an entry point with the following signature: int cmd_foo(int argc, const char **argv, const char *prefix). Commands use the common API and modules as much as possible, but they also pass pointers to locally defined data or functions in order to obtain specific behavior.
Example: git-add is contained in the translation unit builtin/add.c. It uses common code (parse_options from parse-options.c) to transform the command's parameters and flags into lexemes, but the parsing behavior is defined by local tables passed by pointer (builtin_add_options and builtin_add_usage from builtin/add.c):
    int cmd_add(int argc, const char **argv, const char *prefix)
    {
        // The ADD command uses the common function parse_options to parse the options...
        // but passes pointers to the local tables builtin_add_options and builtin_add_usage
        // to define a specific behavior.
        argc = parse_options(argc, argv, prefix, builtin_add_options, builtin_add_usage, PARSE_OPT_KEEP_ARGV0);

        // Global variables (but local to the translation unit) have been populated.
        if (patch_interactive) ...
        if (add_interactive) ...
        if (edit_interactive) ...

        write_cache(newfd, active_cache, active_nr);
    }
The idea is better presented with a drawing :
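The same split between shared parsing code and per-command tables can be sketched in plain C. This is a deliberately reduced illustration and not Git's parse-options API: the names flag_option, parse_flags, demo_add_options and demo_cmd_add are hypothetical, only boolean long flags are handled, and the argument array is assumed to be NULL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per flag: its spelling and the variable it switches on. */
struct flag_option {
    const char *name;   /* e.g. "--patch" */
    int *value;         /* set to 1 when the flag is present */
};

/*
 * Shared code: consume the leading flags listed in `options`, then move the
 * remaining arguments right after argv[0].
 * Returns the new argc, or -1 on an unknown flag.
 */
int parse_flags(int argc, const char **argv, const struct flag_option *options)
{
    int i = 1;   /* argv[0] is the command name */

    for (; i < argc && argv[i][0] == '-'; i++) {
        const struct flag_option *o;

        for (o = options; o->name; o++)
            if (strcmp(o->name, argv[i]) == 0)
                break;
        if (!o->name)
            return -1;
        *o->value = 1;
    }

    int kept = 1;
    for (; i < argc; i++)
        argv[kept++] = argv[i];
    argv[kept] = NULL;
    return kept;
}

/* Per-command part: state local to the translation unit, plus its table. */
static int patch_interactive, add_interactive;

static const struct flag_option demo_add_options[] = {
    { "--patch",       &patch_interactive },
    { "--interactive", &add_interactive },
    { NULL, NULL }
};

/* Returns the number of paths left after parsing, or -1 on a bad flag. */
int demo_cmd_add(int argc, const char **argv, const char *prefix)
{
    (void)prefix;
    patch_interactive = add_interactive = 0;

    argc = parse_flags(argc, argv, demo_add_options);
    if (argc < 0)
        return -1;
    /* The file-scope variables are now populated, as in the excerpt above. */
    return argc - 1;
}
```

The parser knows nothing about "add"; everything specific to the command lives in its table and in the variables the table points to.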
Commands inter-communications
Builtin commands call each other by building only the parameter array and then calling the builtin C function directly. See how git-merge calls git-reset --merge:
    int cmd_merge(int argc, const char **argv, const char *prefix)
    {
        [..]
        const char *nargv[] = {"reset", "--merge", NULL};

        /* Invoke 'git reset --merge' */
        ret = cmd_reset(nargc, nargv, prefix);
    }
Alternatively, a builtin command can use the Command API (from run-command.c), which will do a fork/exec.
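For comparison with the direct function call, here is a minimal sketch of the fork/exec route using only POSIX calls (fork, execvp, waitpid). It is not the code of run-command.c: the function name run_external and its return conventions (127 when the program cannot be executed, -1 on other failures) are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (looked up in PATH) as a child process and wait for it.
 * Returns the child's exit status, 127 if the program could not be
 * executed, or -1 if fork/wait failed or the child was killed by a signal.
 */
int run_external(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();                 /* 1. fork */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);          /* 2. exec: only returns on failure */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)             /* retry only if interrupted */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is visible even at this size: a direct call such as cmd_reset(...) shares the caller's process and global state, whereas the fork/exec route pays for a new process but works for any executable, shell scripts included.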
External shell script commands just call other commands by building the full string (command + parameters). Here is an example where git-pull.sh calls git-diff-index.