Git Source Code Review: Genesis
Diving into a big codebase is often a daunting task that has discouraged many (I have received many emails from game developers about the Dooms and Quakes).
The advantage of the Git project is that it was self-hosted from a very early stage: I was able to start by reading a super-tiny project of eight translation units from 9 years ago, and then travel forward in time toward more complexity!
Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF
Linus Torvalds gave an interesting talk at Google Talks 2007 :
A lot can be learned from it, but one point stands out: Git reached self-hosting capability within two weeks of development (at 12m18s). That first commit can still be checked out by its SHA1 hash:
git checkout e83c5163316f89bfbde7d9ab23ca2e25604af290
Eight translation units and one header!
Fabiens-MacBook-Air:git fabiensanglard$ ls -l
total 3048
-rw-r--r-- 1 fabiensanglard staff  957 17 Mar 23:23 Makefile
-rw-r--r-- 1 fabiensanglard staff 8392 17 Mar 23:23 README
-rw-r--r-- 1 fabiensanglard staff 2484 17 Mar 23:23 cache.h
-rw-r--r-- 1 fabiensanglard staff  503 17 Mar 23:23 cat-file.c
-rw-r--r-- 1 fabiensanglard staff 4103 17 Mar 23:23 commit-tree.c
-rw-r--r-- 1 fabiensanglard staff 1198 17 Mar 23:23 init-db.c
-rw-r--r-- 1 fabiensanglard staff 5681 17 Mar 23:23 read-cache.c
-rw-r--r-- 1 fabiensanglard staff  986 17 Mar 23:23 read-tree.c
-rw-r--r-- 1 fabiensanglard staff 2034 17 Mar 23:23 show-diff.c
-rw-r--r-- 1 fabiensanglard staff 5395 17 Mar 23:23 update-cache.c
-rw-r--r-- 1 fabiensanglard staff 1441 17 Mar 23:23 write-tree.c
This early version generated seven executables, which were enough to generate snapshots.
Later during development, shell scripts built on those tiny executables were added (e.g. contrib/examples/git-checkout.sh). At this stage, Git had
two distinct parts: the low-level Plumbing, implemented in C, and the high-level Porcelain, based on shell scripts (see left drawing).
As the project evolved, more shell script commands were added, and some commands ended up being converted to "built-ins" written in C. One example is
git-commit, which started in 2005 as
git-commit.sh and was ported to a C built-in in commit
f5bbc3225c4b073a7ff3218164a0c820299bc9c6 three years later, in 2008.
Eventually a "dispatch" executable,
git, able to run either a C built-in or a shell script, was added on top of everything:
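To make the dispatch idea concrete, here is a minimal sketch in C. It is not Git's actual code: the names `builtin_fn`, `find_builtin`, `external_name` and `dispatch` are hypothetical, the two built-ins are placeholders, and it assumes a POSIX system where `execvp` is available. The idea is simply to look the command up in a table of C functions first, and otherwise fall back to running an external program named `git-<command>`, which may well be a shell script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* execvp: POSIX, assumed available */

/* Hypothetical signature shared by every built-in command. */
typedef int (*builtin_fn)(int argc, char **argv);

struct builtin {
    const char *name;
    builtin_fn fn;
};

/* Placeholder built-ins: they only print which command ran. */
static int cmd_add(int argc, char **argv)
{
    (void)argc; (void)argv;
    puts("built-in add");
    return 0;
}

static int cmd_commit(int argc, char **argv)
{
    (void)argc; (void)argv;
    puts("built-in commit");
    return 0;
}

static const struct builtin builtins[] = {
    { "add", cmd_add },
    { "commit", cmd_commit },
};

/* Return the built-in called `name`, or NULL if there is none. */
static const struct builtin *find_builtin(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
        if (strcmp(builtins[i].name, name) == 0)
            return &builtins[i];
    return NULL;
}

/* Write "git-<name>" into buf. Returns 0 on success, -1 if it does not fit. */
static int external_name(const char *name, char *buf, size_t len)
{
    int n = snprintf(buf, len, "git-%s", name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Run argv[0] as a built-in if one exists, otherwise replace the process
 * with the external program "git-<argv[0]>" found on the PATH.
 * argv must be NULL-terminated, as execvp requires.
 */
static int dispatch(int argc, char **argv)
{
    char external[64];
    const struct builtin *b;

    if (argc < 1)
        return -1;

    b = find_builtin(argv[0]);
    if (b)
        return b->fn(argc, argv);

    if (external_name(argv[0], external, sizeof(external)) < 0)
        return -1;

    argv[0] = external;
    execvp(external, argv);   /* returns only if the exec failed */
    perror(external);
    return 127;
}
```

With this sketch, dispatching "commit" calls the C function directly, while an unknown name such as "checkout" is rewritten to git-checkout and handed to `execvp`, so a shell script of that name on the PATH would be picked up.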
All the commands
The old rule, C = Plumbing and shell script = Porcelain, is no more. The vast majority of the commands
have been converted to C. Here are the 102 git commands, with the C built-ins in blue and the shell scripts in black:
|Main (36)||Ancillary (30)||Interrogation (16)||Interaction (10)||Manip (17)||Interrogation (18)||Synching (12)||Internal (15)||