Diving into a big codebase is often a daunting task that has discouraged many (I have received many emails from game developers about the Dooms and Quakes).
The advantage of the Git project is that it was self-hosted from a very early stage: I was able to start by reading a super-tiny, 8-translation-unit project from 9 years ago and then travel in time toward more complexity!
Linus Torvalds gave an interesting talk at Google Talks 2007:
A lot can be learned from it, but one aspect stands out: Git reached self-hosting capability within two weeks of development (at 12m18s). That first commit can still be checked out: its SHA-1 hash is e83c5163316f89bfbde7d9ab23ca2e25604af290.
Fabiens-MacBook-Air:git fabiensanglard$ ls -l
total 3048
-rw-r--r-- 1 fabiensanglard staff 957 17 Mar 23:23 Makefile
-rw-r--r-- 1 fabiensanglard staff 8392 17 Mar 23:23 README
-rw-r--r-- 1 fabiensanglard staff 2484 17 Mar 23:23 cache.h
-rw-r--r-- 1 fabiensanglard staff 503 17 Mar 23:23 cat-file.c
-rw-r--r-- 1 fabiensanglard staff 4103 17 Mar 23:23 commit-tree.c
-rw-r--r-- 1 fabiensanglard staff 1198 17 Mar 23:23 init-db.c
-rw-r--r-- 1 fabiensanglard staff 5681 17 Mar 23:23 read-cache.c
-rw-r--r-- 1 fabiensanglard staff 986 17 Mar 23:23 read-tree.c
-rw-r--r-- 1 fabiensanglard staff 2034 17 Mar 23:23 show-diff.c
-rw-r--r-- 1 fabiensanglard staff 5395 17 Mar 23:23 update-cache.c
-rw-r--r-- 1 fabiensanglard staff 1441 17 Mar 23:23 write-tree.c
This early version generated seven executables (update-cache, show-diff, init-db, write-tree, read-tree, commit-tree and cat-file), which
were enough to generate snapshots.
Later during development, shell scripts using those tiny executables were added (e.g. contrib/examples/git-checkout.sh). At this stage, Git had
two distinct parts: the low-level Plumbing, implemented in C, and the high-level Porcelain, based on shell scripts (see left drawing).
As the project evolved, more shell script commands were added and some commands ended up being converted to C "built-ins". Take git-commit, for example: it started in 2005 as git-commit.sh and was ported to a C built-in in commit f5bbc3225c4b073a7ff3218164a0c820299bc9c6, three years later in 2008.
Eventually a "dispatch" executable, git, able to run either C or shell script commands, was added on top of everything:
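To make the dispatch idea concrete, here is a minimal sketch in C. It is not Git's actual code: the function names build_command_name and dispatch are hypothetical, and it assumes every command lives on the PATH as a separate executable named git-<command>. The real dispatcher does more than this, since built-ins are compiled into the git binary itself.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes "git-<subcommand>" into out.
 * Returns 0 on success, -1 if the subcommand is empty or the name does not fit. */
int build_command_name(const char *subcommand, char *out, size_t out_size)
{
    int written;

    assert(out != NULL && out_size > 0);
    if (subcommand == NULL || subcommand[0] == '\0')
        return -1;
    written = snprintf(out, out_size, "git-%s", subcommand);
    if (written < 0 || (size_t)written >= out_size)
        return -1; /* output was truncated */
    return 0;
}

/* Hypothetical dispatcher: argv[0] is the subcommand, followed by its
 * arguments and a terminating NULL. On success the current process is
 * replaced by the command, so this function only returns on failure. */
int dispatch(char **argv)
{
    char name[256];

    if (argv == NULL || argv[0] == NULL)
        return -1;
    if (build_command_name(argv[0], name, sizeof(name)) != 0)
        return -1;
    argv[0] = name;
    /* execvp searches PATH and does not care how the command is written:
     * a compiled C program is loaded directly, while an executable shell
     * script is run through the interpreter named on its "#!" line. */
    execvp(name, argv);
    perror(name);
    return -1;
}
```

A main function would simply call dispatch(argv + 1), so that "git commit -m msg" ends up executing "git-commit -m msg". The point of the design is that the caller never needs to know whether a command is still a shell script or has already been ported to C.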
All the commands
The old rule "C = Plumbing and shell script = Porcelain" is no more. The vast majority of the commands
have been converted to C. Here are the 102 git commands, with the C built-ins in blue and the shell scripts in black: