March 30th, 2014

Git Source Code Review: Genesis

Diving into a big codebase is often a daunting task that has discouraged many (I have received many emails from game developers about the Dooms and Quakes).

The advantage of the Git project is that it was self-hosted from a very early stage: I was able to start by reading a super-tiny, 8-translation-unit project from 9 years ago and then travel forward in time toward more complexity!

Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF


The beginning

Linus Torvalds gave an interesting talk at Google Talks 2007.
A lot can be learned from it, but one point in particular stands out (at 12m18s): Git reached self-hosting capability within two weeks of development. That first commit can still be checked out: its SHA-1 hash is e83c5163316f89bfbde7d9ab23ca2e25604af290.



    git checkout e83c5163316f89bfbde7d9ab23ca2e25604af290

  

Eight translation units and one header!

 
    Fabiens-MacBook-Air:git fabiensanglard$ ls -l
    total 3048
    -rw-r--r--  1 fabiensanglard  staff      957 17 Mar 23:23 Makefile
    -rw-r--r--  1 fabiensanglard  staff     8392 17 Mar 23:23 README
    -rw-r--r--  1 fabiensanglard  staff     2484 17 Mar 23:23 cache.h
    -rw-r--r--  1 fabiensanglard  staff      503 17 Mar 23:23 cat-file.c
    -rw-r--r--  1 fabiensanglard  staff     4103 17 Mar 23:23 commit-tree.c
    -rw-r--r--  1 fabiensanglard  staff     1198 17 Mar 23:23 init-db.c
    -rw-r--r--  1 fabiensanglard  staff     5681 17 Mar 23:23 read-cache.c
    -rw-r--r--  1 fabiensanglard  staff      986 17 Mar 23:23 read-tree.c
    -rw-r--r--  1 fabiensanglard  staff     2034 17 Mar 23:23 show-diff.c
    -rw-r--r--  1 fabiensanglard  staff     5395 17 Mar 23:23 update-cache.c
    -rw-r--r--  1 fabiensanglard  staff     1441 17 Mar 23:23 write-tree.c


  


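This early version generated seven executables (update-cache, show-diff, init-db, write-tree, read-tree, commit-tree and cat-file), which were enough to generate snapshots. The eighth translation unit, read-cache.c, holds code shared between them rather than a program of its own.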

Later in development, shell scripts built on those tiny executables were added (e.g. contrib/examples/git-checkout.sh). At this stage, Git had two distinct parts: the low-level Plumbing, implemented in C, and the high-level Porcelain, made of shell scripts (see left drawing).
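
To make the Plumbing/Porcelain split concrete, here is a minimal Python sketch of a porcelain-style "commit" assembled purely from plumbing calls. It is an illustration, not a transcription of the 2005 shell scripts: it assumes a modern git on the PATH, it uses today's plumbing names (write-tree, rev-parse, commit-tree, update-ref), and the helpers run_git and porcelain_commit are hypothetical names invented for this example.

```python
import subprocess


def run_git(*args):
    """Run one git plumbing command and return its trimmed stdout.

    Assumes a `git` executable is available on the PATH.
    """
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def porcelain_commit(message, run=run_git):
    """Hypothetical porcelain: snapshot the index and move HEAD forward."""
    # 1. Turn the current index into a tree object.
    tree = run("write-tree")

    # 2. Look up the parent commit; a brand new repository has none.
    try:
        parent = run("rev-parse", "--verify", "HEAD")
    except subprocess.CalledProcessError:
        parent = None

    # 3. Wrap the tree in a commit object.
    args = ["commit-tree", tree]
    if parent:
        args += ["-p", parent]
    args += ["-m", message]
    commit = run(*args)

    # 4. Point HEAD (and the branch it refers to) at the new commit.
    run("update-ref", "HEAD", commit)
    return commit
```

Inside a repository with staged changes, porcelain_commit("my message") would return the hash of the new commit. The run parameter only exists so that the plumbing layer can be swapped out, which also shows how thin the porcelain layer can be.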

As the project evolved, more shell script commands were added, and some commands ended up being converted to C "built-ins". One example is git-commit, which started in 2005 as git-commit.sh and was ported to a C built-in in commit f5bbc3225c4b073a7ff3218164a0c820299bc9c6, three years later, in 2008.

Eventually, a "dispatch" executable named git, able to run either a C built-in or a shell script, was added on top of everything.
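
The idea of such a dispatcher can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not Git's actual implementation: the BUILTINS table, cmd_version, resolve and dispatch are hypothetical names, and the fallback simply looks for an executable called git-<command> in a search path.

```python
import os
import shutil
import sys


def cmd_version(args):
    """Stand-in for a command compiled into the main executable."""
    print("git version (sketch)")
    return 0


# Hypothetical built-in table: command name -> function.
BUILTINS = {"version": cmd_version}


def resolve(command, exec_path=None):
    """Return ("builtin", function), ("external", path) or None."""
    if command in BUILTINS:
        return ("builtin", BUILTINS[command])
    # Fall back to a standalone program or script named git-<command>.
    # exec_path=None means "search the regular PATH".
    external = shutil.which("git-" + command, path=exec_path)
    if external:
        return ("external", external)
    return None


def dispatch(argv, exec_path=None):
    """Handle `git <command> [args...]` the way a dispatcher might."""
    if not argv:
        print("usage: git <command> [<args>]", file=sys.stderr)
        return 1
    command, args = argv[0], argv[1:]
    found = resolve(command, exec_path)
    if found is None:
        print(f"git: '{command}' is not a git command", file=sys.stderr)
        return 1
    kind, target = found
    if kind == "builtin":
        # Built-ins run inside the dispatcher process.
        return target(args)
    # External commands replace the current process.
    os.execv(target, [target, *args])
```

A caller would typically end with sys.exit(dispatch(sys.argv[1:])). The design point is that users only ever type git <command>: whether the command is a built-in or a separate script becomes an implementation detail that can change over time.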

All the commands

The old rule "C = Plumbing and shell script = Porcelain" is no more. The vast majority of the commands have been converted to C. Here are the 102 git commands, with the C built-ins in blue and the shell scripts in black:

Porcelain

Main (36): git-add, git-am, git-archive, git-bisect, git-branch, git-bundle, git-checkout, git-cherry-pick, git-citool, git-clean, git-clone, git-commit, git-describe, git-diff, git-fetch, git-format-patch, git-gc, git-grep, git-gui, git-init, git-log, git-merge, git-mv, git-notes, git-pull, git-push, git-rebase, git-reset, git-revert, git-rm, git-shortlog, git-show, git-stash, git-status, git-submodule, git-tag, gitk

Ancillary (30)
Manipulation: git-config, git-fast-export, git-fast-import, git-filter-branch, git-lost-found, git-mergetool, git-pack-refs, git-prune, git-reflog, git-relink, git-remote, git-repack, git-replace, git-repo-config
Interrogation (16): git-annotate, git-blame, git-cherry, git-count-objects, git-difftool, git-fsck, git-get-tar-commit-id, git-help, git-instaweb, git-merge-tree, git-rerere, git-rev-parse, git-show-branch, git-verify-tag, git-whatchanged, gitweb

Interaction (10): git-archimport, git-cvsexportcommit, git-cvsimport, git-cvsserver, git-imap-send, git-p4, git-quiltimport, git-request-pull, git-send-email, git-svn

Plumbing

Manip (17): git-apply, git-checkout-index, git-commit-tree, git-hash-object, git-index-pack, git-merge-file, git-merge-index, git-mktag, git-mktree, git-pack-objects, git-prune-packed, git-read-tree, git-symbolic-ref, git-unpack-objects, git-update-index, git-update-ref, git-write-tree

Interrogation (18): git-cat-file, git-diff-files, git-diff-index, git-diff-tree, git-for-each-ref, git-ls-files, git-ls-remote, git-ls-tree, git-merge-base, git-name-rev, git-pack-redundant, git-rev-list, git-show-index, git-show-ref, git-tar-tree, git-unpack-file, git-var, git-verify-pack

Synching (12): git-daemon, git-fetch-pack, git-http-backend, git-send-pack, git-update-server-info, git-http-fetch, git-http-push, git-parse-remote, git-receive-pack, git-shell, git-upload-archive, git-upload-pack

Internal (15): git-check-attr, git-check-ref-format, git-column, git-credential, git-credential-cache, git-credential-store, git-fmt-merge-msg, git-mailinfo, git-mailsplit, git-merge-one-file, git-patch-id, git-peek-remote, git-sh-i18n, git-sh-setup, git-stripspace

Next Part

Next: Architecture


Fabien Sanglard @2014