Fabien Sanglard's Website

March 30th, 2014

Git Source Code Review: Overview

Since its 1.0 release in December 2005, git has taken over the software industry. In combination with GitHub it is now a powerful tool to publish and share code: from big teams (Linux kernel, id Software, Epic Unreal) to single individuals (Prince of Persia, Another World, Rick Dangerous), many have adopted it as their main SCM.

I wanted to get a better understanding of the "stupid content tracker" and see how it was built so I spent a few weeks in my spare time reading the source code. I found it tiny, tidy, well-documented and overall pleasant to read.

As usual I have compiled my notes into an article. Maybe it will encourage some of us to read more source code and become better engineers.

Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF

First contact

Getting the source code is easy:



    git clone https://github.com/git/git

Compiling on Linux or Mac OS X works "out of the box" as long as you only need English. If you want to build on Windows: don't.



    cd git
    echo "NO_GETTEXT = 1" > config.mak
    make

Trivia: NO_GETTEXT indicates that no string translation should be performed. Git source code wraps its user-facing character strings in _(msgid) or N_(msgid) markers so translations can optionally be downloaded before compilation.

What language is used?

In its infancy Git was programmed entirely in C (the very first Git commit was performed with just 5 tiny executables; more about this in the Genesis part). Today, the codebase is a mix of C (for performance-critical operations) and portable shell script, with an interesting architecture that allows any programming language to be used.
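One way to picture that architecture: Git treats any executable named git-<name> found on the PATH as the subcommand `git <name>`, whatever language it was written in. The sketch below imitates that lookup in plain shell. It is not Git's actual code; `dispatch` and `git-hello` are hypothetical names made up for the illustration.

```shell
#!/bin/sh
# Sketch of the "git <name>" -> "git-<name>" lookup; NOT Git's real code.
# dispatch and git-hello are hypothetical names used for illustration.
dispatch() {
    name="$1"
    shift
    # command -v searches PATH the same way the shell would.
    if command -v "git-$name" >/dev/null 2>&1; then
        "git-$name" "$@"
    else
        echo "'$name' is not a git command" >&2
        return 1
    fi
}

# A throwaway subcommand written in shell. Any language would do,
# as long as the file is executable and sits somewhere on PATH.
demo_dir=$(mktemp -d)
cat > "$demo_dir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from $1"
EOF
chmod +x "$demo_dir/git-hello"
PATH="$demo_dir:$PATH"

dispatch hello shell   # prints "hello from shell"
```

With a real Git installation, leaving the same git-hello script on the PATH should make `git hello shell` behave the same way.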

How big is it?

I ran cloc to get an idea of the volume of code:



    fabiensanglard$ cloc *.c *.h *.sh
     317 text files.
     317 unique files.                                          
       0 files ignored.

    http://cloc.sourceforge.net v 1.60  T=1.29 s (245.5 files/s, 93178.5 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C                              187          12037          11507          78511
    Bourne Shell                    29            866            918           7586
    C/C++ Header                   101           1314           1978           5604
    -------------------------------------------------------------------------------
    SUM:                           317          14217          14403          91701
    -------------------------------------------------------------------------------

At 91,701 lines of code, the volume is roughly equivalent to the Quake engine, which I was able to read within a month. By today's standards it is a small project: the Linux kernel 3.10 release had 15,803,499 lines of code!
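As a quick sanity check of those figures, the "code" column of the cloc table can be re-added by hand and compared with the kernel count quoted above. This is only shell arithmetic on numbers copied from the text; the variable names are mine.

```shell
#!/bin/sh
# "code" column copied from the cloc table (C, Bourne Shell, C/C++ Header).
c_loc=78511
shell_loc=7586
header_loc=5604

total=$((c_loc + shell_loc + header_loc))
echo "total: $total"                        # prints "total: 91701"

# Integer ratio against the Linux 3.10 figure quoted in the text.
kernel_loc=15803499
echo "kernel/git: $((kernel_loc / total))x" # prints "kernel/git: 172x"
```

In other words, the kernel of that era was a bit more than 170 times the size of the files counted here.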

Browsing the source code

The Linux way to browse and write code is not via a traditional IDE like Visual Studio, Xcode or Eclipse. Most Linux developers use either Vim, Emacs or MicroEmacs. I ended up learning Vim because:

I wanted to invest time in something that would be available everywhere.
I wanted to invest time in something that would be usable during SSH sessions.
I liked the old-school text only look and feel of Vim.

But coming from an Eclipse/Xcode/Visual Studio world, the first contact with Vim was discouraging to say the least:

There is no syntax highlighting, so code is visually hard to read (not to mention the syntax or semantics).
There is no file explorer on the left side.
There is no way to go directly to a method or variable definition (Command-Click in Xcode, Command-Click on Eclipse/Visual Studio).
Pressing the arrow keys does not move the cursor but instead produces garbage characters @#&**~`!
Compiler errors are not reported in advance.
There is no convenient search box.

In one sentence: I had to learn everything again :/ !

But it was fun to learn something new and after a few hours everything felt very fast and natural. Also, many plugins provide the conveniences of a mainstream IDE:

Syntax highlighting is a built-in feature of Vim: :syntax on
Vim can jump directly to a definition with ctags.
The NERDTree plugin lets you browse the files in the project.
The YouCompleteMe plugin shows compiler errors in advance and even provides semantic autocompletion.
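To make the ctags item less magical: a tags file is a plain-text index with one tab-separated line per symbol (name, file, address). In a real checkout it would be generated with something like `ctags -R .`, and Vim consults it when you press Ctrl-] on an identifier or start it with `vim -t NAME`. The sketch below mimics that lookup with awk. The two entries are hand-written stand-ins with made-up line numbers, and `find_tag` is a hypothetical helper, not part of Vim or ctags.

```shell
#!/bin/sh
# Sketch of a tags lookup. The entries below are hand-written stand-ins
# (hypothetical line numbers), not real ctags output.
demo_dir=$(mktemp -d)
printf 'cmd_add\tbuiltin/add.c\t42\n'    > "$demo_dir/tags"
printf 'cmd_diff\tbuiltin/diff.c\t17\n' >> "$demo_dir/tags"

# find_tag NAME FILE: print "file:address" for NAME, or fail if absent.
# This is roughly the question the editor answers before jumping.
find_tag() {
    awk -F '\t' -v name="$1" '
        $1 == name { print $2 ":" $3; found = 1 }
        END        { exit !found }
    ' "$2"
}

find_tag cmd_diff "$demo_dir/tags"   # prints "builtin/diff.c:17"
```

Because the index is just text keyed by symbol name, the same jump works for C and for shell functions alike, provided the tag generator understands both languages.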

The end result was a tool more powerful than any I had ever used before:

Lightweight editor that consumes very little RAM/CPU.
Never blocks or crashes.
Fast as lightning since hands never leave the "asdf"/"jkl;" position.
Most mainstream IDE features available.
Can jump to function definition regardless of the language (C or Shell Script).

Trivia: In the 90s the editor war between Vim and Emacs users was a hot topic. I hope members of the Church of Emacs won't be offended.

Documentation

It is easier to find something when you know what you are looking for. Before reading Git I gathered as much documentation as possible. Usually open source projects are poor on this side but Git is an exception. Not only have a few good articles and books been published about it:

The Architecture of Open Source Applications (Volume 2): Git

The developers have also done an outstanding job with the Git Documentation directory, which contains a wealth of explanations and man pages that are well-written and actually useful:

Rare luxury: they even feature drawings, as in git-rebase(1).
.git directory layout gitrepository-layout(5).
Principles of Plumbing and Porcelain git(1).
Additional tools.
Diff API api-diff.txt.
Tutorial for the "core" git commands: gitcore-tutorial(7).
The User manual is also really good and does a good job of explaining the concepts, all the way down to the not-so-well-known pack files that allow efficient data storage.

Finally the official website (git-scm.com) is a good hub of additional resources.

Trivia: Everything still ain't all sunshine and rainbows. The infamous "Forward-port local commits to the updated upstream head" from git-rebase(1) is still there!

Next Part

Next: Genesis