March 30th, 2014

Git Source Code Review: Overview

Since its 1.0 release in December 2005, Git has taken over the software industry. In combination with GitHub it is now a powerful tool to publish and share code: from big teams (Linux kernel, id Software, Epic Unreal) to single individuals (Prince of Persia, Another World, Rick Dangerous), many have adopted it as their main SCM.

I wanted to get a better understanding of the "stupid content tracker" and see how it was built, so I spent a few weeks of my spare time reading the source code. I found it tiny, tidy, well-documented and overall pleasant to read.

As usual, I have compiled my notes into an article. Maybe it will encourage some of us to read more source code and become better engineers.

Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF

First contact

Getting the source code is easy:



    git clone https://github.com/git/git

  


Compiling on Linux or Mac OS X works "out of the box" as long as you only need English. If you want to build on Windows: don't.



    cd git
    echo "NO_GETTEXT = 1" > config.mak
    make

  

Trivia: NO_GETTEXT indicates that no string translation should be performed. Git source code wraps its translatable character strings in gettext-style markers, _(msgid) and N_(msgid), so translations can optionally be used at build time.

What language is used?

In its infancy Git was programmed entirely in C (the very first Git commit was performed with just 5 tiny executables; more about this in the Genesis part). Today, the codebase is a mix of C (for performance-critical operations) and portable shell scripts, with an interesting architecture that allows commands to be written in any programming language.
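One visible consequence of that architecture: when a command is not built in, git looks for an executable named git-<command> and runs it, whatever language it is written in. Here is a minimal sketch, assuming a POSIX shell; the git-hello script, its message and the temporary directory are hypothetical names made up for illustration.

```shell
# Hypothetical example: add a "git hello" subcommand written in plain shell.
demo_dir=$(mktemp -d)

# The script could be written in any language, as long as it is executable.
cat > "$demo_dir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from an external git command"
EOF
chmod +x "$demo_dir/git-hello"

# Put the script on PATH so that git can find it.
PATH="$demo_dir:$PATH"
export PATH

# "git hello" is not a built-in, so git falls back to running git-hello.
if command -v git >/dev/null 2>&1; then
    git hello
fi
```

The same lookup is what lets third-party tools plug into the git command line without touching the C core.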

How big is it?

I ran cloc to get an idea of the volume of code:



    fabiensanglard$ cloc *.c *.h *.sh
     317 text files.
     317 unique files.                                          
       0 files ignored.

    http://cloc.sourceforge.net v 1.60  T=1.29 s (245.5 files/s, 93178.5 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C                              187          12037          11507          78511
    Bourne Shell                    29            866            918           7586
    C/C++ Header                   101           1314           1978           5604
    -------------------------------------------------------------------------------
    SUM:                           317          14217          14403          91701
    -------------------------------------------------------------------------------


      

At 91,701 lines of code, the volume is roughly equivalent to the Quake engine, which I was able to read within a month. By today's standards it is a small project: the Linux kernel 3.10 release had 15,803,499 lines of code!
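For a rough cross-check when cloc is not installed, non-blank lines can be counted with standard tools. This is only a sketch: the count_nonblank helper is a hypothetical name, and unlike cloc it does not separate comments from code, so its totals will come out higher than the "code" column above.

```shell
# Count the lines that contain at least one non-whitespace character.
# Comment lines are NOT excluded, so this overestimates compared to cloc.
count_nonblank() {
    cat "$@" | grep -c '[^[:space:]]'
}
```

From the root of the git checkout it would be called as count_nonblank *.c *.h *.sh.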

Browsing the source code

The Linux way to browse and write code is not through a traditional IDE such as Visual Studio, Xcode or Eclipse. Most Linux developers use Vim, Emacs or MicroEmacs. I ended up learning Vim.


But coming from an Eclipse/Xcode/Visual Studio world, the first contact with Vim was discouraging, to say the least.

In one sentence: I had to learn everything again :/

But it was fun to learn something new, and after a few hours everything felt very fast and natural. Also, many plugins provide the conveniences of a mainstream IDE.


The end result was a tool more powerful than any I had ever used before.



Trivia: In the 90s the editor war between Vim and Emacs users was a hot topic. I hope members of the Church of Emacs won't be offended.

Documentation

It is easier to find something when you know what you are looking for. Before reading Git I gathered as much documentation as possible. Open source projects are usually poor on this side, but Git is an exception. Not only have a few good articles and books been published about it, the developers have also done an outstanding job with the Git Documentation directory, which contains a wealth of explanations and man pages that are well written and actually useful.

Finally the official website (git-scm.com) is a good hub of additional resources.

Trivia: Everything still ain't all sunshine and rainbows. The infamous "Forward-port local commits to the updated upstream head" from git-rebase(1) is still there!

Next Part

Next: Genesis

 
