Git Source Code Review: Overview
Since its 1.0 release in December 2005, Git has taken over the software industry. In combination with GitHub it is now a powerful tool to publish and share code: from big teams (Linux kernel, id Software, Epic Unreal) to single individuals (Prince of Persia, Another World, Rick Dangerous), many have adopted it as their main SCM.
I wanted to get a better understanding of the "stupid content tracker" and see how it was built so I spent a few weeks in my spare time reading the source code. I found it tiny, tidy, well-documented and overall pleasant to read.
As usual, I have compiled my notes into an article. Maybe it will encourage some of us to read more source code and become better engineers.
Part 1: Overview
Part 2: Genesis
Part 3: Architecture
Part 4: Algorithms for DIFF
First contact
Getting the source code is easy :
git clone https://github.com/git/git
Compiling on Linux or MacOS X works "out of the box" as long as you only need English. If you want to build on Windows: Don't.
cd git
echo "NO_GETTEXT = 1;" > config.mak
make
Trivia : NO_GETTEXT indicates that no string translation should be performed. Git source code wraps its translatable character strings in _(msgid) or N_(msgid) markers so translations can optionally be downloaded before compilation.
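The marker idea can be sketched in a few lines of shell. This is a hypothetical illustration, not Git's actual code: the `translate` function and the use of `NO_GETTEXT` as a runtime variable are assumptions made for the example. When translation is disabled, the marker simply hands back the message id, which is why an English-only build needs no translation catalog.

```shell
#!/bin/sh
# Hypothetical sketch of a translation marker; names are illustrative only.

# translate MSGID: print the translated string, or MSGID itself when
# translation is disabled or the gettext program is unavailable.
translate () {
    if [ -n "${NO_GETTEXT:-}" ] || ! command -v gettext >/dev/null 2>&1
    then
        # Pass-through: the message id doubles as the English text.
        printf '%s' "$1"
    else
        # Look the message up in the catalog selected by TEXTDOMAIN.
        gettext "$1"
    fi
}

# With translation switched off, the message comes back unchanged.
NO_GETTEXT=1
translate "nothing to commit"
echo
```

Because the untranslated path is a plain pass-through, every call site can stay wrapped in the marker whether or not translations are installed.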
What language is used ?
In its infancy Git was programmed entirely in C (the very first Git commit was performed with just 5 tiny executables; more about this in the Genesis part). Today, the codebase is a mix of C (for performance-critical operations) and portable shell scripts, with an interesting architecture that allows commands to be written in any programming language.
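One way to picture how a front-end can hand work to helpers written in any language is a dispatcher that maps `tool foo` to an executable named `tool-foo` found on the PATH, in the same spirit as `git foo` resolving to a `git-foo` program. The sketch below is a hypothetical illustration, not Git's implementation: `dispatch`, `mytool` and `mytool-hello` are names invented for the example.

```shell
#!/bin/sh
# Hypothetical dispatcher: "dispatch mytool <name> [args]" runs an executable
# called "mytool-<name>" found on PATH, whatever language it is written in.

dispatch () {
    if [ "$#" -lt 2 ]
    then
        echo "usage: dispatch <prefix> <subcommand> [args...]" >&2
        return 2
    fi
    prefix=$1
    sub=$2
    shift 2
    helper="$prefix-$sub"
    if command -v "$helper" >/dev/null 2>&1
    then
        # Hand over the remaining arguments untouched.
        "$helper" "$@"
    else
        echo "$prefix: '$sub' is not a $prefix command" >&2
        return 1
    fi
}

# Demo: create a throwaway helper and put its directory on PATH.
demo_dir=$(mktemp -d)
cat >"$demo_dir/mytool-hello" <<'EOF'
#!/bin/sh
echo "hello, $1"
EOF
chmod +x "$demo_dir/mytool-hello"
PATH="$demo_dir:$PATH"

dispatch mytool hello world   # runs: mytool-hello world
```

The front-end only needs to agree with its helpers on a naming convention, so a helper can be a compiled C binary, a shell script, or anything else the system can execute.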
How big is it ?
I ran cloc to get an idea of the volume of code :
fabiensanglard$ cloc *.c *.h *.sh
     317 text files.
     317 unique files.
       0 files ignored.

http://cloc.sourceforge.net v 1.60  T=1.29 s (245.5 files/s, 93178.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                              187          12037          11507          78511
Bourne Shell                    29            866            918           7586
C/C++ Header                   101           1314           1978           5604
-------------------------------------------------------------------------------
SUM:                           317          14217          14403          91701
-------------------------------------------------------------------------------
At 91,701 lines of code, the volume is roughly equivalent to the Quake engine, which I was able to read within a month. By today's standards it is a small project: the Linux kernel 3.10 release had 15,803,499 lines of code !
Browsing the source code
The Linux way to browse and write code is not via a traditional IDE like Visual Studio, XCode or Eclipse. Most Linux developers use Vim, Emacs or MicroEmacs. I ended up learning Vim because:
- I wanted to invest time in something that would be available everywhere.
- I wanted to invest time in something that would be usable during SSH sessions.
- I liked the old-school text only look and feel of Vim.
But coming from an Eclipse/XCode/Visual Studio world, the first contact with Vim was discouraging to say the least:
- There is no syntax highlighting, so code is visually hard to read (not to mention its syntax or semantics).
- There is no file explorer on the left side.
- There is no way to go directly to a method or variable definition (Command-Click in XCode, Ctrl-Click in Eclipse/Visual Studio).
- Pressing the arrow keys does not move the cursor but instead produces garbage characters @#&**~` !
- Compiler errors are not reported in advance.
- There is no convenient search box.
In one sentence: I had to learn everything again :/ !
But it was fun to learn something new, and after a few hours everything felt very fast and natural. Also, built-in features and plugins provide the conveniences of a mainstream IDE :
- Syntax highlighting is a built-in feature of Vim: :syntax on
- Vim can jump directly to a definition with ctags.
- The NerdTree plugin allows browsing the files in the project.
- The You Complete Me plugin shows compiler errors in advance and even provides semantic autocompletion.
The end result was a tool more powerful than any I had used before :
- Lightweight editor that consumes very little RAM/CPU.
- Never blocks or crashes.
- Fast as lightning since hands never leave the "asdf"/"jkl;" position.
- Most mainstream IDE features available.
- Can jump to function definition regardless of the language (C or Shell Script).
Trivia : In the 90s the editor war between Vim and Emacs users was a hot topic. I hope members of the Church of Emacs won't be offended.
Documentation
It is easier to find something when you know what you are looking for. Before reading Git I gathered as much documentation as possible. Usually open source projects are poor on this side, but Git is an exception. Not only have a few good articles and books been published about it:
- The Architecture of Open Source Applications (Volume 2): Git
- Git Internals
- Pro Git
- LearnGitBranching
The developers have also done an outstanding job with the Git Documentation directory, which contains a wealth of explanations and man pages that are well-written and actually useful:
- Rare luxury: they even feature drawings, see git-rebase(1).
- .git directory layout: gitrepository-layout(5).
- Principles of Plumbing and Porcelain: git(1).
- Additional tools.
- Diff API api-diff.txt.
- Tutorial for the "core" git commands: gitcore-tutorial(7).
- The User manual is also really good and does a good job of explaining concepts and everything down to the not-so-well-known pack files that allow efficient data storage.
Finally the official website (git-scm.com) is a good hub of additional resources.
Trivia : Everything still ain't all sunshine and rainbows. The infamous "Forward-port local commits to the updated upstream head" from git-rebase(1) is still there !
Next Part
Next : Genesis