The Compiler (3/5)


[Figure: the driver pipeline, cpp → cc → ld → loader, with cc (this article) highlighted]

The compiler stage is the most complicated element of the pipeline. Given the scope of these articles, however, it is going to be the simplest part. If you want to learn how compilers work inside out, refer to the Dragon Book (Compilers: Principles, Techniques, and Tools).

Compiler big picture

The goal of the compiler is to open a translation unit, parse it, optimize it, and output an object file (except with LTO, which is discussed later). These object files are also called relocatable files, or simply relocatables.

Compilers are generally structured in three stages:

A Frontend which ingests the text and transforms it into an Intermediate Representation (IR).
An Optimizer which iterates on the IR with a collection of optimization passes.
A Backend which generates machine-specific instructions. gcc generates assembly and converts it to machine code with binutils' as, while clang has a fully built-in assembler.

The machine code output is packaged into an object file format container.

Clang is a frontend which generates an IR consumed by the LLVM optimizer and backend. LLVM's well-documented and fairly human-readable IR has opened the door to many tools. Among them is Rust's compiler, rustc, an LLVM frontend in charge of generating LLVM IR.

Output format

The input format, the translation unit, was studied in the previous article about the preprocessor. Let's focus on what the compiler outputs. We can identify the format with the file tool, after asking the driver to output a relocatable file instead of an executable.

// mult.c

int mul(int x, int y);

int pow(int x) { return mul(x, x) ; }

Note how the -c flag simply makes the driver invoke itself in compiler mode (-cc1) and skip the linker stage.

$ clang -v -c mult.c -o mult.o
clang -cc1 mult.c -o mult.o
$ file mult.o
mult.o: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), not stripped

Relocatable files are commonly called "object" files and use the .o extension. Let's use binutils' readelf to peek inside one.

$ readelf -S -W mult.o
There are 9 section headers, starting at offset 0x1d8:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 0001b1 000071 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000028 00  AX  0   0  4
  [ 3] .rela.text        RELA            0000000000000000 000180 000018 18   I  9   2  8
  [ 4] .comment          PROGBITS        0000000000000000 000068 000026 01  MS  0   0  1
  [ 5] .note.GNU-stack   PROGBITS        0000000000000000 00008e 000000 00      0   0  1
  [ 6] .eh_frame         PROGBITS        0000000000000000 000090 000030 00   A  0   0  8
  [ 7] .rela.eh_frame    RELA            0000000000000000 000198 000018 18   I  9   6  8
  [ 8] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 0001b0 000001 00   E  9   0  1
  [ 9] .symtab           SYMTAB          0000000000000000 0000c0 0000c0 18      1   6  8

The output is organized in named sections. The most important one to know is .text, where the functions' instructions are stored. By tweaking the source code, we can make the compiler emit the two other most common sections.

// manySymbols.c

int myInitializedVar = 1;
int myUninitializedVar;

int add(int x, int y);

int mult(int x) { return add(x, x) ; }

Let's compile to a relocatable object and peek inside again.

$ clang -c -o manySymbols.o manySymbols.c
$ readelf -S -W manySymbols.o
There are 12 section headers, starting at offset 0x2c0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000219 0000a5 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000028 00  AX  0   0  4
  [ 3] .rela.text        RELA            0000000000000000 0001e8 000018 18   I 11   2  8
  [ 4] .data             PROGBITS        0000000000000000 000068 000004 00  WA  0   0  4
  [ 5] .bss              NOBITS          0000000000000000 00006c 000004 00  WA  0   0  4
  [ 6] .comment          PROGBITS        0000000000000000 00006c 000026 01  MS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        0000000000000000 000092 000000 00      0   0  1
  [ 8] .eh_frame         PROGBITS        0000000000000000 000098 000030 00   A  0   0  8
  [ 9] .rela.eh_frame    RELA            0000000000000000 000200 000018 18   I 11   8  8
  [10] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 000218 000001 00   E 11   0  1
  [11] .symtab           SYMTAB          0000000000000000 0000c8 000120 18      1   8  8

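Adding an initialized variable made the compiler emit a .data section. Adding an uninitialized variable made it emit a .bss section. Notice that .bss has type NOBITS: it occupies no space in the file (its offset, 0x6c, is the same as the next section's), only its size is recorded.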
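As a minimal sketch of these placement rules, here are the two variables from manySymbols.c plus a hypothetical myZeroVar. Tentative definitions such as myUninitializedVar go to .bss under -fno-common, the default since GCC 10 and Clang 11; older compilers emitted them as common symbols (nm letter C).

```c
/* .data: the initial value (1) is stored in the object file. */
int myInitializedVar = 1;

/* .bss: only the size is recorded (NOBITS). The loader hands out
   zero-filled memory, which is why C can guarantee this starts at 0. */
int myUninitializedVar;

/* Also .bss: an explicit zero initializer needs no stored bytes either
   (myZeroVar is a hypothetical name, not from the original example). */
int myZeroVar = 0;
```

Compiling this file with -c and running readelf -S on the result should show a 4-byte .data and an 8-byte .bss.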

Symbols

A relocatable lists both exported and imported symbols. Both lists live in the .symtab section, which refers to names stored in the .strtab section.

// importExport.c

extern const int myConstant;
extern void foo(int x);

int myVar1;
int myVar2;
void bar() { 
	foo(myConstant); 
}

Let's look at the exported and imported symbols with nm.

$ clang -c importExport.c -o importExport.o
$ nm importExport.o
0000000000000000 T bar
                 U foo
                 U myConstant
0000000000000000 B myVar1
0000000000000004 B myVar2

As expected, we find three exported symbols: a function bar (at offset 0x0 in .text) and two uninitialized variables in the .bss section. Variable myVar1 is at offset 0x0 and myVar2 is four bytes further, at offset 0x4.

We also see two undefined (a.k.a. imported) symbols, foo and myConstant, with type U. These obviously don't have an offset. The complete list of nm letter codes and their meanings is as follows.

A  A global, absolute symbol.
B  A global "bss" (uninitialized data) symbol.
C  A "common" symbol, representing uninitialized data.
D  A global symbol naming initialized data.
N  A debugger symbol.
R  A read-only data symbol.
T  A global text symbol.
U  An undefined symbol.
V  A weak object.
W  A weak reference.
a  A local absolute symbol.
b  A local "bss" (uninitialized data) symbol.
d  A local data symbol.
r  A local read-only data symbol.
t  A local text symbol.
v  A weak object that is undefined.
w  A weak symbol that is undefined.
?  None of the above.

We can write a rainbow source file which hits as many types of symbols as possible when compiled to object.

extern int undVar;                 // Should be U  
int defVar;                        // Should be B

extern const int undConst;         // Should be U
const int defConst = 1;            // Should be R

extern int undInitVar;             // Should be U
int defInitVar = 1;                // Should be D

static int staticVar;              // Should be b
static int staticInitVar=1;        // Should be d
static const int staticConstVar=1; // Should be r

static void staticFun(int x) {}    // Should be t

extern void foo(int x);            // Should be U

void bar(int x) {                  // Should be T 
  foo(undVar);
  staticFun(undConst);
}

Since we are using an OS with two great compilers available, we can compile with both gcc and clang and compare. Notice, for example, that clang drops the unused static variables entirely while gcc keeps them.

$ clang -c  rainbow.c -o rainbow.o && nm rainbow.o
0000000000000000 T bar
0000000000000000 R defConst
0000000000000000 D defInitVar
0000000000000000 B defVar
                 U foo
000000000000003c t staticFun
                 U undConst
                 U undVar

$ gcc -c  rainbow.c -o rainbow.o && nm rainbow.o
0000000000000014 T bar
0000000000000000 R defConst
0000000000000000 D defInitVar
0000000000000000 B defVar
                 U foo
0000000000000004 r staticConstVar
0000000000000000 t staticFun
0000000000000004 d staticInitVar
0000000000000004 b staticVar
                 U undConst
                 U undVar

Global symbol / Local symbol

nm's output differentiates between local and global symbols (as a rule, lowercase letters for local symbols, uppercase for global ones). A local symbol is only visible within its relocatable. In C, this is achieved with the static storage-class specifier.

Global symbols are visible to all relocatables. We will revisit this in the linker article.
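A minimal sketch of the difference (counter, increment, and nextId are hypothetical names): only nextId is exported. Compiled at -O0, nm should list counter as b, increment as t, and nextId as T. With optimizations enabled, increment may be inlined and disappear from the symbol table altogether.

```c
/* Local symbol ('b'): static gives counter internal linkage, so no other
   relocatable can refer to it, and another file may define its own
   counter without any linker conflict. */
static int counter;

/* Local text symbol ('t'): callable only from this translation unit. */
static int increment(int value) {
  return value + 1;
}

/* Global text symbol ('T'): the only name this file exports. */
int nextId(void) {
  counter = increment(counter);
  return counter;
}
```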

Weak and strong symbols

nm's output also differentiates between "strong" symbols (the default) and weak symbols.

A weak symbol can be overridden by a strong symbol.

 // weak.c

#include <stdio.h>

extern int getNumber();

int main() {
  printf("%d\n", getNumber());
}

// number1.c





int getNumber() {
  return 1;
}

// number2.c





int getNumber() {
  return 2;
}

By default, all symbols are strong. In this example, the linker fails because it does not know which getNumber to pick when it is used in weak.c.

$ clang -o weak weak.c number1.c number2.c
/usr/bin/ld: number2.o: in function `getNumber':
number2.c:(.text+0x0): multiple definition of `getNumber'; number1.o:number1.c:(.text+0x0): first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)

If we declare one of the duplicate functions as weak, the program compiles and runs normally, regardless of the compilation and linking order.

 // weak.c

#include <stdio.h>

extern int getNumber();

int main() {
  printf("%d\n", getNumber());
}

// number1.c





__attribute__((weak)) int getNumber() {
  return 1;
}

// number2.c





int getNumber() {
  return 2;
}

$ clang -o weak weak.c number1.c number2.c
$./weak
2
$ clang -o weak weak.c number2.c number1.c
$./weak
2
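Both flavors of weak symbols from the nm table can be sketched in a single file. getNumber mirrors the example above; optionalHook and callHookIfPresent are hypothetical names. A weak definition provides an overridable default, while a weak reference lets a program test at run time whether a function was linked in at all. This assumes an ELF toolchain (gcc or clang on Linux), where an unresolved weak reference gets address 0 instead of causing an "undefined reference" error.

```c
/* Weak definition ('W' in nm): a default implementation. If another
   object file in the link provides a strong getNumber, the linker
   picks that one instead. */
__attribute__((weak)) int getNumber(void) {
  return 1;
}

/* Weak reference ('w' in nm): optionalHook may not exist anywhere in the
   program. If nothing defines it, the linker resolves its address to 0. */
extern void optionalHook(void) __attribute__((weak));

/* Calls optionalHook only if some object file defined it.
   Returns 1 if the hook ran, 0 otherwise. */
int callHookIfPresent(void) {
  if (optionalHook) {
    optionalHook();
    return 1;
  }
  return 0;
}
```

Linking this file together with number2.c from above should make getNumber return 2 instead of 1.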

Weak symbols and libc

Many libc implementations declare their functions "weak" so users can intercept them. This is not always as convenient as it seems. Let's look at how to intercept malloc.

// mymalloc.c

#define _GNU_SOURCE // Could have been defined with -D on command-line

#include <stddef.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>


void* malloc(size_t sz) {
  void *(*libc_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
  printf("malloced %zu bytes\n", sz);
  return libc_malloc(sz);
}

int main() {
  char* x = malloc(100);
  return 0;
}

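This program recurses until it overflows the stack and segfaults. This is because the functions our malloc calls can call malloc themselves: dlsym may allocate memory, and printf typically allocates stdout's buffer with malloc on first use.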

$ clang mymalloc.c
$ ./a.out
Segmentation fault (core dumped)

For such cases, GNU's libc used to provide special hooks such as __malloc_hook, but they were deprecated and then removed from the public API in glibc 2.34. Nowadays, the best way is to man-in-the-middle (MITM) the call via the loader and LD_PRELOAD.

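// mtrace.c

#include <stdio.h>
#include <dlfcn.h>   // dlsym, RTLD_NEXT (needs _GNU_SOURCE, passed with -D below)

static void* (*real_malloc)(size_t) = NULL;

void *malloc(size_t size) {
    if (!real_malloc) {
      // Find the next "malloc" in the lookup order: libc's.
      real_malloc = dlsym(RTLD_NEXT, "malloc");
    }

    // stderr is unbuffered, so writing to it should not need a malloc'ed buffer.
    fprintf(stderr, "malloc(%zu) = ", size);
    void *p = real_malloc(size);
    fprintf(stderr, "%p\n", p);
    return p;
}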

$ clang -shared -fPIC -D_GNU_SOURCE -o mtrace.so mtrace.c
$ LD_PRELOAD=./mtrace.so ls
malloc(472) = 0xaaab24e4b2a0
malloc(120) = 0xaaab24e4b480
malloc(1024) = 0xaaab24e4b500
malloc(5) = 0xaaab24e4b910
...
$

Weak symbols are also paramount for C++ and especially the STL (see below).

How C++ templates leverage weak symbols

Templates are another major use of weak symbols. When a template (such as an STL container) is instantiated, each relocatable using it receives its own copy of the instantiated code and symbols. As a result, two translation units using vector<int> end up with the same symbols.

// c++foo.cc

#include <vector>

void foo() {
  auto v = std::vector<int>();
}

// c++bar.cc

#include <vector>

void bar() {
  auto v = std::vector<int>();
}

nm confirms the duplicates in both object files.

$ clang -c -o c++foo.o c++foo.cc
$ nm c++foo.o | grep -E 'vector|bar|foo'
0000000000000000 T foo()
0000000000000000 W std::vector<int, std::allocator<int> >::vector()
0000000000000000 W std::vector<int, std::allocator<int> >::~vector()

$ clang -c -o c++bar.o c++bar.cc
$ nm c++bar.o | grep -E 'vector|bar|foo'
0000000000000000 T bar()
0000000000000000 W std::vector<int, std::allocator<int> >::vector()
0000000000000000 W std::vector<int, std::allocator<int> >::~vector()

When the linker sees several definitions of a symbol, it favors the "strong" one. However, if only "weak" ones are available, it picks one of them (in practice, the first it encounters) without throwing an error. This behavior can be exposed with an example using a template and #define.

// weak_main.cc

const char* foo();
const char* bar();
#include <stdio.h>

int main() {
  printf("%s\n", foo());
  printf("%s\n", bar());
}

// c++foo.cc


#define NAME "foo"
#include "template.h"

const char* foo() {
  Name<const char*> name;
  return name.get();
}

// c++bar.cc


#define NAME "bar"
#include "template.h"

const char* bar() {
  Name<const char*> name;
  return name.get();
}

 // template.h


template<typename T> struct Name {
  T get() const {
    return T{NAME};
  }
};

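At first sight, the program above should print "foo" and then "bar", but it doesn't. Implicitly instantiated template code is emitted as a weak symbol in every object file that uses it, so each object carries its own copy of Name<const char*>::get(). The One Definition Rule (ODR) promises that all these copies are identical, so the linker simply keeps a single one, depending on the order in which it sees them. Here the two definitions differ, which violates the ODR: this is undefined behavior, and the linker does not diagnose it.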

$ clang++ -o main weak_main.cc c++foo.cc c++bar.cc
$ ./main 
foo
foo
$ clang++ -o main weak_main.cc c++bar.cc c++foo.cc
$ ./main 
bar
bar

The original illustration of this process was found here.

Relocation

The symbol list shows the names of imports and exports. That is enough for the linker to understand what an object provides and needs, but not enough to relocate the relocatables: the linker also needs to know every location in the code that refers to a symbol, so it can patch them once final addresses are known. These locations are stored in relocation tables, which readelf can show us.

$ readelf -r mult.o

Relocation section '.rela.text' at offset 0x1d8 contains 5 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000010  000800000137 R_AARCH64_ADR_GOT 0000000000000000 myConstant + 0
000000000014  000800000138 R_AARCH64_LD64_GO 0000000000000000 myConstant + 0
00000000001c  000900000113 R_AARCH64_ADR_PRE 0000000000000000 myVariable + 0
000000000020  00090000011d R_AARCH64_LDST32_ 0000000000000000 myVariable + 0
000000000024  000a0000011b R_AARCH64_CALL26  0000000000000000 add + 0

Relocation section '.rela.eh_frame' at offset 0x250 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000001c  000200000105 R_AARCH64_PREL32  0000000000000000 .text + 0

Every single use of an imported variable or function is present in the relocation table. Each entry provides everything the linker needs: the section to patch (given by the table, e.g. .rela.text patches .text), the offset within it, the type of relocation, the symbol name, and an addend.
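To make the linker's job more concrete, here is a toy sketch of applying relocations. It is a simplified, hypothetical model (struct toy_rela and apply_relocations are illustrative names), loosely based on ELF's Elf64_Rela entries, that supports a single relocation kind: store the 64-bit absolute address S + A at the given offset, as R_AARCH64_ABS64 does. Real types such as R_AARCH64_CALL26 above instead encode a PC-relative value into bit fields of an instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation entry, loosely modeled on ELF's
   Elf64_Rela (r_offset, symbol index from r_info, r_addend). */
struct toy_rela {
  size_t offset;   /* where in the section to patch */
  size_t symbol;   /* index into the table of resolved symbol addresses */
  int64_t addend;  /* constant added to the symbol's address */
};

/* Apply "absolute 64-bit" relocations (value = S + A) to section in place.
   symbol_addr[i] is the final address the linker chose for symbol i.
   Returns 0 on success, -1 if an entry is out of bounds. */
int apply_relocations(uint8_t *section, size_t section_size,
                      const struct toy_rela *relas, size_t rela_count,
                      const uint64_t *symbol_addr, size_t symbol_count) {
  for (size_t i = 0; i < rela_count; i++) {
    const struct toy_rela *r = &relas[i];
    if (r->symbol >= symbol_count || r->offset > section_size ||
        section_size - r->offset < sizeof(uint64_t))
      return -1;
    uint64_t value = symbol_addr[r->symbol] + (uint64_t)r->addend; /* S + A */
    memcpy(section + r->offset, &value, sizeof value); /* host byte order */
  }
  return 0;
}
```

The key point is the division of labor: the compiler leaves placeholder bytes and records where they are; the linker, once it knows every symbol's address, fills them in.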

Mangling

So far, our examples used the C language, where each function or variable results in a symbol with the same name. Things get more complicated when a language allows function overloading.

To illustrate mangling, instead of letting the driver detect the language from the file extension, we can specify it ourselves and see what happens to the symbol table.

// sample.c

void foo() {}

Let's first compile sample.c as a C file (with -x c) and then as a C++ file (with -x c++).

$ clang -c -x c sample.c -o sample.o
$ nm sample.o
0000000000000000 T foo
$ clang -c -x c++ sample.c -o sample.o
$ nm sample.o
0000000000000000 T _Z3foov
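The C++ symbol follows the Itanium C++ ABI mangling scheme used by gcc and clang on Linux: _Z starts a mangled name, 3foo is the function name prefixed by its length, and the remaining letters encode the parameter types (v stands for an empty parameter list). The toy sketch below decodes only that simplest form with builtin parameter types; toy_demangle is a hypothetical helper, and real tools (c++filt, or abi::__cxa_demangle in C++) handle the full grammar.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Itanium C++ ABI codes for a few builtin types (small subset). */
static const char *builtin_type(char code) {
  switch (code) {
    case 'v': return "void";
    case 'b': return "bool";
    case 'c': return "char";
    case 'i': return "int";
    case 'j': return "unsigned int";
    case 'l': return "long";
    case 'm': return "unsigned long";
    case 'f': return "float";
    case 'd': return "double";
    default:  return NULL;
  }
}

/* Decode names of the form _Z<length><name><parameter codes>, e.g.
   "_Z3foov" -> "foo()" and "_Z3addii" -> "add(int, int)".
   Returns 0 on success, -1 for anything outside this tiny subset
   (namespaces, classes, templates, pointers, ...). */
int toy_demangle(const char *mangled, char *out, size_t out_size) {
  if (strncmp(mangled, "_Z", 2) != 0 || !isdigit((unsigned char)mangled[2]))
    return -1;

  char *rest;
  long len = strtol(mangled + 2, &rest, 10);   /* length of the name */
  if (len <= 0 || (long)strlen(rest) <= len)   /* need name + parameters */
    return -1;

  const char *params = rest + len;
  int n = snprintf(out, out_size, "%.*s(", (int)len, rest);
  if (n < 0 || (size_t)n >= out_size)
    return -1;
  size_t used = (size_t)n;

  if (strcmp(params, "v") != 0) {              /* "v" alone: no parameters */
    for (const char *q = params; *q != '\0'; q++) {
      const char *type = builtin_type(*q);
      if (type == NULL || *q == 'v')
        return -1;
      n = snprintf(out + used, out_size - used, "%s%s",
                   q == params ? "" : ", ", type);
      if (n < 0 || (size_t)n >= out_size - used)
        return -1;
      used += (size_t)n;
    }
  }

  if (out_size - used < 2)                     /* room for ")" and NUL */
    return -1;
  out[used] = ')';
  out[used + 1] = '\0';
  return 0;
}
```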

Thanks to mangling, C++ allows functions to share the same name: overloads get different symbol names derived from their parameter types. This special encoding avoids symbol collisions, but name mangling can also lead to linking issues.

// bar.h

void bar();

// main.cpp

#include "bar.h"

int main() {
  bar();
  return 0;
}

// bar.c

void bar() {}

$ clang main.cpp bar.c -o main 
/usr/bin/ld: /tmp/m-7f361c.o: in function `main':
main.cc:(.text+0x18): undefined reference to `bar()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The project won't link because the symbols for the function bar do not match: main.cpp was compiled as C++ and imports the mangled _Z3barv, while bar.c was compiled as C and exports a plain bar.

$  nm main.o
0000000000000000 T main
                 U _Z3barv
$ nm bar.o
0000000000000000 T bar

There is a simple (if tongue-in-cheek) solution: just give your C functions and variables the mangled names C++ expects.

// bar.h

void bar();

// main.cpp

#include "bar.h"

int main() {
  bar();
  return 0;
}

// bar.c

void _Z3barv() {}

It works, problem solved!

$ clang main.cpp bar.c -o main
$

A more serious and realistic solution is to use the extern "C" linkage specification, which tells the C++ compiler to import these symbols without mangling their names.

// bar.h

extern "C" {
  void bar();
}

// main.cpp

#include "bar.h"

int main() {
  bar();
  return 0;
}

// bar.c

void bar() {}

Compilation works: the export and import symbol tables no longer mismatch.

$ clang main.cpp bar.c -o main 
$ nm main.o
0000000000000000 T main
                 U bar
$ nm bar.o
0000000000000000 T bar
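In practice, a header shared between C and C++ usually wraps its declarations in extern "C" only when compiled as C++, using the standard __cplusplus macro, so the same bar.h also works when included from a C file. Here is a sketch of that idiom, with header and implementation merged into one listing for brevity; making bar return an int is an assumption added so the result can be checked.

```c
/* ---- bar.h: usable from both C and C++ translation units ---- */
#ifndef BAR_H
#define BAR_H

#ifdef __cplusplus
extern "C" {   /* C++ only: do not mangle the names declared below */
#endif

int bar(void);

#ifdef __cplusplus
}
#endif

#endif /* BAR_H */

/* ---- bar.c: compiled as C, exports the plain symbol "bar" ---- */
int bar(void) {
  return 42;
}
```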

Section management

We saw earlier how functions and variables end up in a few large sections such as .text, .data, and .bss, but the compiler can operate at a finer granularity.

Instead of generating huge sections, the compiler can generate one section per symbol. This later allows the linker (when asked to, with --gc-sections) to keep only the sections that are actually used and reduce the size of the executable.

// sections.c

int a = 0;
int b = 0;
int funcA() { return a;}
int funcB() { return b;}

$ clang -c -o sections.o sections.c

$ readelf -S -W  sections.o
There are 11 section headers:

Section Headers:
  [Nr] Name              Type            
  [ 0]                   NULL            
  [ 1] .strtab           STRTAB          
  [ 2] .text             PROGBITS        
  [ 3] .rela.text        RELA            
  [ 4] .bss              NOBITS          
  [ 5] .comment          PROGBITS        
  [ 6] .note.GNU-stack   PROGBITS        
  [ 7] .eh_frame         PROGBITS        
  [ 8] .rela.eh_frame    RELA            
  [ 9] .llvm_addrsig     LOOS+0xfff4c03  
  [10] .symtab           SYMTAB

$ clang -c -o sections.o sections.c \
  -ffunction-sections -fdata-sections
$ readelf -S -W  sections.o
There are 15 section headers:

Section Headers:
  [Nr] Name              Type            
  [ 0]                   NULL            
  [ 1] .strtab           STRTAB          
  [ 2] .text             PROGBITS        
  [ 3] .text.funcA       PROGBITS        
  [ 4] .rela.text.funcA  RELA            
  [ 5] .text.funcB       PROGBITS        
  [ 6] .rela.text.funcB  RELA            
  [ 7] .bss.a            NOBITS          
  [ 8] .bss.b            NOBITS          
  [ 9] .comment          PROGBITS        
  [10] .note.GNU-stack   PROGBITS        
  [11] .eh_frame         PROGBITS        
  [12] .rela.eh_frame    RELA            
  [13] .llvm_addrsig     LOOS+0xfff4c03  
  [14] .symtab           SYMTAB
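To see why per-symbol sections matter, here is a toy sketch of what the linker does with --gc-sections: starting from a root section, it follows relocations to mark every reachable section and discards the rest. Real linkers use several roots (the entry point, exported symbols, and so on); the section graph in the usage below is a hypothetical model of sections.c plus a main that only calls funcA. With a single .text and .bss, funcB and b could not have been dropped.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 16

/* Hypothetical model of an input section: the indices of the sections its
   relocations point to (e.g. .text.funcA -> .bss.a). */
struct section {
  const char *name;
  size_t refs[MAX_REFS];
  size_t ref_count;
};

/* Mark every section reachable from root by following relocations.
   Already-marked sections are skipped, so cycles terminate. */
static void mark(const struct section *secs, size_t root, bool *live) {
  if (live[root])
    return;
  live[root] = true;
  for (size_t i = 0; i < secs[root].ref_count; i++)
    mark(secs, secs[root].refs[i], live);
}

/* Simplified --gc-sections: keep what is reachable from the entry
   section, discard the rest. Returns the number of live sections. */
size_t gc_sections(const struct section *secs, size_t count, size_t entry,
                   bool *live) {
  for (size_t i = 0; i < count; i++)
    live[i] = false;
  mark(secs, entry, live);
  size_t kept = 0;
  for (size_t i = 0; i < count; i++)
    kept += live[i];
  return kept;
}
```

With sections .text.main → .text.funcA → .bss.a, plus an unreferenced .text.funcB → .bss.b, three of the five sections survive.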

Optimization level

By far the most important flag to pass the compiler is the level of optimization to apply to the IR before generating instructions. By default (-O0), no optimizations are performed. It shows, even with a program doing almost nothing.

// do_nothing.c

void do_nothing() {
}

int main() {
  for (int i = 0; i < 1000000000; i++)
    do_nothing();
  return 0;  
}

Let's build and measure how long it takes to do nothing.

$ clang do_nothing.c 
$ time ./a.out 

real	0m2.374s
user	0m2.104s
sys  	0m0.015s

This program should have completed almost instantly but, because of the function call overhead, it took more than two seconds. Let's try again, this time allowing optimizations.

$ clang do_nothing.c -O3
$ time ./a.out 

real	0m0.224s
user	0m0.011s
sys	0m0.014s

While some optimizations focus on speed, others focus on code size (e.g., -Os and -Oz). They are listed here.

If you have a few hours to spare, treat yourself to Binary Ninja and take a look at the marvels optimizers come up with.

The translation unit barrier

Let's keep iterating on the previous program that does nothing. Optimization level -O3 is awesome, but it has a limitation: it only operates at the translation unit level. Let's see what happens when the do_nothing function lives in a different source file.

// opt_main.c

extern void do_nothing();

int main() {
  for(int i= 0 ; i <1000000000 ;i++)
    do_nothing();
}

// do_nothing_tu.c

void do_nothing() {
}

$ clang -O3 opt_main.c do_nothing_tu.c 
$ time ./a.out

real	0m2.056s
user	0m1.824s
sys	0m0.018s

Even with optimizations enabled, we are back to the poor performance of an unoptimized executable. Because each translation unit is compiled in isolation, the compiler could not see, while compiling opt_main.c, that do_nothing is empty, so it had to emit a call on every iteration.

Breaking the barrier the old-school way

The solution to this problem would be to optimize not at the translation unit level but at the program level. Since only the linker sees all the components (and it only sees sections and symbols), this seems impossible.

The trick to make it work is called "artisanal LTO" (also known as a unity build). It consists of creating a super translation unit containing all the source code of the program, which we can do with the preprocessor.

// all.c

#include "do_nothing.c"
#include "opt_main.c"

// opt_main.c

extern void do_nothing();

int main() {
  for(int i= 0 ; i <1000000000 ;i++)
    do_nothing();
}

// do_nothing.c

void do_nothing() {
}

Now that it can see that do_nothing is a no-op, the compiler optimizes the calls away.

$ clang all.c -O3
$ time ./a.out
real  0m0.163s
user  0m0.012s
sys 0m0.014s

Of course, the bigger and more complex the program, the less practical this trick becomes, which led to LTO.

LTO

Thankfully, the "artisanal LTO" trick is no longer needed. Compilers can output extra information in the relocatables for the linker to use. Both GNU's GCC and LLVM implement Link-Time Optimization via the -flto flag, but they do it differently.

GCC's LTO

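GCC can implement LTO in a way that lets the linker fail gracefully if it does not support it: the program still gets linked, just without link-time optimizations. To this effect, GCC can generate fat objects, which contain not only everything a regular .o should have but also GCC's intermediate representation (GIMPLE bytecode). Note that fat objects must be requested with -ffat-lto-objects; on toolchains with linker plugin support, GCC defaults to slim objects that contain only the IR. Either way, the IR is stored in ELF sections, so file still reports a regular relocatable.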

// main.c

extern void do_nothing();

int main() {
  for(int i= 0 ; i <1000000000 ;i++)
    do_nothing();
}

// do_nothing.c

void do_nothing() {
}

$ gcc -c main.c -o main.o
$ file main.o
main.o: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), not stripped
$ gcc  -c main.c -o main.o -flto
$ file main.o
main.o: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), not stripped

$ gcc main.c do_nothing.c -flto
$ time ./a.out

real	0m2.112s
user	0m2.107s
sys	0m0.004s
$ gcc -O3 -flto main.c do_nothing.c
$ time ./a.out

real	0m0.002s
user	0m0.000s
sys	0m0.002s

LLVM's LTO

LLVM's way of implementing LTO is a bit more aggressive. Instead of producing a fat object file, clang writes LLVM bitcode (its IR) and disguises it as an object by using the .o extension. This is inconvenient: if the linker does not know how to handle bitcode (lld does natively, while GNU ld and gold need the LLVM plugin), linking fails.

// main.c

extern void do_nothing();

int main() {
  for(int i= 0 ; i <1000000000 ;i++)
    do_nothing();
}

// do_nothing.c

void do_nothing() {
}

$ clang -c main.c -o main.o
$ file main.o
main.o: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), not stripped
$ clang -c main.c -o main.o -flto
$ file main.o
main.o: LLVM IR bitcode

$ clang main.c do_nothing.c -flto
$ time ./a.out

real	0m2.112s
user	0m2.107s
sys	0m0.004s
$ clang -O3 -flto main.c do_nothing.c
$ time ./a.out

real	0m0.002s
user	0m0.000s
sys	0m0.002s

Dialects

C++

C++ keeps evolving. There was C++98, then C++03, C++11, C++14, C++17, C++20, and now C++23. Compilers' default dialect keeps evolving too. Check your compiler's documentation and use the -std flag (e.g., -std=c++11) to make sure you are using the proper one.

-std=c++11    // ISO C++11
-std=gnu++11  // C++11 plus GNU extensions (gcc and clang accept both forms)

C

Likewise, C has been revised over the years. It was first standardized by ANSI in 1989 (C89) and by ISO in 1990 (C90), then revised as C95, C99, C11, C17, and most recently C23. The -std flag indicates which version to use.

-std=c99
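To check which dialect a file is actually compiled with, one can inspect the standard predefined macro __STDC_VERSION__. A minimal sketch (c_dialect is a hypothetical helper name); note that compilers implementing only a C2x draft report values below 202311L and will be classified as C17 here.

```c
/* Returns the C dialect this translation unit was compiled with,
   based on the standard predefined macro __STDC_VERSION__. */
const char *c_dialect(void) {
#if !defined(__STDC_VERSION__)
  return "C89/C90";        /* the macro did not exist before C95 */
#elif __STDC_VERSION__ >= 202311L
  return "C23";
#elif __STDC_VERSION__ >= 201710L
  return "C17";
#elif __STDC_VERSION__ >= 201112L
  return "C11";
#elif __STDC_VERSION__ >= 199901L
  return "C99";
#else
  return "C95";            /* 199409L: C90 + Amendment 1 */
#endif
}
```

Compiling the same file with -std=c99 and then -std=c11 should change the returned string accordingly.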

Standard libraries

C Standard Library

The implementations of the C Standard Library, commonly called libc, are fairly consistent. Switching between BSD's libc, GNU's glibc, or Android's bionic should not yield surprises.

STL

The STL is a different story. There are several implementations around, and their differences are a frequent source of discrepancies.

At the very least, use the STL that ships with your toolchain (GNU's libstdc++, LLVM's libc++, or Microsoft's STL). If you are working on portable code meant to run on multiple OSes, try to use the same STL everywhere, which means using LLVM's libc++.

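Operators new, new[], delete, and delete[] are not entirely "built into" the C++ language. The new and delete expressions are, but the allocation functions they call (operator new and friends) are regular, replaceable functions declared in the header <new> and implemented in the C++ runtime library. If you peek at LLVM's implementation (libc++abi), you will see that operator new is implemented with malloc, and marked weak so that programs can provide their own.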

_LIBCXXABI_WEAK void * operator new(std::size_t size) _THROW_BAD_ALLOC {

  if (size == 0)
      size = 1;

  void* p;
  while ((p = ::malloc(size)) == 0) {
      std::new_handler nh = std::get_new_handler();
      if (nh)
          nh();
      else
          break;
  }
  return p;
}

Debugging

For debuggers to work, you need to compile with debugging information. This is done with -g. Upon seeing this flag, the compiler generates additional sections containing DWARF debugging information (for ELF binaries, get it?).

// hello.cc

#include <iostream>

int main() {
    std::cout << "Hello World!";
    return 0;
}

$ clang++ hello.cc
$ ll a.out
-rwxrwxr-x 1 leaf leaf  9576 Mar 22 19:14 a.out*
$ clang++ -g hello.cc	
$ ll a.out
-rwxrwxr-x 1 leaf leaf 21912 Mar 22 19:15 a.out*

The size difference is significant, but it is much more telling to compare with a large, real-world application such as git.

$ sudo apt-get install dh-autoreconf libcurl4-gnutls-dev libexpat1-dev gettext libz-dev libssl-dev
$ sudo apt-get install install-info  
$ git clone git@github.com:git/git.git
$ cd git
$ make configure
$ ./configure --prefix=/usr
$ make all
$ ls -l --block-size=M git
-rwxrwxr-x 137 leaf leaf 18M Apr  2 18:04 git

git's Makefile builds with debugging information (-g) by default. Let's remove the flag -g and build again to see the difference.

$ git reset --hard
$ git clean -f -x
$ sed -i 's/-g //g' Makefile
$ make clean
$ make all
$ ls -l --block-size=M git
-rwxrwxr-x 137 leaf leaf 4M Apr  2 18:04 git

A binary without debug information is about 4.5x smaller (14 MiB less)!

If you are unsure what make is doing, git's Makefile has a verbose mode, enabled with V=1, which prints every command it executes.

Strip

Compiling with debugging information naturally increases the size of the output. The opposite operation is stripping: removing sections that are not needed, which reduces the size of an ELF file further.

$ clang -c -g -o hello.o hello.c
$ ll hello.o
-rw-rw-r-- 1 leaf leaf 3256 Apr  3 01:21 hello.o
$ readelf -S -W hello.o
There are 22 section headers, starting at offset 0x738:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000611 000121 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000034 00  AX  0   0  4
  [ 3] .rela.text        RELA            0000000000000000 000478 000048 18   I 21   2  8
  [ 4] .rodata.str1.1    PROGBITS        0000000000000000 000074 00000e 01 AMS  0   0  1
  [ 5] .debug_abbrev     PROGBITS        0000000000000000 000082 000038 00      0   0  1
  [ 6] .debug_info       PROGBITS        0000000000000000 0000ba 000037 00      0   0  1
  [ 7] .rela.debug_info  RELA            0000000000000000 0004c0 000060 18   I 21   6  8
  [ 8] .debug_str_offsets PROGBITS       0000000000000000 0000f1 00001c 00      0   0  1
  [ 9] .rela.debug_str_offsets RELA      0000000000000000 000520 000078 18   I 21   8  8
  [10] .debug_str        PROGBITS        0000000000000000 00010d 000050 01  MS  0   0  1
  [11] .debug_addr       PROGBITS        0000000000000000 00015d 000010 00      0   0  1
  [12] .rela.debug_addr  RELA            0000000000000000 000598 000018 18   I 21  11  8
  [13] .comment          PROGBITS        0000000000000000 00016d 000026 01  MS  0   0  1
  [14] .note.GNU-stack   PROGBITS        0000000000000000 000193 000000 00      0   0  1
  [15] .eh_frame         PROGBITS        0000000000000000 000198 000030 00   A  0   0  8
  [16] .rela.eh_frame    RELA            0000000000000000 0005b0 000018 18   I 21  15  8
  [17] .debug_line       PROGBITS        0000000000000000 0001c8 00005f 00      0   0  1
  [18] .rela.debug_line  RELA            0000000000000000 0005c8 000048 18   I 21  17  8
  [19] .debug_line_str   PROGBITS        0000000000000000 000227 000022 01  MS  0   0  1
  [20] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 000610 000001 00   E 21   0  1
  [21] .symtab           SYMTAB          0000000000000000 000250 000228 18      1  21  8

Now let's strip the object.

$ strip hello.o
$ ll hello.o
-rw-rw-r-- 1 leaf leaf 816 Apr  3 01:22 hello.o
$ readelf -S -W hello.o
There are 8 section headers, starting at offset 0x130:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000034 00  AX  0   0  4
  [ 2] .rodata.str1.1    PROGBITS        0000000000000000 000074 00000e 01 AMS  0   0  1
  [ 3] .comment          PROGBITS        0000000000000000 000082 000026 01  MS  0   0  1
  [ 4] .note.GNU-stack   PROGBITS        0000000000000000 0000a8 000000 00      0   0  1
  [ 5] .eh_frame         PROGBITS        0000000000000000 0000a8 000030 00   A  0   0  8
  [ 6] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 0000d8 000001 00   E  0   0  1
  [ 7] .shstrtab         STRTAB          0000000000000000 0000d9 000051 00      0   0  1

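Note that in this example, we used strip on an object file, but since it works on any ELF file, it can be (and usually is) used on the linker output. Beware that a plain strip also removed the symbol table and the relocation sections (.symtab, .rela.*) from hello.o, so this object is no longer usable for linking; strip --strip-debug removes only the debugging information.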


Driving Compilers