IA-32 assembly on Mac OS X
I had to do some low-level work with Mac OS X Snow Leopard using my MacBook Pro Core 2 Duo. I learned plenty about GAS for i386 and x86_64, but I would not recommend this setup for learning assembly. I think Apple's specifics would discourage beginners and impair their ability to use the code samples found in most books. I would rather recommend an IBM T42 running Ubuntu Linux.
EDIT: I've received numerous emails, so I'm going to try to explain further. It's legitimate to buy an IA-32 book or follow an online tutorial about i386 assembly, intending to experiment on a MacBook Pro since it should be backward compatible. My concern is that if you do this, nothing will work! The code samples won't assemble, the libc calls won't link and the stack won't be aligned properly. That makes for a very steep learning curve.
Memory refresh
Just to make sure we are on the same page, here is a diagram of memory the way I like to represent it:
- Lower memory at the top.
- Higher memory at the bottom.
- Text section contains the actual code.
- Data section contains initialized global variables.
- Bss section contains uninitialized global variables (filled with 0s).
- Heap grows "down", toward higher addresses.
- Stack grows "up", toward lower addresses.
- On every function call, a stack frame is created to hold parameters and local variables.
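If you want to see these regions on your own machine, here is a minimal C sketch that records one address from each of them. The struct and function names are hypothetical (they are mine, not from any book), and the relative order of the addresses depends on the OS and on address randomization, so treat the values as illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* lives in the data section */
int uninitialized_global;      /* lives in the bss section */

/* Hypothetical record holding one sample address per memory region. */
struct region_samples {
    uintptr_t text;   /* address of a function (code) */
    uintptr_t data;   /* address of an initialized global */
    uintptr_t bss;    /* address of an uninitialized global */
    uintptr_t heap;   /* address returned by malloc (0 if it failed) */
    uintptr_t stack;  /* address of a local variable */
};

struct region_samples sample_regions(void)
{
    struct region_samples s;
    int local = 0;
    void *block = malloc(16);

    /* bss starts zero-filled; this holds as long as nothing wrote to it. */
    assert(uninitialized_global == 0);

    /* Function-pointer-to-integer casts work on common platforms,
       but the C standard does not guarantee them. */
    s.text  = (uintptr_t)&sample_regions;
    s.data  = (uintptr_t)&initialized_global;
    s.bss   = (uintptr_t)&uninitialized_global;
    s.heap  = (uintptr_t)block;
    s.stack = (uintptr_t)&local;

    free(block);
    return s;
}
```

Printing the five fields with `%p`-style formatting from a small `main` is enough to compare them with the diagram.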
Registers EAX, EBX, ECX, EDX and the floating-point, MMX and SSE registers are used in operations, but the logistics are mostly handled via ESP and EBP.
Every time a function is called, a new stack frame is created relying mostly on the EBP and ESP registers.
ESP points to the last item inserted in the stack. EBP points to the base of the current stackframe:
- Function parameters are pushed on the stack in reverse order of declaration in source code.
- The EIP return address (instruction to start from when the function returns) is pushed on the stack.
- The function prolog is:
  - Current EBP value is pushed on the stack.
  - EBP takes the value of the current ESP.
  - The function's local variables are allocated on the stack in order of declaration.
- While in the function body, arguments and local variables are referenced via the EBP register (see schema).
- The function epilog is:
  - ESP takes the value of the current EBP.
  - EBP value is popped from the stack.
    void function(int parA, int parB, int parC, int parD)
    {
        int foo;
        int bar;
        [...]
        return;
    }
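To make the frame layout concrete, here is a small sketch that computes where each parameter and local of the function above sits relative to EBP. It assumes a "regular" IA-32 frame with 4-byte slots, the saved EBP at 0(%ebp), the return address at 4(%ebp), and locals allocated in declaration order (a real compiler is free to reorder them). The helper names are hypothetical.

```c
#include <assert.h>

#define SLOT_SIZE 4  /* every int parameter or local takes 4 bytes on IA-32 */

/* Offset from EBP of the index-th parameter (0 = parA).
   0(%ebp) holds the saved EBP and 4(%ebp) the return address,
   so parameters start at 8(%ebp). */
int param_offset(int index)
{
    assert(index >= 0);
    return 2 * SLOT_SIZE + index * SLOT_SIZE;
}

/* Offset from EBP of the index-th local (0 = foo), assuming locals are
   allocated in declaration order right below the saved EBP. */
int local_offset(int index)
{
    assert(index >= 0);
    return -(index + 1) * SLOT_SIZE;
}
```

Under these assumptions parA is at 8(%ebp), parD at 20(%ebp), foo at -4(%ebp) and bar at -8(%ebp).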
The annoying 16 byte stack alignment
Because compilers try to take advantage of the SIMD unit (MMX, SSE, SSE2 and SSE3 instruction sets), Apple wants the stack to be 16-byte aligned even in an IA-32 environment, so that ESP points to a 0xXXXXXXXC memory address inside the called function.
Let's take the example of a small program such as cpuid2.s, whose goal is to write on screen the type of CPU running, using the libc function printf:
    .bss
    .data
    output:
        .asciz "The processor Vendor ID is '%s' \n"
    .lcomm buffer, 13
    .text
    .globl _main
    _main:
        movl $0, %eax          # define cpuid output option
        cpuid
        movl $buffer, %edi     # put values in string
        movl %ebx, 0(%edi)
        movl %edx, 4(%edi)
        movl %ecx, 8(%edi)
        movl $0, %ecx
        movb %cl, 12(%edi)     # terminate the string (one byte: buffer is only 13 bytes)
        # Now calling printf
        subl $0x4, %esp        # padding stack :/ !
        pushl $buffer
        pushl $output
        call _printf           # Mac OS X needs the "_" prefix on libc names
        # ESP is at 0xXXXXXXX0 now,
        # with the Mac OS X special stub it will be at 0xXXXXXXXC
        call _exit
        nop
Here are three significant stack states:
- At the beginning of the program
- Just before call _printf
- Inside libc's printf
On a "regular" system:
On Mac OS X:
Not only do you need to pad the stack, but you also need to take into account the fact that Mac OS X will perform an extra 4-byte push on the stack. That's why, with two 4-byte parameters, the stack is padded with only 4 bytes, so that after the Mac OS X special push the stack is still aligned on 0xXXXXXXXC.
Also on the list of things to adapt: remember that function parameters are 4 bytes further than where they would otherwise be because of the Mac OS X special push. For example, the first parameter on a regular system is at 8(%EBP) while it is at 12(%EBP) on Mac OS X.
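The 4-byte padding in cpuid2.s can be double-checked with a bit of arithmetic. The sketch below assumes the rule is that ESP must end in 0 right at the call instruction (which is what the comment in cpuid2.s shows) and that _main is entered with ESP ending in C. The function name and the sample address 0xbffff7ec are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGNMENT 16u

/* Bytes to subtract from ESP before pushing arg_bytes of arguments so
   that ESP is a multiple of 16 when the call instruction executes. */
uint32_t stack_padding(uint32_t esp, uint32_t arg_bytes)
{
    assert(arg_bytes % 4 == 0);  /* IA-32 pushes are 4 bytes wide */
    /* We need (esp - padding - arg_bytes) % 16 == 0. */
    return (esp - arg_bytes) % STACK_ALIGNMENT;
}
```

With ESP at 0xbffff7ec and two 4-byte arguments, `stack_padding(0xbffff7ec, 8)` evaluates to 4, which matches the `subl $0x4, %esp` in the listing.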
Compilation problems
Another issue that beginners may find discouraging is that the plain old build commands don't work. A GAS assembly program can normally be built with the as assembler and the ld linker (or with gcc directly):
    as -o cpuid2.o cpuid2.s
    ld -e _main -o cpuid2 -lc cpuid2.o
    # or
    gcc cpuid2.s -gstabs -o cpuid2
But this won't work on a MacBook Pro Core 2 Duo running Snow Leopard. cpuid2.s, although valid IA-32, won't get assembled:
    cpuid2.s:50:suffix or operands invalid for `push'
    cpuid2.s:51:suffix or operands invalid for `push'
It seems that as defaults to x86_64 assembly. You need to be specific about the target architecture:
    as -arch i386 -o cpuid2.o cpuid2.s
    ld -e _main -o cpuid2 -lc cpuid2.o
    # or
    gcc -arch i386 cpuid2.s -gstabs -o cpuid2
EDIT: Reddit user jah6 pointed out that you can get rid of the alignment problem when compiling with gcc by using the -mstackrealign parameter.
LibC method names
Mac OS X requires you to prefix all your libc method names with "_". Most code samples in the Unix world don't have this constraint.
Recommended reading
Try "Professional Assembly Language", it's a pretty good book. Except that they should really really really stop putting the face of their author on the front cover. What a turn off.
Source code
I've "ported" a few examples from GAS Linux to GAS Mac OS X: cpuid with write system calls, and cpuid with libc.