Lesson 5: MOV, Registers, and Addressing Modes
Why Learn This?
There's something deeply satisfying about peeling back the layers and seeing what's actually happening when your code runs. High-level languages give you p[i] and you don't think twice—but the machine is computing a base address plus an index times a scale factor, and there's a single instruction for that. Understanding what's going on underneath makes you a better programmer: you'll write more efficient code, you'll debug more effectively, and you'll have a much richer mental model of what your programs are actually doing.
But there's a bigger lesson here too, one that goes beyond x86. In this lecture, we didn't sit through a list of instructions and memorize them—we looked at real compiled code and figured out what it meant. That's a skill that will serve you for your entire career. You will constantly encounter unfamiliar systems, languages, file formats, protocols, and APIs. You won't always have documentation. You won't always have someone to ask. But you can play detective. You can look at examples, form hypotheses, test them against the evidence, and build up understanding piece by piece. That's exactly what we did today, and it works.
Registers and Conventions
At the start of the lecture, this table just listed the registers, but this version reflects what we figured out about their usage:
| 64-bit | 32-bit | 16-bit | 8-bit | Use |
|---|---|---|---|---|
| RAX | EAX | AX | AL | Return Value |
| RBX | EBX | BX | BL | |
| RCX | ECX | CX | CL | 4th Argument |
| RDX | EDX | DX | DL | 3rd Argument |
| RSI | ESI | SI | SIL | 2nd Argument |
| RDI | EDI | DI | DIL | 1st Argument |
| RBP | EBP | BP | BPL | Current Stack-Frame Pointer |
| RSP | ESP | SP | SPL | Stack Pointer |
| R8 | R8D | R8W | R8B | 5th Argument |
| R9 | R9D | R9W | R9B | 6th Argument |
| R10 | R10D | R10W | R10B | |
| R11 | R11D | R11W | R11B | |
| R12 | R12D | R12W | R12B | |
| R13 | R13D | R13W | R13B | |
| R14 | R14D | R14W | R14B | |
| R15 | R15D | R15W | R15B | |
| RIP | | | | Instruction Pointer |
Stages of Compilation:
Recall from CS 70:
- Preprocessing: handles `#include` and `#define` directives, and other preprocessor directives.
- Compilation: translates C++ source code into assembly code.
- Assembling: translates assembly code into machine code, producing an object file.
- Linking: combines object files and libraries into an executable.
Ways to View the Assembly Code
- `objdump -d` on the executable
- `gdb` with `set disassemble-next-line on` or the `disassemble` command
- `gcc` with the `-S` flag to produce an assembly file directly from the compilation step
Example 1: Parameter Passing
Code
// Forward declaration so we can call it from `do_six`.
void six_args(long a, long b, long c, long d, long e, long f);
void do_six() {
six_args(1,2,3,4,5,6);
}
Generated Code
00000000004027f1 <do_six>:
4027f1: 55 pushq %rbp
4027f2: 48 89 e5 movq %rsp, %rbp
4027f5: 41 b9 06 00 00 00 movl $0x6, %r9d
4027fb: 41 b8 05 00 00 00 movl $0x5, %r8d
402801: b9 04 00 00 00 movl $0x4, %ecx
402806: ba 03 00 00 00 movl $0x3, %edx
40280b: be 02 00 00 00 movl $0x2, %esi
402810: bf 01 00 00 00 movl $0x1, %edi
402815: e8 8b ff ff ff callq 0x4027a5 <six_args>
40281a: 5d popq %rbp
40281b: c3 retq
What We Figured Out
- The `mov FOO, BAR` instruction means move the value FOO into BAR. (The order of the operands is the opposite of how we write assignments in C, but it makes sense when read as English.)
- The first six arguments to a function are passed in registers, in the order RDI, RSI, RDX, RCX, R8, and R9.
- The `call` instruction pushes the return address onto the stack and jumps to the function.
- The `ret` instruction pops the return address from the stack and jumps back to it.
- Instructions have suffixes that indicate the size of their operands: `b` for byte (8 bits), `w` for word (16 bits), `l` for long (32 bits), and `q` for quad (64 bits). (If there is no suffix, the size is determined by the registers used as operands.)
- The code saves and restores the base pointer (RBP) at the beginning and end of the function, which is used to refer to the current stack frame. (Here we don't really need it, but it's a common convention and aids debugging.)
- Kinds of operands for instructions like `mov`:
  - Immediate values, like `$0x6` (the `$` indicates an immediate value)
  - Registers, like `%r9d`. The name of the register shows how many of its bits are being used: `r9d` is the lower 32 bits of `r9`, `r9w` is the lower 16 bits, and `r9b` is the lower 8 bits.
- An odd trick: All our values are `long`s, but some of the instructions are 32-bit `movl` instructions. This is because the x86-64 architecture has a special rule: when you write to the lower 32 bits of a register, the upper 32 bits are automatically set to zero. So we can use `movl` to write a 64-bit value by writing to the lower 32 bits and letting the upper bits become zero. This means the code can use 32-bit constants and instructions, which have shorter encodings.
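The zero-extension rule in the last bullet can be modelled in plain C. This is a minimal sketch, not a description of how the hardware is built: the helper names `write32` and `write16` are hypothetical, and a `uint64_t` stands in for a 64-bit register. It also shows the contrast with 16-bit writes, which on x86-64 leave the upper bits of the register alone.

```c
#include <stdint.h>

/* Model of writing a 32-bit value to a register such as %edi:
   the upper 32 bits of the full 64-bit register become zero. */
uint64_t write32(uint64_t old_reg, uint32_t value) {
    (void)old_reg;              /* the old contents are discarded entirely */
    return (uint64_t)value;     /* zero-extended to 64 bits */
}

/* Model of writing a 16-bit value to a register such as %di:
   only the low 16 bits change; the upper 48 bits are preserved. */
uint64_t write16(uint64_t old_reg, uint16_t value) {
    return (old_reg & ~(uint64_t)0xFFFF) | value;
}
```

Starting from a register full of one bits, `write32(0xFFFFFFFFFFFFFFFF, 6)` gives exactly 6, which is why `movl $0x6, %r9d` is enough to pass the `long` value 6.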
Example 2: Return Values
Code
long return42() {
return 42;
}
Generated Code
0000000000402857 <return42>:
402857: b8 2a 00 00 00 movl $0x2a, %eax
40285c: c3 retq
What We Figured Out
- The return value of a function is placed in RAX (or EAX for 32-bit values).
- `0x2a` is 42 in hexadecimal (2 × 16 + 10 = 42).
- We see the same `movl`-to-a-32-bit-register trick here: since 42 fits in 32 bits, we can use `movl` to EAX instead of `movq` to RAX, and the upper 32 bits are zeroed out automatically.
- Notice there's no `push %rbp`/`pop %rbp` here: this function is so simple the compiler decided it didn't need a stack frame at all.
Example 3: Identity — Registers to Registers
Code
long identity(long v) {
return v;
}
Generated Code
0000000000402895 <identity>:
402895: 48 89 f8 movq %rdi, %rax
402898: c3 retq
What We Figured Out
- The first argument arrives in RDI, and we need to return it in RAX, so we just move it from one register to the other.
- `movq` is used here (rather than `movl`) because we're dealing with a full 64-bit `long` value, and we can't assume the upper 32 bits are zero.
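To make the second point concrete, here is a sketch of what `identity` would return if the compiler had (wrongly) used a 32-bit move. The function name `identity_if_movl` is invented for this illustration, and `int64_t` stands in for the 64-bit `long`.

```c
#include <stdint.h>

/* Models movl %edi, %eax: only the low 32 bits of v survive,
   and the upper 32 bits of the result are zero. */
int64_t identity_if_movl(int64_t v) {
    return (int64_t)(uint64_t)(uint32_t)v;
}
```

Small non-negative values such as 42 pass through unchanged, but any value that uses the upper 32 bits, including every negative number, comes back different.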
Example 4: Pointers — Reading from Memory
Code
long fetch_ptr(long *p) {
return *p;
}
Generated Code
0000000000402871 <fetch_ptr>:
402871: 48 8b 07 movq (%rdi), %rax
402874: c3 retq
What We Figured Out
- Parentheses around a register mean memory access (dereferencing a pointer). `(%rdi)` means "go to the address stored in RDI and read the value there."
- This is a new kind of operand: a memory reference. So now we have three kinds: immediate values (`$0x2a`), registers (`%rdi`), and memory references (`(%rdi)`).
  - It's a bit odd that parentheses mean “access this memory location”, when normally parentheses are used for grouping and can be ignored. We rolled our eyes a bit at this design choice!
- The pointer `p` arrives in RDI (first argument). We dereference it with `(%rdi)` and put the result in RAX (return value).
Example 5: Pointers — Writing to Memory
Code
void store_ptr(long* p, long v) {
*p = v;
}
Generated Code
000000000040286d <store_ptr>:
40286d: 48 89 37 movq %rsi, (%rdi)
402870: c3 retq
What We Figured Out
- This is `fetch_ptr` in reverse: we're writing to memory instead of reading from it.
- `p` is in RDI (1st argument), `v` is in RSI (2nd argument). We move `v` into the memory location that `p` points to.
- The parentheses can appear on either side of `mov`: as the source (reading from memory) or the destination (writing to memory).
Example 6: Pointer Arithmetic — Displacement Addressing
Code
long fetch_next_ptr(long *p) {
return *(p+1);
}
Generated Code
0000000000402875 <fetch_next_ptr>:
402875: 48 8b 47 08 movq 0x8(%rdi), %rax
402879: c3 retq
What We Figured Out
- `0x8(%rdi)` means "take the address in RDI, add 8 to it, and read from that memory location." This is called displacement addressing.
- Why 8? Because a `long` is 8 bytes, so `*(p+1)` in C is 8 bytes past `p` in memory. C pointer arithmetic works in units of the pointed-to type; at the machine level, we have to do the scaling ourselves.
- General form: `DISPLACEMENT(%register)` adds the displacement to the register's value to compute the effective address.
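To see the scaling that C hides, here is a small sketch that performs `*(p+1)` the way the machine does, by adding a byte offset to the address. The function name `fetch_next_by_bytes` is made up for this illustration; it uses `sizeof(long)` rather than a literal 8 so that it also works on platforms where `long` has a different size.

```c
#include <string.h>

/* Reads the long that follows *p by doing the byte arithmetic explicitly,
   which is the job the displacement does in 0x8(%rdi). */
long fetch_next_by_bytes(const long *p) {
    const unsigned char *byte_addr = (const unsigned char *)p;  /* address in bytes */
    long value;
    /* byte_addr + sizeof(long) is the same location as p + 1 */
    memcpy(&value, byte_addr + sizeof(long), sizeof value);
    return value;
}
```

For an array `long a[2] = {10, 20}`, calling `fetch_next_by_bytes(a)` returns 20, the same as `*(a + 1)`.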
Example 7: Pointer Arithmetic — Displacement for Stores
Code
void store_next_ptr(long* p, long v) {
*(p+1) = v;
}
Generated Code
000000000040287a <store_next_ptr>:
40287a: 48 89 77 08 movq %rsi, 0x8(%rdi)
40287e: c3 retq
What We Figured Out
- Same displacement addressing as `fetch_next_ptr`, but now used as the destination.
- `%rsi` (the value `v`, 2nd argument) is stored into memory at address RDI + 8.
Example 8: Array Indexing — Scaled Index Addressing
Code
long fetch_array(long *p, long i) {
return p[i];
}
Generated Code
000000000040287f <fetch_array>:
40287f: 48 8b 04 f7 movq (%rdi,%rsi,8), %rax
402883: c3 retq
What We Figured Out
- `(%rdi,%rsi,8)` means "take the address in RDI, add RSI × 8 to it, and read from that memory location." This is called scaled index addressing.
- General form: `(%base, %index, scale)` computes the address as `base + index × scale`.
- The scale factor of 8 is because each `long` is 8 bytes. The allowed scale factors are 1, 2, 4, and 8.
- This single instruction does what `p[i]` does in C: it computes the address of element `i` (base pointer + index × element size) and reads from it.
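The address computation can be written out as ordinary arithmetic. The sketch below is an illustration only: `effective_address` and `fetch_array_explicit` are hypothetical helpers, and addresses are treated as plain integers via `uintptr_t`, which assumes a platform where converting between pointers and integers round-trips (true on mainstream x86-64 systems).

```c
#include <stdint.h>

/* Computes base + index * scale, the address formed by (%base,%index,scale).
   The hardware only allows scale to be 1, 2, 4, or 8. */
uintptr_t effective_address(uintptr_t base, intptr_t index, unsigned scale) {
    return base + (uintptr_t)index * scale;
}

/* Same result as p[i], but with the address arithmetic spelled out. */
long fetch_array_explicit(const long *p, long i) {
    uintptr_t addr = effective_address((uintptr_t)p, i, sizeof(long));
    return *(const long *)addr;
}
```

With a base of `0x1000`, an index of 3, and a scale of 8, the computed address is `0x1018`, which is 24 bytes past the base.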
Example 9: Array Indexing — Storing into an Array
Code
void store_array(long *p, long i, long v) {
p[i] = v;
}
Generated Code
0000000000402884 <store_array>:
402884: 48 89 14 f7 movq %rdx, (%rdi,%rsi,8)
402888: c3 retq
What We Figured Out
- Same scaled index addressing, but used as the destination.
- `p` is in RDI (1st argument), `i` is in RSI (2nd argument), `v` is in RDX (3rd argument).
- The instruction stores `v` at address `p + i × 8`, which is exactly `p[i] = v`.
Example 10: Global Variables — RIP-Relative Addressing
Code
long var1;
void store_var1(long v) {
var1 = v;
}
long fetch_var1() {
return var1;
}
Generated Code
000000000040285d <store_var1>:
40285d: 48 89 3d 0c 42 0a 00 movq %rdi, 0xa420c(%rip)
# 0x4a6a70 <var1>
402864: c3 retq
0000000000402865 <fetch_var1>:
402865: 48 8b 05 04 42 0a 00 movq 0xa4204(%rip), %rax
# 0x4a6a70 <var1>
40286c: c3 retq
What We Figured Out
- Global variables are accessed using RIP-relative addressing: `0xa420c(%rip)`. This means "take the current instruction pointer (RIP) and add the displacement to find the variable."
- The disassembler helpfully shows us the actual computed address in the comment: `0x4a6a70 <var1>`. Both `store_var1` and `fetch_var1` access the same address, so they're talking about the same global variable.
- Why RIP-relative?
  - It takes up less space! If we had 64-bit addresses, they'd need to take 8 bytes in the instruction, but with RIP-relative addressing, the displacement is only 4 bytes.
  - Also, sometimes code doesn't know in advance exactly where it will be loaded in memory, but it does know the distance between itself and the global variable (they're in the same executable). So using an offset from the current position is a reliable way to find it. (This also makes the code position-independent, which is important for shared libraries and security features like ASLR.)
- The two displacements are slightly different (`0xa420c` vs `0xa4204`) because each instruction is at a different address. RIP-relative offsets are computed from the end of the current instruction (the address of the next one), so they both resolve to `0x4a6a70`.
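We can check that last claim with arithmetic on the numbers in the listing. This is a sketch; `rip_relative_target` is a made-up helper name. Each `movq` here is 7 bytes long (count the hex bytes in the listing), so RIP already points 7 bytes past the start of the instruction when the displacement is added.

```c
#include <stdint.h>

/* Target of a RIP-relative operand: the address of the *next* instruction
   (start + length) plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t instr_addr, unsigned instr_len, int32_t disp) {
    return instr_addr + instr_len + (uint64_t)(int64_t)disp;
}
```

Both `rip_relative_target(0x40285d, 7, 0xa420c)` and `rip_relative_target(0x402865, 7, 0xa4204)` come out to `0x4a6a70`, matching the address in the disassembly comment.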
Example 11: LEA — Arithmetic in Disguise
Code
long formula(long x, long y) {
return x + 2 * y + 3;
}
Generated Code
000000000040288e <formula>:
40288e: 48 8d 44 77 03 leaq 0x3(%rdi,%rsi,2), %rax
402893: c3 retq
What We Figured Out
- `lea` stands for Load Effective Address. It computes an address using the same addressing modes we've seen, but instead of reading from memory, it just puts the computed address into the destination register.
- Here, `leaq 0x3(%rdi,%rsi,2), %rax` computes `RDI + RSI × 2 + 3` and puts the result in RAX. That's exactly `x + 2*y + 3`!
- The compiler is being clever: it's repurposing the address-calculation hardware to do arithmetic. No memory access happens at all; `lea` just does the math.
- This is a common compiler trick. Whenever arithmetic happens to match the form `base + index × scale + displacement` (with scale being 1, 2, 4, or 8), the compiler can use a single `lea` instruction instead of multiple `add` and `mul` instructions.
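As a sketch of what `lea` computes, the hypothetical helper `lea_model` below evaluates the full addressing form as plain integer arithmetic. The last function shows another expression that fits the same pattern; whether a given compiler actually picks `lea` for it depends on the compiler and its optimization settings.

```c
/* Model of lea disp(%base,%index,scale): pure arithmetic, no memory access. */
long lea_model(long base, long index, long scale, long disp) {
    return base + index * scale + disp;
}

/* x + 2*y + 3 matches the pattern with base = x, index = y, scale = 2, disp = 3. */
long formula_via_lea(long x, long y) {
    return lea_model(x, y, 2, 3);
}

/* 5*x also matches: base = x, index = x, scale = 4, disp = 0. */
long times5_via_lea(long x) {
    return lea_model(x, x, 4, 0);
}
```

For example, `formula_via_lea(10, 20)` is 10 + 40 + 3 = 53, and `times5_via_lea(7)` is 7 + 28 = 35.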
Example 12: Six Arguments — Accumulating a Sum
Code
void six_args(long a, long b, long c, long d, long e, long f) {
long result = a;
result += b;
result += c;
result += d;
result += e;
result += f;
print_long(result);
}
Generated Code
00000000004027a5 <six_args>:
4027a5: 55 pushq %rbp
4027a6: 48 89 e5 movq %rsp, %rbp
4027a9: 48 01 f7 addq %rsi, %rdi
4027ac: 48 01 d7 addq %rdx, %rdi
4027af: 48 01 cf addq %rcx, %rdi
4027b2: 4c 01 c7 addq %r8, %rdi
4027b5: 4c 01 cf addq %r9, %rdi
4027b8: e8 cc 00 00 00 callq 0x402889 <print_long>
4027bd: 5d popq %rbp
4027be: c3 retq
What We Figured Out
- `addq SRC, DST` adds the source to the destination, storing the result in the destination (i.e., `DST += SRC`).
- The compiler is being clever again: instead of creating a separate `result` variable, it accumulates the sum directly in RDI. Since `result` starts as `a`, and `a` is already in RDI, the compiler just keeps adding the other arguments into RDI.
- Why RDI specifically? Because after computing the sum, the function calls `print_long(result)`, and the first argument to a function goes in RDI. By accumulating in RDI, the result is already in the right place for the function call. No extra `mov` needed!
- The compiler doesn't follow our C code line by line. It figures out what the code means and finds the most efficient way to get there.
Example 13: Twelve Arguments — When Registers Run Out
Code (Caller)
void twelve_args(long a, long b, long c, long d, long e, long f,
long g, long h, long i, long j, long k, long l);
void do_twelve() {
twelve_args(1,2,3,4,5,6,7,8,9,10,11,12);
}
Generated Code (Caller)
000000000040281c <do_twelve>:
40281c: 55 pushq %rbp
40281d: 48 89 e5 movq %rsp, %rbp
402820: 6a 0c pushq $0xc
402822: 6a 0b pushq $0xb
402824: 6a 0a pushq $0xa
402826: 6a 09 pushq $0x9
402828: 6a 08 pushq $0x8
40282a: 6a 07 pushq $0x7
40282c: 41 b9 06 00 00 00 movl $0x6, %r9d
402832: 41 b8 05 00 00 00 movl $0x5, %r8d
402838: b9 04 00 00 00 movl $0x4, %ecx
40283d: ba 03 00 00 00 movl $0x3, %edx
402842: be 02 00 00 00 movl $0x2, %esi
402847: bf 01 00 00 00 movl $0x1, %edi
40284c: e8 6e ff ff ff callq 0x4027bf <twelve_args>
402851: 48 83 c4 30 addq $0x30, %rsp
402855: c9 leave
402856: c3 retq
Code (Callee)
void twelve_args(long a, long b, long c, long d, long e, long f,
long g, long h, long i, long j, long k, long l) {
long result = a;
result += b;
result += c;
result += d;
result += e;
result += f;
result += g;
result += h;
result += i;
result += j;
result += k;
result += l;
print_long(result);
}
Generated Code (Callee)
00000000004027bf <twelve_args>:
4027bf: 55 pushq %rbp
4027c0: 48 89 e5 movq %rsp, %rbp
4027c3: 48 01 f7 addq %rsi, %rdi
4027c6: 48 01 d7 addq %rdx, %rdi
4027c9: 48 01 cf addq %rcx, %rdi
4027cc: 4c 01 c7 addq %r8, %rdi
4027cf: 4c 01 cf addq %r9, %rdi
4027d2: 48 03 7d 10 addq 0x10(%rbp), %rdi
4027d6: 48 03 7d 18 addq 0x18(%rbp), %rdi
4027da: 48 03 7d 20 addq 0x20(%rbp), %rdi
4027de: 48 03 7d 28 addq 0x28(%rbp), %rdi
4027e2: 48 03 7d 30 addq 0x30(%rbp), %rdi
4027e6: 48 03 7d 38 addq 0x38(%rbp), %rdi
4027ea: e8 9a 00 00 00 callq 0x402889 <print_long>
4027ef: 5d popq %rbp
4027f0: c3 retq
What We Figured Out
- When a function has more than six arguments, the first six still go in registers (RDI, RSI, RDX, RCX, R8, R9), but the rest are pushed onto the stack.
- In the caller (`do_twelve`):
  - The extra arguments (7–12) are pushed in reverse order (12, 11, 10, 9, 8, 7) so that they end up in the correct order on the stack (argument 7 at the lowest address, closest to the top of the stack).
  - After the call returns, `addq $0x30, %rsp` cleans up the stack. `0x30` = 48 = 6 arguments × 8 bytes each.
  - `leave` is a shorthand instruction equivalent to `movq %rbp, %rsp` followed by `popq %rbp`. It tears down the stack frame in one instruction.
- In the callee (`twelve_args`):
  - The first six arguments are added from registers, just like in `six_args`.
  - The remaining arguments are accessed using displacement addressing off RBP: `0x10(%rbp)`, `0x18(%rbp)`, `0x20(%rbp)`, etc.
  - Why does the 7th argument start at `0x10(%rbp)` (offset 16) and not `0x0(%rbp)`? Because the stack frame looks like this (growing downward): first the caller pushed the extra arguments, then `call` pushed the return address (8 bytes), then `pushq %rbp` saved the old base pointer (8 bytes). So the first stack argument is 16 bytes above the current RBP.
  - Each subsequent argument is 8 bytes further: `0x18`, `0x20`, `0x28`, `0x30`, `0x38`. Those are arguments 8, 9, 10, 11, and 12.
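The layout can be reproduced with a toy simulation. This is a sketch under simplifying assumptions: the stack is a small array of 8-byte slots, `rsp` and `rbp` are byte offsets into that array rather than real addresses, the caller's own `pushq %rbp` is left out, and the names `simulate_do_twelve` and `stack_arg` are invented here. The return address `0x402851` is the address of the instruction after the `callq` in the listing.

```c
#include <stdint.h>

enum { STACK_BYTES = 256 };
static uint64_t stack_mem[STACK_BYTES / 8];   /* toy stack memory */
static uint64_t rsp = STACK_BYTES;            /* stack starts empty, grows down */
static uint64_t rbp = 0;                      /* stand-in for the caller's RBP */

static void push(uint64_t v) { rsp -= 8; stack_mem[rsp / 8] = v; }

/* Replays the stack-related steps of do_twelve calling twelve_args. */
void simulate_do_twelve(void) {
    for (uint64_t arg = 12; arg >= 7; arg--)  /* pushq $0xc ... pushq $0x7 */
        push(arg);
    push(0x402851);   /* callq pushes the return address */
    push(rbp);        /* callee: pushq %rbp */
    rbp = rsp;        /* callee: movq %rsp, %rbp */
}

/* Reads the slot at offset(%rbp), e.g. stack_arg(0x10) for 0x10(%rbp). */
uint64_t stack_arg(uint64_t offset) {
    return stack_mem[(rbp + offset) / 8];
}
```

After one call to `simulate_do_twelve()`, `stack_arg(0x10)` is 7 and `stack_arg(0x38)` is 12, while `stack_arg(0x08)` holds the return address, matching the offsets in the callee's disassembly.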
Bonus Puzzle
In the code for do_twelve, we see the addq $0x30, %rsp instruction to clean up the stack after the function call. But would the program still work correctly if we omitted that instruction? (Why or why not?)
Summary of Addressing Modes
Pulling together all the addressing modes we saw today:
| Syntax | Name | Computes | Example |
|---|---|---|---|
| `$0x2a` | Immediate | The constant value itself | `movl $0x2a, %eax` |
| `%rdi` | Register | Value in the register | `movq %rdi, %rax` |
| `(%rdi)` | Indirect | Memory at address in register | `movq (%rdi), %rax` |
| `0x8(%rdi)` | Displacement | Memory at register + offset | `movq 0x8(%rdi), %rax` |
| `(%rdi,%rsi,8)` | Scaled Index | Memory at base + index × scale | `movq (%rdi,%rsi,8), %rax` |
| `0x3(%rdi,%rsi,2)` | Full | base + index × scale + displacement | `leaq 0x3(%rdi,%rsi,2), %rax` |
| `0xa420c(%rip)` | RIP-Relative | Memory at instruction pointer + offset | `movq 0xa420c(%rip), %rax` |