CS 105

Lesson 5: MOV, Registers, and Addressing Modes

Why Learn This?

There's something deeply satisfying about peeling back the layers and seeing what's actually happening when your code runs. High-level languages give you p[i] and you don't think twice—but the machine is computing a base address plus an index times a scale factor, and there's a single instruction for that. Understanding what's going on underneath makes you a better programmer: you'll write more efficient code, you'll debug more effectively, and you'll have a much richer mental model of what your programs are actually doing.

But there's a bigger lesson here too, one that goes beyond x86. In this lecture, we didn't sit through a list of instructions and memorize them—we looked at real compiled code and figured out what it meant. That's a skill that will serve you for your entire career. You will constantly encounter unfamiliar systems, languages, file formats, protocols, and APIs. You won't always have documentation. You won't always have someone to ask. But you can play detective. You can look at examples, form hypotheses, test them against the evidence, and build up understanding piece by piece. That's exactly what we did today, and it works.

Registers and Conventions

At the start of the lecture, this table just listed the registers, but this version reflects what we figured out about their usage:

64-bit  32-bit  16-bit  8-bit  Use
RAX     EAX     AX      AL     Return Value
RBX     EBX     BX      BL
RCX     ECX     CX      CL     4th Argument
RDX     EDX     DX      DL     3rd Argument
RSI     ESI     SI      SIL    2nd Argument
RDI     EDI     DI      DIL    1st Argument
RBP     EBP     BP      BPL    Current Stack-Frame Pointer
RSP     ESP     SP      SPL    Stack Pointer
R8      R8D     R8W     R8B    5th Argument
R9      R9D     R9W     R9B    6th Argument
R10     R10D    R10W    R10B
R11     R11D    R11W    R11B
R12     R12D    R12W    R12B
R13     R13D    R13W    R13B
R14     R14D    R14W    R14B
R15     R15D    R15W    R15B
RIP                            Instruction Pointer

Stages of Compilation

Recall from CS 70:

  • Preprocessing: handles #include, #define, and the other preprocessor directives.
  • Compilation: translates C++ source code into assembly code.
  • Assembling: translates assembly code into machine code, producing an object file.
  • Linking: combines object files and libraries into an executable.

Ways to View the Assembly Code

  • objdump -d on the executable
  • gdb with set disassemble-next-line on or the disassemble command
  • gcc with the -S flag to produce an assembly file directly from the compilation step

Example 1: Parameter Passing

Code

// Forward declaration so we can call it from `do_six`.
void six_args(long a, long b, long c, long d, long e, long f);

void do_six() {
    six_args(1,2,3,4,5,6);
}

Generated Code

00000000004027f1 <do_six>:
  4027f1: 55                            pushq   %rbp
  4027f2: 48 89 e5                      movq    %rsp, %rbp
  4027f5: 41 b9 06 00 00 00             movl    $0x6, %r9d
  4027fb: 41 b8 05 00 00 00             movl    $0x5, %r8d
  402801: b9 04 00 00 00                movl    $0x4, %ecx
  402806: ba 03 00 00 00                movl    $0x3, %edx
  40280b: be 02 00 00 00                movl    $0x2, %esi
  402810: bf 01 00 00 00                movl    $0x1, %edi
  402815: e8 8b ff ff ff                callq   0x4027a5 <six_args>
  40281a: 5d                            popq    %rbp
  40281b: c3                            retq

What We Figured Out

  • The mov FOO, BAR instruction means move the value FOO into BAR. (The order of the operands is the opposite of how we write assignments in C, but it makes sense when read as English.)
  • The first six arguments to a function are passed in registers, in the order RDI, RSI, RDX, RCX, R8, and R9.
  • The call instruction pushes the return address onto the stack and jumps to the function.
  • The ret instruction pops the return address from the stack and jumps back to it.
  • Instructions have suffixes that indicate the size of their operands: b for byte (8 bits), w for word (16 bits), l for long (32 bits), and q for quad (64 bits). (If there is no suffix, the size is determined by the registers used as operands.)
  • The code saves and restores the base pointer (RBP) at the beginning and end of the function, which is used to refer to the current stack frame. (Here we don't really need it, but it's a common convention and aids debugging.)
  • Kinds of operands for instructions like mov:
    • Immediate values, like $0x6 (the $ indicates an immediate value)
    • Registers, like %r9d. The name of the register shows how many of its bits are being used: r9d is the lower 32 bits of r9, r9w is the lower 16 bits, and r9b is the lower 8 bits.
  • An odd trick: All our values are longs, but some of the instructions are 32-bit movl instructions. This works because x86-64 has a special rule: when you write to the lower 32 bits of a register, the upper 32 bits are automatically set to zero. So for a non-negative constant that fits in 32 bits, a movl into the 32-bit register name leaves the correct 64-bit value in the full register. This means the code can use shorter 32-bit constants and instructions, which makes the machine code more compact.
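
To make the zero-extension rule concrete, here is a small C model of what happens to a 64-bit register when we write to one of its smaller names. This is only a sketch: the function names are made up for illustration, and the behavior shown for 16-bit and 8-bit writes (they leave the upper bits alone) is a fact about x86-64 that we didn't see in today's examples.

```c
#include <stdint.h>

/* Model of a 32-bit write such as movl $0x6, %r9d:
   the old contents are discarded and the upper 32 bits become zero. */
uint64_t write_low32(uint64_t reg, uint32_t value) {
    (void)reg;                 /* the previous register value does not matter */
    return (uint64_t)value;    /* zero-extended to 64 bits */
}

/* Model of a 16-bit write such as movw $0x6, %r9w:
   only the low 16 bits change; the upper 48 bits are preserved. */
uint64_t write_low16(uint64_t reg, uint16_t value) {
    return (reg & ~(uint64_t)0xFFFF) | value;
}

/* Model of an 8-bit write such as movb $0x6, %r9b:
   only the low 8 bits change; the upper 56 bits are preserved. */
uint64_t write_low8(uint64_t reg, uint8_t value) {
    return (reg & ~(uint64_t)0xFF) | value;
}
```

Only the 32-bit form gives a clean 64-bit result no matter what was in the register before, which is why the compiler picks movl rather than movw or movb for small constants.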

Example 2: Return Values

Code

long return42() {
    return 42;
}

Generated Code

0000000000402857 <return42>:
  402857: b8 2a 00 00 00                movl    $0x2a, %eax
  40285c: c3                            retq

What We Figured Out

  • The return value of a function is placed in RAX (or EAX for 32-bit values).
  • 0x2a is 42 in hexadecimal (2 × 16 + 10 = 42).
  • We see the same movl-to-a-32-bit-register trick here: since 42 fits in 32 bits, we can use movl to EAX instead of movq to RAX, and the upper 32 bits are zeroed out automatically.
  • Notice there's no push %rbp / pop %rbp here — this function is so simple the compiler decided it didn't need a stack frame at all.

Example 3: Identity — Registers to Registers

Code

long identity(long v) {
    return v;
}

Generated Code

0000000000402895 <identity>:
  402895: 48 89 f8                      movq    %rdi, %rax
  402898: c3                            retq

What We Figured Out

  • The first argument arrives in RDI, and we need to return it in RAX, so we just move it from one register to the other.
  • movq is used here (rather than movl) because we're dealing with a full 64-bit long value, and we can't assume the upper 32 bits are zero.

Example 4: Pointers — Reading from Memory

Code

long fetch_ptr(long *p) {
    return *p;
}

Generated Code

0000000000402871 <fetch_ptr>:
  402871: 48 8b 07                      movq    (%rdi), %rax
  402874: c3                            retq

What We Figured Out

  • Parentheses around a register mean memory access (dereferencing a pointer). (%rdi) means "go to the address stored in RDI and read the value there."
  • This is a new kind of operand: a memory reference. So now we have three kinds: immediate values ($0x2a), registers (%rdi), and memory references ((%rdi)).
    • It's a bit odd that parentheses mean “access this memory location”, when normally parentheses are used for grouping and can be ignored. We rolled our eyes a bit at this design choice!
  • The pointer p arrives in RDI (first argument). We dereference it with (%rdi) and put the result in RAX (return value).

Example 5: Pointers — Writing to Memory

Code

void store_ptr(long* p, long v) {
    *p = v;
}

Generated Code

000000000040286d <store_ptr>:
  40286d: 48 89 37                      movq    %rsi, (%rdi)
  402870: c3                            retq

What We Figured Out

  • This is fetch_ptr in reverse — we're writing to memory instead of reading from it.
  • p is in RDI (1st argument), v is in RSI (2nd argument). We move v into the memory location that p points to.
  • The parentheses can appear on either side of mov — as the source (reading from memory) or the destination (writing to memory).

Example 6: Pointer Arithmetic — Displacement Addressing

Code

long fetch_next_ptr(long *p) {
    return *(p+1);
}

Generated Code

0000000000402875 <fetch_next_ptr>:
  402875: 48 8b 47 08                   movq    0x8(%rdi), %rax
  402879: c3                            retq

What We Figured Out

  • 0x8(%rdi) means "take the address in RDI, add 8 to it, and read from that memory location." This is called displacement addressing.
  • Why 8? Because a long is 8 bytes, so *(p+1) in C is 8 bytes past p in memory. C pointer arithmetic works in units of the pointed-to type; at the machine level, we have to do the scaling ourselves.
  • General form: DISPLACEMENT(%register) adds the displacement to the register's value to compute the effective address.
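
The scaling that C hides can be spelled out by hand. The sketch below (helper names are hypothetical, not part of the lecture code) reads a long at an explicit byte displacement from a base pointer, which is roughly what movq DISPLACEMENT(%rdi), %rax does. It uses sizeof(long) rather than a hard-coded 8, so it only matches the 0x8 in the listing on platforms where a long is 8 bytes, as on our x86-64 Linux machines.

```c
#include <string.h>

/* Reads the long stored `displacement` bytes past the address in `base`. */
long load_at_displacement(const long *base, size_t displacement) {
    long value;
    /* Step through memory in bytes, not in units of long. */
    memcpy(&value, (const char *)base + displacement, sizeof value);
    return value;
}

/* *(p+1) written with the byte displacement made explicit. */
long fetch_next_by_bytes(const long *p) {
    return load_at_displacement(p, 1 * sizeof(long));
}
```

Calling fetch_next_by_bytes(a) on an array a gives the same result as *(a + 1); the multiplication by sizeof(long) is the step the C compiler normally does for us.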

Example 7: Pointer Arithmetic — Displacement for Stores

Code

void store_next_ptr(long* p, long v) {
    *(p+1) = v;
}

Generated Code

000000000040287a <store_next_ptr>:
  40287a: 48 89 77 08                   movq    %rsi, 0x8(%rdi)
  40287e: c3                            retq

What We Figured Out

  • Same displacement addressing as fetch_next_ptr, but now used as the destination.
  • %rsi (the value v, 2nd argument) is stored into memory at address RDI + 8.

Example 8: Array Indexing — Scaled Index Addressing

Code

long fetch_array(long *p, long i) {
    return p[i];
}

Generated Code

000000000040287f <fetch_array>:
  40287f: 48 8b 04 f7                   movq    (%rdi,%rsi,8), %rax
  402883: c3                            retq

What We Figured Out

  • (%rdi,%rsi,8) means "take the address in RDI, add RSI × 8 to it, and read from that memory location." This is called scaled index addressing.
  • General form: (%base, %index, scale) computes the address as base + index × scale.
  • The scale factor of 8 is because each long is 8 bytes. The allowed scale factors are 1, 2, 4, and 8.
  • This single instruction does what p[i] does in C: it computes the address of element i (base pointer + index × element size) and reads from it.
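
The address computation behind (%base, %index, scale) can be written out in C. This is a rough model with made-up helper names; converting the computed integer back into a pointer is implementation-defined in C, although it behaves as expected on mainstream x86-64 compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* base + index * scale, the arithmetic behind (%base,%index,scale). */
uintptr_t scaled_address(uintptr_t base, size_t index, size_t scale) {
    return base + index * scale;
}

/* Returns 1 if `scale` is one of the factors the instruction can encode. */
int is_valid_scale(size_t scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}

/* p[i] with the address computed by hand instead of by the compiler. */
long fetch_array_by_address(long *p, size_t i) {
    return *(long *)scaled_address((uintptr_t)p, i, sizeof(long));
}
```

The allowed scale factors line up with the sizes of the common C types (1, 2, 4, and 8 bytes), so indexing an array of any of them can use this one addressing mode.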

Example 9: Array Indexing — Storing into an Array

Code

void store_array(long *p, long i, long v) {
    p[i] = v;
}

Generated Code

0000000000402884 <store_array>:
  402884: 48 89 14 f7                   movq    %rdx, (%rdi,%rsi,8)
  402888: c3                            retq

What We Figured Out

  • Same scaled index addressing, but used as the destination.
  • p is in RDI (1st argument), i is in RSI (2nd argument), v is in RDX (3rd argument).
  • The instruction stores v at address p + i × 8 — exactly p[i] = v.

Example 10: Global Variables — RIP-Relative Addressing

Code

long var1;

void store_var1(long v) {
    var1 = v;
}

long fetch_var1() {
    return var1;
}

Generated Code

000000000040285d <store_var1>:
  40285d: 48 89 3d 0c 42 0a 00          movq    %rdi, 0xa420c(%rip)
                            # 0x4a6a70 <var1>
  402864: c3                            retq
0000000000402865 <fetch_var1>:
  402865: 48 8b 05 04 42 0a 00          movq    0xa4204(%rip), %rax
                            # 0x4a6a70 <var1>
  40286c: c3                            retq

What We Figured Out

  • Global variables are accessed using RIP-relative addressing: 0xa420c(%rip). This means "take the current instruction pointer (RIP) and add the displacement to find the variable."
  • The disassembler helpfully shows us the actual computed address in the comment: 0x4a6a70 <var1>. Both store_var1 and fetch_var1 access the same address, so they're talking about the same global variable.
  • Why RIP-relative?
    • It takes up less space! If we had 64-bit addresses, they'd need to take 8 bytes in the instruction, but with RIP-relative addressing, the displacement is only 4 bytes.
    • Also, sometimes code doesn't know in advance exactly where it will be loaded in memory, but it does know the distance between itself and the global variable (they're in the same executable). So using an offset from the current position is a reliable way to find it. (This also makes the code position-independent, which is important for shared libraries and security features like ASLR.)
  • The two displacements are slightly different (0xa420c vs 0xa4204) because the two instructions sit at different addresses. RIP-relative offsets are measured from the end of the current instruction (that is, the address of the next one), so 0x402864 + 0xa420c and 0x40286c + 0xa4204 both come to 0x4a6a70.
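
We can check the disassembler's arithmetic ourselves. The sketch below (function names are hypothetical) decodes the last four bytes of each instruction as a little-endian 32-bit displacement and adds it to the address of the following instruction. The addresses, lengths, and bytes are taken from the listing above.

```c
#include <stdint.h>

/* Decodes four instruction bytes as a little-endian signed 32-bit value.
   The final conversion to int32_t is implementation-defined for values
   above INT32_MAX, but wraps as expected on mainstream compilers. */
int32_t decode_disp32(const uint8_t bytes[4]) {
    uint32_t raw = (uint32_t)bytes[0]
                 | ((uint32_t)bytes[1] << 8)
                 | ((uint32_t)bytes[2] << 16)
                 | ((uint32_t)bytes[3] << 24);
    return (int32_t)raw;
}

/* Target of a RIP-relative reference: the address of the NEXT instruction
   (instr_addr + instr_len) plus the sign-extended displacement. */
uint64_t rip_relative_target(uint64_t instr_addr, uint64_t instr_len,
                             int32_t disp) {
    uint64_t next_instr = instr_addr + instr_len;
    return next_instr + (uint64_t)(int64_t)disp;
}
```

For store_var1, the bytes 0c 42 0a 00 decode to 0xa420c, and rip_relative_target(0x40285d, 7, 0xa420c) gives 0x4a6a70. The same next-instruction-plus-displacement arithmetic also reproduces the callq target in do_six: the bytes 8b ff ff ff decode to a negative displacement, and 0x40281a plus that displacement is 0x4027a5, the address of six_args.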

Example 11: LEA — Arithmetic in Disguise

Code

long formula(long x, long y) {
    return x + 2 * y + 3;
}

Generated Code

000000000040288e <formula>:
  40288e: 48 8d 44 77 03                leaq    0x3(%rdi,%rsi,2), %rax
  402893: c3                            retq

What We Figured Out

  • lea stands for Load Effective Address. It computes an address using the same addressing modes we've seen, but instead of reading from memory, it just puts the computed address into the destination register.
  • Here, leaq 0x3(%rdi,%rsi,2), %rax computes RDI + RSI × 2 + 3 and puts the result in RAX. That's exactly x + 2*y + 3!
  • The compiler is being clever: it's repurposing the address-calculation hardware to do arithmetic. No memory access happens at all — lea just does the math.
  • This is a common compiler trick. Whenever arithmetic happens to match the form base + index × scale + displacement (with scale being 1, 2, 4, or 8), the compiler can use a single lea instruction instead of separate multiply (or shift) and add instructions.
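
As a sketch of the pattern, the C below models lea as plain arithmetic and shows two expressions that fit its shape. The names lea_model, formula_via_lea, and times5 are made up for illustration. Whether a compiler actually picks lea for a given expression depends on the compiler and its optimization settings, so treat times5 as a plausible candidate rather than a guarantee.

```c
/* Models lea disp(base,index,scale), dst: pure arithmetic, no memory access. */
long lea_model(long base, long index, long scale, long disp) {
    return base + index * scale + disp;
}

/* x + 2*y + 3 from the example: base = x, index = y, scale = 2, disp = 3. */
long formula_via_lea(long x, long y) {
    return lea_model(x, y, 2, 3);
}

/* x * 5 also fits the pattern when written as x + x*4:
   base = x, index = x, scale = 4, disp = 0. */
long times5(long x) {
    return lea_model(x, x, 4, 0);
}
```

Something like x * 7 does not fit directly, because 7 is not one plus an allowed scale factor, so it would need more than one instruction.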

Example 12: Six Arguments — Accumulating a Sum

Code

void six_args(long a, long b, long c, long d, long e, long f) {
    long result = a;
    result += b;
    result += c;
    result += d;
    result += e;
    result += f;
    print_long(result);
}

Generated Code

00000000004027a5 <six_args>:
  4027a5: 55                            pushq   %rbp
  4027a6: 48 89 e5                      movq    %rsp, %rbp
  4027a9: 48 01 f7                      addq    %rsi, %rdi
  4027ac: 48 01 d7                      addq    %rdx, %rdi
  4027af: 48 01 cf                      addq    %rcx, %rdi
  4027b2: 4c 01 c7                      addq    %r8, %rdi
  4027b5: 4c 01 cf                      addq    %r9, %rdi
  4027b8: e8 cc 00 00 00                callq   0x402889 <print_long>
  4027bd: 5d                            popq    %rbp
  4027be: c3                            retq

What We Figured Out

  • addq SRC, DST adds the source to the destination, storing the result in the destination (i.e., DST += SRC).
  • The compiler is being clever again: instead of creating a separate result variable, it accumulates the sum directly in RDI. Since result starts as a, and a is already in RDI, the compiler just keeps adding the other arguments into RDI.
  • Why RDI specifically? Because after computing the sum, the function calls print_long(result) — and the first argument to a function goes in RDI. By accumulating in RDI, the result is already in the right place for the function call. No extra mov needed!
  • The compiler doesn't follow our C code line by line — it figures out what the code means and finds the most efficient way to get there.
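
One way to see what the compiler did is to write C that mirrors the generated code statement by statement. This is a hand-written sketch, not compiler output: sum_six_in_place is a hypothetical name, and it returns the sum instead of calling print_long so that it stands on its own.

```c
/* Each statement corresponds to one addq into RDI in the listing above.
   The parameter `a` plays the role of RDI and doubles as the accumulator. */
long sum_six_in_place(long a, long b, long c, long d, long e, long f) {
    a += b;   /* addq %rsi, %rdi */
    a += c;   /* addq %rdx, %rdi */
    a += d;   /* addq %rcx, %rdi */
    a += e;   /* addq %r8, %rdi  */
    a += f;   /* addq %r9, %rdi  */
    return a; /* the original passes this value to print_long instead */
}
```

With the arguments from do_six, sum_six_in_place(1, 2, 3, 4, 5, 6) is 21, the value that six_args hands to print_long.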

Example 13: Twelve Arguments — When Registers Run Out

Code (Caller)

void twelve_args(long a, long b, long c, long d, long e, long f,
                 long g, long h, long i, long j, long k, long l);

void do_twelve() {
    twelve_args(1,2,3,4,5,6,7,8,9,10,11,12);
}

Generated Code (Caller)

000000000040281c <do_twelve>:
  40281c: 55                            pushq   %rbp
  40281d: 48 89 e5                      movq    %rsp, %rbp
  402820: 6a 0c                         pushq   $0xc
  402822: 6a 0b                         pushq   $0xb
  402824: 6a 0a                         pushq   $0xa
  402826: 6a 09                         pushq   $0x9
  402828: 6a 08                         pushq   $0x8
  40282a: 6a 07                         pushq   $0x7
  40282c: 41 b9 06 00 00 00             movl    $0x6, %r9d
  402832: 41 b8 05 00 00 00             movl    $0x5, %r8d
  402838: b9 04 00 00 00                movl    $0x4, %ecx
  40283d: ba 03 00 00 00                movl    $0x3, %edx
  402842: be 02 00 00 00                movl    $0x2, %esi
  402847: bf 01 00 00 00                movl    $0x1, %edi
  40284c: e8 6e ff ff ff                callq   0x4027bf <twelve_args>
  402851: 48 83 c4 30                   addq    $0x30, %rsp
  402855: c9                            leave
  402856: c3                            retq

Code (Callee)

void twelve_args(long a, long b, long c, long d, long e, long f,
                 long g, long h, long i, long j, long k, long l) {
    long result = a;
    result += b;
    result += c;
    result += d;
    result += e;
    result += f;
    result += g;
    result += h;
    result += i;
    result += j;
    result += k;
    result += l;
    print_long(result);
}

Generated Code (Callee)

00000000004027bf <twelve_args>:
  4027bf: 55                            pushq   %rbp
  4027c0: 48 89 e5                      movq    %rsp, %rbp
  4027c3: 48 01 f7                      addq    %rsi, %rdi
  4027c6: 48 01 d7                      addq    %rdx, %rdi
  4027c9: 48 01 cf                      addq    %rcx, %rdi
  4027cc: 4c 01 c7                      addq    %r8, %rdi
  4027cf: 4c 01 cf                      addq    %r9, %rdi
  4027d2: 48 03 7d 10                   addq    0x10(%rbp), %rdi
  4027d6: 48 03 7d 18                   addq    0x18(%rbp), %rdi
  4027da: 48 03 7d 20                   addq    0x20(%rbp), %rdi
  4027de: 48 03 7d 28                   addq    0x28(%rbp), %rdi
  4027e2: 48 03 7d 30                   addq    0x30(%rbp), %rdi
  4027e6: 48 03 7d 38                   addq    0x38(%rbp), %rdi
  4027ea: e8 9a 00 00 00                callq   0x402889 <print_long>
  4027ef: 5d                            popq    %rbp
  4027f0: c3                            retq

What We Figured Out

  • When a function has more than six arguments, the first six still go in registers (RDI, RSI, RDX, RCX, R8, R9), but the rest are pushed onto the stack.
  • In the caller (do_twelve):
    • The extra arguments (7–12) are pushed in reverse order (12, 11, 10, 9, 8, 7) so that they end up in the correct order on the stack (argument 7 at the lowest address, closest to the top of the stack).
    • After the call returns, addq $0x30, %rsp cleans up the stack. 0x30 = 48 = 6 arguments × 8 bytes each.
    • leave is a shorthand instruction equivalent to movq %rbp, %rsp followed by popq %rbp — it tears down the stack frame in one instruction.
  • In the callee (twelve_args):
    • The first six arguments are added from registers, just like in six_args.
    • The remaining arguments are accessed using displacement addressing off RBP: 0x10(%rbp), 0x18(%rbp), 0x20(%rbp), etc.
    • Why does the 7th argument start at 0x10(%rbp) (offset 16) and not 0x0(%rbp)? Because the stack frame looks like this (growing downward): first the caller pushed the extra arguments, then call pushed the return address (8 bytes), then pushq %rbp saved the old base pointer (8 bytes). So the first stack argument is 16 bytes above the current RBP.
    • Each subsequent argument is 8 bytes further: 0x18, 0x20, 0x28, 0x30, 0x38 — that's arguments 8, 9, 10, 11, and 12.
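
The offsets can be checked with a toy model of the stack. The sketch below is an illustration under simplifying assumptions: the stack is an array of 8-byte words, sp and rbp are array indexes rather than real addresses, and the struct and function names are made up. It replays the pushes from do_twelve and the prologue of twelve_args, then reads an argument at its offset from the frame pointer.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* A toy downward-growing stack of 8-byte words. */
struct toy_stack {
    uint64_t words[STACK_WORDS];
    int sp;   /* index of the current top of the stack */
};

/* Like pushq: move the stack pointer down one word, then store. */
static void push(struct toy_stack *s, uint64_t v) {
    s->sp -= 1;
    s->words[s->sp] = v;
}

/* Byte offset from RBP of stack argument n (n >= 7): skip the saved RBP
   and the return address (16 bytes), then 8 bytes per earlier argument. */
long stack_arg_offset(int n) {
    return 16 + 8L * (n - 7);
}

/* Rebuilds the frame that twelve_args sees and reads argument n from it. */
uint64_t read_stack_arg(int n) {
    struct toy_stack s = { {0}, STACK_WORDS };
    for (uint64_t arg = 12; arg >= 7; arg--) {
        push(&s, arg);       /* caller: pushq $0xc ... pushq $0x7 */
    }
    push(&s, 0x402851);      /* callq pushes the return address */
    push(&s, 0);             /* callee: pushq %rbp (placeholder value) */
    int rbp = s.sp;          /* callee: movq %rsp, %rbp */
    return s.words[rbp + stack_arg_offset(n) / 8];
}
```

Here stack_arg_offset(7) is 0x10 and stack_arg_offset(12) is 0x38, matching the first and last addq offsets in twelve_args, and read_stack_arg(n) returns n for each n from 7 to 12.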

Bonus Puzzle

In the code for do_twelve, we see the addq $0x30, %rsp instruction to clean up the stack after the function call. But would the program still work correctly if we omitted that instruction? (Why or why not?)

Summary of Addressing Modes

Pulling together all the addressing modes we saw today:

Syntax            Name          Computes                                Example
$0x2a             Immediate     The constant value itself               movl $0x2a, %eax
%rdi              Register      Value in the register                   movq %rdi, %rax
(%rdi)            Indirect      Memory at address in register           movq (%rdi), %rax
0x8(%rdi)         Displacement  Memory at register + offset             movq 0x8(%rdi), %rax
(%rdi,%rsi,8)     Scaled Index  Memory at base + index × scale          movq (%rdi,%rsi,8), %rax
0x3(%rdi,%rsi,2)  Full          base + index × scale + displacement     leaq 0x3(%rdi,%rsi,2), %rax
0xa420c(%rip)     RIP-Relative  Memory at instruction pointer + offset  movq 0xa420c(%rip), %rax
