CS 105

The Attack Lab: Understanding Buffer-Overflow Bugs

Introduction

This assignment involves generating a total of five attacks on two programs having different security vulnerabilities.

Outcomes you will gain from this lab include:

  • You will learn different ways that attackers can exploit security vulnerabilities when programs do not safeguard themselves well enough against buffer overflows.
  • Through this, you will get a better understanding of how to write programs that are more secure, as well as some of the features provided by compilers and operating systems to make programs less vulnerable.
  • You will gain a deeper understanding of the stack and parameter-passing mechanisms of x86-64 machine code.
  • You will gain a deeper understanding of how x86-64 instructions are encoded.
  • You will gain more experience with debugging tools such as gdb and objdump.

You will want to study Sections 3.10.3, Out-of-Bounds Memory References and Buffer Overflow, and 3.10.4, Thwarting Buffer Overflow Attacks of Bryant & O’Hallaron's Computer Systems (3rd edition) textbook as reference material for this lab.

Warning

In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources. Attacking computer systems belonging to other people, and gaining unauthorized access, is a serious criminal offense that can lead to long prison terms.

Logistics

As usual, you should work with your lab partner(s). You will generate attacks for target programs that are custom-generated for you.

Getting Files

See the instructions on the course website for getting your team's target.

The server will build your files and return them to your browser in a tar file called targetk.tar, where k is the unique number of your target programs.

After saving the targetk.tar file in a (protected) directory on wilkes in which you plan to do your work, give the command: tar -xvf targetk.tar. This will extract a directory targetk containing the files described below.

You should only download one set of files. If for some reason you download multiple targets, choose one target to work on and delete the rest.

Warning: If you expand your targetk.tar on a PC by using a utility such as Winzip or letting your browser do the extraction, you'll risk resetting permission bits on the executable files.

The files in targetk include:

README.txt
A file describing the contents of the directory.
ctarget
An executable program vulnerable to code-injection attacks.
rtarget
An executable program vulnerable to return-oriented programming attacks.
cookie.txt
An 8-digit hex code that you will use as a unique identifier in your attacks.
farm.c
The source code of your target's “gadget farm”, which you will use in generating return-oriented programming attacks.
hex2raw
A utility to generate attack strings.

In the following instructions, we will assume that you have copied the files to a protected directory on wilkes, and that you are executing the programs in that directory.

Hand-in

As in the Bomb lab, there is no explicit hand-in. The target will notify us automatically about your progress as you work on it. You can keep track of how you are doing by looking at the class scoreboard at:

\url{https://www.cs.hmc.edu/cs105/attack/scoreboard}\\


This web page is frequently updated to show the progress for each team's targets.
Note that the web page does \emph{not} update every time you refresh it.

Important Points

Here are some important rules about valid solutions for this lab. These points might not make much sense when you read this document for the first time; they're here as a reference once you get started.

  • You must do the assignment on wilkes.
  • Your solutions may not use attacks to circumvent the validation code in the programs.

    Specifically, any address you incorporate into an attack string for use by a ret instruction should be to one of the following destinations:

    • The addresses for functions touch1, touch2, or touch3.
    • The address of your injected code
    • The address of one of your gadgets from the gadget farm.
  • You may only construct gadgets from file rtarget with addresses ranging between those for functions start_farm and end_farm.

Target Programs

Both ctarget and rtarget read strings from standard input. They do so with the function getbuf:

unsigned getbuf()
{
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}

The function Gets is similar to the standard library function gets—it reads a string from standard input (terminated by \n or end-of-file) and stores it (along with a null terminator) at the specified destination. In this code, you can see that the destination is an array buf, declared as having BUFFER_SIZE bytes. At the time your targets were generated, BUFFER_SIZE was a compile-time constant specific to your version.

The functions Gets() and gets()have no way to determine whether their destination buffers are large enough to store the string they read. They simply copy sequences of bytes, possibly overrunning the bounds of the storage allocated at the destination.

If the string typed by the user and read by getbuf is sufficiently short, it is clear that getbuf will return 1, as shown by the following execution examples:

./ctarget
Cookie: 0x1a7dd803
Type string:Keep it short!
No exploit.  Getbuf returned 0x1
Normal return

(Note that the value of the cookie shown will differ from yours.)

Typically an error occurs if you type a long string:

./ctarget
Cookie: 0x1a7dd803
Type string:This is not a very interesting string, but it has the property ...
Ouch!: You caused a segmentation fault!
Better luck next time

Program rtarget will have the same behavior. As the error message indicates, overrunning the buffer typically causes the program state to be corrupted, leading to a memory-access error. Your task is to be more clever with the strings you feed ctarget and rtarget so that they do more interesting things. The strings that cause such behavior are called exploit strings.

Both ctarget and rtarget take several different command-line arguments:

-h
Print a list of possible command-line arguments
-q
Don't send results to the grading server
-i FILE
Supply input from a file, rather than from standard input

Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program hex2raw will enable you to generate these raw strings. More information on how to use hex2raw.

Important points:

  • Your exploit string must not contain byte value 0x0a at any intermediate position, since that is the ASCII code for newline (“\n'). WhenGets` encounters this byte, it will assume you intended to terminate the string.

  • hex2raw expects two-digit hex values separated by one or more white spaces.

    So if you want to create a byte with a hex value of 0, you need to write it as 00. To create the word 0xdeadbeef you should pass “ef be ad de” to hex2raw (note the reversal required for little-endian byte ordering).

  • Appendix A also contains examples of using I/O redirection (< and >) and pipes (|) on the command line, as used in the example below.

When you have correctly solved one of the levels, your target program will automatically send a notification to the grading server. For example:

./hex2raw < ctarget.l2.txt | ./ctarget
Cookie: 0x1a7dd803
Type string:Touch2!: You called touch2(0x1a7dd803)
Valid solution for level 2 with target ctarget
PASSED: Sent exploit string to server to be validated.
NICE JOB!

The server will test your exploit string to make sure it really works, and it will update the Attacklab scoreboard page indicating that your user id (listed by your target number for anonymity) has completed this phase.

Unlike the Bomb Lab, there is no penalty for making mistakes in this lab. Feel free to fire away at ctarget and rtarget with any strings you like.

Figure \ref{fig:phases} summarizes the five phases of the lab. As can be seen, the first three involve code-injection (CI) attacks on ctarget, while the last two involve return-oriented-programming (ROP) attacks on rtarget. Note that the fifth phase is extra-credit.

Phase Program Level Method Function Points
1 ctarget 1 CI touch1 10
2 ctarget 2 CI touch2 25
3 ctarget 3 CI touch3 25
4 rtarget 2 ROP touch2 40
5 rtarget 3 ROP touch3 10

CI: Code Injection
ROP: Return-Oriented Programming

Summary of attack lab phases.

Part I: Code-Injection Attacks

For the first three phases, your exploit strings will attack ctarget. This program is set up so that the stack positions will be consistent from one run to the next and so that data on the stack can be treated as executable code. These features make the program vulnerable to attacks where the exploit strings contain byte encodings of executable code.

Level 1

For Phase 1, you will not inject new code. Instead, your exploit string will redirect the program to execute an existing procedure.

Function getbuf is called within ctarget by a function test, which has the following C code:

void test()
{
    int val;
    val = getbuf(); 
    printf("No exploit.  Getbuf returned 0x%x\n", val);
}

When getbuf executes its return statement (line 5 of getbuf), the program ordinarily resumes execution within function test (at line 5 of this function). We want to change this behavior. Within the file ctarget, there is code for a function touch1 having the following C representation:

void touch1()
{
    vlevel = 1;       /* Part of validation protocol */
    printf("Touch1!: You called touch1()\n");
    validate(1);
    exit(0);
}

Your task is to get ctarget to execute the code for touch1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this stage, but that will not cause a problem, since touch1 causes the program to exit directly.

Some Advice:

  • All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of ctarget. Use objdump -d to get this disassembled version.

  • The idea is to position a byte representation of the starting address for touch1 so that the ret instruction at the end of the code for getbuf will transfer control to touch1.

  • Be careful about byte ordering, and remember that addresses are 8 bytes.

  • You might want to use gdb to step the program through the last few instructions of getbuf to make sure it is doing the right thing.

  • The placement of buf within the stack frame for getbuf depends on the value of the compile-time constant BUFFER_SIZE and on the allocation strategy used by gcc. You will need to examine the disassembled code to determine its position.

Level 2

Phase 2 involves injecting a small amount of code as part of your exploit string.

Within the file ctarget there is code for a function touch2 having the following C representation:

void touch2(unsigned val)
{
    vlevel = 2;       /* Part of validation protocol */
    if (val == cookie) {
    printf("Touch2!: You called touch2(0x%.8x)\n", val);
    validate(2);
    } else {
    printf("Misfire: You called touch2(0x%.8x)\n", val);
    fail(2);
    }
    exit(0);
}

Your task is to get ctarget to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.

Some Advice:

  • You will want to position a byte representation of the address of your injected code in such a way that ret instruction at the end of the code for getbuf will transfer control to it.

  • Recall that the first argument to a function is passed in register rdi.

  • Your injected code should set the register to your cookie, and then use a ret instruction to transfer control to the first instruction in touch2.

  • Do not attempt to use jmp or call instructions in your exploit code. The encodings of destination addresses for these instructions are difficult to formulate. Use ret instructions for all transfers of control, even when you are not returning from a call.

  • See the discussion in Appendix \ref{app:bytecode} on how to use tools to generate the byte-level representations of instruction sequences.

Level 3

Phase 3 also involves a code-injection attack, but this time requires passing a string as the argument.

Within the file ctarget there is code for functions hexmatch and touch3 having the following C representations:

/* Compare string to hex represention of unsigned value */
int hexmatch(unsigned val, char *sval)
{
    char cbuf[110];
    /* Make position of check string unpredictable */
    char *s = cbuf + random() % 100;
    sprintf(s, "%.8x", val);
    return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval)
{
    vlevel = 3;       /* Part of validation protocol */
    if (hexmatch(cookie, sval)) {
    printf("Touch3!: You called touch3(\"%s\")\n", sval);
    validate(3);
    } else {
    printf("Misfire: You called touch3(\"%s\")\n", sval);
    fail(3);
    }
    exit(0);
}

Your task is to get ctarget to execute the code for touch3 rather than returning to test. You must make it appear to touch3 as if you have passed a string representation of your cookie as its argument.

Some Advice:

  • You will need to include a string representation of your cookie in your exploit string. The string should consist of the eight hexadecimal digits (ordered from most to least significant) without a leading “0x.”

  • Recall that a string is represented in C as a sequence of bytes followed by a byte with value 0. Type “man ascii” on any Linux machine to see the byte representations of the characters you need.

  • Your injected code should set register reg to the address of this string.

  • When the functions hexmatch and strncmp are called, they push data onto the stack, overwriting portions of the memory that held the buffer used by getbuf. As a result, you will need to be careful where you place the string representation of your cookie.

  • hexmatch is a bit difficult to understand. In essence, hexmatch(val, sval) converts val into printable hexadecimal format and compares it with the 8-character string sval, returning true if the two match. So, for example:

    • hexmatch(0x1234abcd, "1234abcd") returns true
    • hexmatch(0x1, "00000001") returns true
    • hexmatch(0xdcba4321, "1234abcd") returns false, and
    • hexmatch(0x1234ABCD, "1234ABCD") returns false (why?).

Part II: Return-Oriented Programming

Performing code-injection attacks on the program rtarget is much more difficult than it is for ctarget, because it uses two techniques to thwart such attacks:

  • It uses randomization so that the stack positions differ from one run to another. This makes it impossible to determine where your injected code will be located.
  • It marks the section of memory holding the stack as nonexecutable, so even if you could set the program counter to the start of your injected code, the program would fail with a segmentation fault.

Fortunately, clever people have devised strategies for getting useful things done in a program by executing existing code, rather than injecting new code. The most general form of this approach is referred to as return-oriented programming (ROP) \cite{roemer-2012,schwartz-2011}. The strategy with ROP is to identify byte sequences within an existing program that consist of one or more instructions followed by the instruction ret. Such a segment is referred to as a gadget. Figure \ref{fig:rop} illustrates how the stack can be set up to execute a sequence of n gadgets. In this figure, the stack contains a sequence of gadget addresses. Each gadget consists of a series of instruction bytes, with the final one being 0xc3, encoding the ret instruction. When the program executes a ret instruction starting with this configuration, it will initiate a chain of gadget executions, with the ret instruction at the end of each gadget causing the program to jump to the beginning of the next.

Setting up a sequence of gadgets for execution. Byte value 0xc3 encodes the ret instruction.

A gadget can make use of code corresponding to assembly-language statements generated by the compiler, especially ones at the ends of functions. In practice, there may be some useful gadgets of this form, but not enough to implement many important operations. For example, it is highly unlikely that a compiled function would have popq reg as its last instruction before ret. Fortunately, with a byte-oriented instruction set, such as x86-64, a gadget can often be found by extracting patterns from other parts of the instruction byte sequence.

For example, one version of rtarget contains code generated for the following C function:

void setval_210(unsigned *p)
{
    *p = 3347663060U;
}

The chances of this function being useful for attacking a system seem pretty slim. But, the disassembled machine code for this function shows an interesting byte sequence:

0000000000400f15 <setval_210>:
  400f15:   c7 07 d4 48 89 c7       movl   $0xc78948d4,(%rdi)
  400f1b:   c3                      retq

The byte sequence 48 89 c7 encodes the instruction movq rax, rdi (See Figure for the encodings of useful movq instructions.) This sequence is followed by byte value c3, which encodes the ret instruction. The function starts at address 0x400f15, and the sequence starts on the fourth byte of the function. Thus, this code contains a gadget that has a starting address of 0x400f18 and which will copy the 64-bit value in register rax to register rdi.

Your code for rtarget contains a number of functions similar to the setval_210 function shown above. They all reside in a region we refer to as the gadget farm. Your job will be to identify useful gadgets in the gadget farm and use them to perform attacks similar to those you did in Phases 2 and 3.

Important: The gadget farm is demarcated by functions start_farm and end_farm in your copy of rtarget. Do not attempt to construct gadgets from other portions of the program code.

Source Destination D
S %rax %rcx %rdx %rbx %rsp %rbp %rsi %rdi
%rax 48 89 c0 48 89 c1 48 89 c2 48 89 c3 48 89 c4 48 89 c5 48 89 c6 48 89 c7
%rcx 48 89 c8 48 89 c9 48 89 ca 48 89 cb 48 89 cc 48 89 cd 48 89 ce 48 89 cf
%rdx 48 89 d0 48 89 d1 48 89 d2 48 89 d3 48 89 d4 48 89 d5 48 89 d6 48 89 d7
%rbx 48 89 d8 48 89 d9 48 89 da 48 89 db 48 89 dc 48 89 dd 48 89 de 48 89 df
%rsp 48 89 e0 48 89 e1 48 89 e2 48 89 e3 48 89 e4 48 89 e5 48 89 e6 48 89 e7
%rbp 48 89 e8 48 89 e9 48 89 ea 48 89 eb 48 89 ec 48 89 ed 48 89 ee 48 89 ef
%rsi 48 89 f0 48 89 f1 48 89 f2 48 89 f3 48 89 f4 48 89 f5 48 89 f6 48 89 f7
%rdi 48 89 f8 48 89 f9 48 89 fa 48 89 fb 48 89 fc 48 89 fd 48 89 fe 48 89 ff

Byte encodings of movq (movq S, D`) instructions. (Values in hexadecimal.)

Register R
Operation %rax %rcx %rdx %rbx %rsp %rbp %rsi %rdi
popq R 58 59 5a 5b 5c 5d 5e 5f

Byte Encodings of popq Instructions. (Values in hexadecimal.)

Source Destination D
S %eax %ecx %edx %ebx %esp %ebp %esi %edi
%eax 89 c0 89 c1 89 c2 89 c3 89 c4 89 c5 89 c6 89 c7
%ecx 89 c8 89 c9 89 ca 89 cb 89 cc 89 cd 89 ce 89 cf
%edx 89 d0 89 d1 89 d2 89 d3 89 d4 89 d5 89 d6 89 d7
%ebx 89 d8 89 d9 89 da 89 db 89 dc 89 dd 89 de 89 df
%esp 89 e0 89 e1 89 e2 89 e3 89 e4 89 e5 89 e6 89 e7
%ebp 89 e8 89 e9 89 ea 89 eb 89 ec 89 ed 89 ee 89 ef
%esi 89 f0 89 f1 89 f2 89 f3 89 f4 89 f5 89 f6 89 f7
%edi 89 f8 89 f9 89 fa 89 fb 89 fc 89 fd 89 fe 89 ff
Byte encodings of `movl` (movl _S_, _D_`) instructions. (Values in hexadecimal.)
Register R
Operation %al %cl %dl %bl
andb R, R 20 c0 20 c9 20 d2 20 db
orb R, R 08 c0 08 c9 08 d2 08 db
cmpb R, R 38 c0 38 c9 38 d2 38 db
testb R, R 84 c0 84 c9 84 d2 84 db
Byte encodings of 2-byte functional nop instructions. (Values in hexadecimal.)
### Level 2 For Phase 4, you will repeat the attack of Phase 2, arranging for your exploit to call `touch2`, but you will do so on the program `rtarget` by using gadgets from your gadget farm. You can construct your solution using gadgets consisting of the following instruction types, and using only the first eight x86-64 registers (`rax`--`rdi`). \begin{description} \item[`movq`]: The codes for these are shown in Figure \ref{fig:encode`A. \item[`popq`]: The codes for these are shown in Figure \ref{fig:encode`B. \item[`ret`]: This instruction is encoded by the single byte `0xc3`. \item[`nop`]: This instruction (pronounced “no op,” which is short for “no operation”, or just “nop”). It is encoded by the single byte `0x90`. Its only effect is to cause the program counter to be incremented by 1. \end{description` **Some Advice:** * All the gadgets you need can be found in the region of `rtarget`'s code demarcated by the functions `start_farm` and `mid_farm`. * You can do this attack with just two gadgets. * When a gadget uses a `popq` instruction, it will pop data from the stack. As a result, your exploit string will contain a combination of gadget addresses and data. ### Level 3 Before you take on Phase 5, pause to consider what you have accomplished so far. In Phases 2 and 3, you caused a program to execute machine code of your own design. If `ctarget` had been a network server, you could have injected your own code into a distant machine. In Phase 4, you circumvented two of the main devices modern systems use to thwart buffer overflow attacks. Although you did not inject your own code, you were able inject a type of program that operates by stitching together sequences of existing code. You have also gotten 100/100 points for the lab. That's a great score! If you have other pressing obligations, consider stopping right now. Phase 5 requires you to do an ROP attack on `rtarget` to invoke the function `touch3` with a pointer to a string representation of your cookie. That may not seem significantly more difficult than using an ROP attack to invoke `touch2`, except that we have made it so. Moreover, Phase 5 counts for only 10 points, which is not a true measure of the effort it will require. Think of it as more an extra-credit problem for those who want to go beyond the normal expectations for the course. To solve Phase 5, you can use gadgets in the region of the code in `rtarget` demarcated by functions `start_farm` and `end_farm`. In addition to the gadgets used in Phase 4, this expanded farm includes the encodings of different `movl` instructions, as shown in Figure \ref{fig:encode`C. The byte sequences in this part of the farm also contain 2-byte instructions that serve as _functional nops_, i.e., they do not change any register or memory values. These include instructions, shown in Figure \ref{fig:encode} D, such as `andb %al,%al`, that operate on the low-order bytes of some of the registers but do not change their values. **Some Advice:** * You'll want to review the effect a `movl` instruction has on the upper 4 bytes of a register, as described on page 183 of the text. * The official solution requires eight gadgets (not all of which are unique). Good luck and have fun! \bibliography{refs}

(When logged in, completion status appears here.)