The Attack Lab: Understanding Buffer-Overflow Bugs
Introduction
This assignment involves generating a total of five attacks on two programs having different security vulnerabilities.
Outcomes you will gain from this lab include:
- You will learn different ways that attackers can exploit security vulnerabilities when programs do not safeguard themselves well enough against buffer overflows.
- Through this, you will get a better understanding of how to write programs that are more secure, as well as some of the features provided by compilers and operating systems to make programs less vulnerable.
- You will gain a deeper understanding of the stack and parameter-passing mechanisms of x86-64 machine code.
- You will gain a deeper understanding of how x86-64 instructions are encoded.
- You will gain more experience with debugging tools such as
gdbandobjdump.
You will want to study Sections 3.10.3, Out-of-Bounds Memory References and Buffer Overflow, and 3.10.4, Thwarting Buffer Overflow Attacks of Bryant & O’Hallaron's Computer Systems (3rd edition) textbook as reference material for this lab.
Warning
In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources. Attacking computer systems belonging to other people, and gaining unauthorized access, is a serious criminal offense that can lead to long prison terms.
Logistics
As usual, you should work with your lab partner(s). You will generate attacks for target programs that are custom-generated for you.
Getting Files
See the instructions on the course website for getting your team's target.
The server will build your files and return them to your browser in a
tar file called target, where
After saving the target file in a (protected) directory on wilkes
in which you plan to do your work, give the command:
tar -xvf target. This will extract a
directory target containing the files described below.
You should only download one set of files. If for some reason you download multiple targets, choose one target to work on and delete the rest.
Warning: If you expand your target on a
PC by using a utility such as Winzip or letting your browser do the
extraction, you'll risk resetting permission bits on the executable
files.
The files in target include:
README.txt- A file describing the contents of the directory.
ctarget- An executable program vulnerable to code-injection attacks.
rtarget- An executable program vulnerable to return-oriented programming attacks.
cookie.txt- An 8-digit hex code that you will use as a unique identifier in your attacks.
farm.c- The source code of your target's “gadget farm”, which you will use in generating return-oriented programming attacks.
hex2raw- A utility to generate attack strings.
In the following instructions, we will assume that you have copied the
files to a protected directory on wilkes, and that you are
executing the programs in that directory.
Hand-in
As in the Bomb lab, there is no explicit hand-in. The target will notify us automatically about your progress as you work on it. You can keep track of how you are doing by looking at the class scoreboard at:
\url{https://www.cs.hmc.edu/cs105/attack/scoreboard}\\
This web page is frequently updated to show the progress for each team's targets.
Note that the web page does \emph{not} update every time you refresh it.
Important Points
Here are some important rules about valid solutions for this lab. These points might not make much sense when you read this document for the first time; they're here as a reference once you get started.
- You must do the assignment on
wilkes. -
Your solutions may not use attacks to circumvent the validation code in the programs.
Specifically, any address you incorporate into an attack string for use by a
retinstruction should be to one of the following destinations:- The addresses for functions
touch1,touch2, ortouch3. - The address of your injected code
- The address of one of your gadgets from the gadget farm.
- The addresses for functions
-
You may only construct gadgets from file
rtargetwith addresses ranging between those for functionsstart_farmandend_farm.
Target Programs
Both ctarget and rtarget read strings from standard input. They
do so with the function getbuf:
unsigned getbuf()
{
char buf[BUFFER_SIZE];
Gets(buf);
return 1;
}
The function Gets is similar to the standard library
function gets—it reads a string from standard input
(terminated by \n or end-of-file) and stores it (along with a null
terminator) at the specified destination. In this code, you can see
that the destination is an array buf, declared as having
BUFFER_SIZE bytes. At the time your targets were generated,
BUFFER_SIZE was a compile-time constant specific to your version.
The functions Gets() and gets()have no way to determine
whether their destination buffers
are large enough to store the string they read. They simply copy
sequences of bytes,
possibly overrunning the bounds of the storage
allocated at the destination.
If the string typed by the user and read by getbuf is
sufficiently short, it is clear that getbuf will return 1,
as shown by the following execution examples:
./ctarget
Cookie: 0x1a7dd803
Type string:Keep it short!
No exploit. Getbuf returned 0x1
Normal return
(Note that the value of the cookie shown will differ from yours.)
Typically an error occurs if you type a long string:
./ctarget
Cookie: 0x1a7dd803
Type string:This is not a very interesting string, but it has the property ...
Ouch!: You caused a segmentation fault!
Better luck next time
Program rtarget will have the same behavior. As the error
message indicates, overrunning the buffer typically causes the
program state to be corrupted, leading to a memory-access error.
Your task is to be more clever with the strings you feed
ctarget and rtarget so that they do more
interesting things. The strings that cause such behavior are called
exploit strings.
Both ctarget and rtarget take several different
command-line arguments:
-h- Print a list of possible command-line arguments
-q- Don't send results to the grading server
-i FILE- Supply input from a file, rather than from standard input
Your exploit strings will typically contain byte values that do not
correspond to the ASCII values for printing characters. The program
hex2raw will enable you to generate these raw strings. More
information on how to use hex2raw.
Important points:
-
Your exploit string must not contain byte value
0x0aat any intermediate position, since that is the ASCII code for newline (“\n'). WhenGets` encounters this byte, it will assume you intended to terminate the string. -
hex2rawexpects two-digit hex values separated by one or more white spaces.So if you want to create a byte with a hex value of 0, you need to write it as
00. To create the word0xdeadbeefyou should pass “ef be ad de” tohex2raw(note the reversal required for little-endian byte ordering). -
Appendix A also contains examples of using I/O redirection (
<and>) and pipes (|) on the command line, as used in the example below.
When you have correctly solved one of the levels, your target program will automatically send a notification to the grading server. For example:
./hex2raw < ctarget.l2.txt | ./ctarget
Cookie: 0x1a7dd803
Type string:Touch2!: You called touch2(0x1a7dd803)
Valid solution for level 2 with target ctarget
PASSED: Sent exploit string to server to be validated.
NICE JOB!
The server will test your exploit string to make sure it really works, and it will update the Attacklab scoreboard page indicating that your user id (listed by your target number for anonymity) has completed this phase.
Unlike the Bomb Lab, there is no penalty for making mistakes in this
lab. Feel free to fire away at ctarget and rtarget with
any strings you like.
Figure \ref{fig:phases} summarizes the five phases of the lab. As can
be seen, the first three involve code-injection (CI) attacks on
ctarget, while the last two involve return-oriented-programming
(ROP) attacks on rtarget.
Note that the fifth phase is extra-credit.
| Phase | Program | Level | Method | Function | Points |
|---|---|---|---|---|---|
| 1 | ctarget |
1 | CI | touch1 |
10 |
| 2 | ctarget |
2 | CI | touch2 |
25 |
| 3 | ctarget |
3 | CI | touch3 |
25 |
| 4 | rtarget |
2 | ROP | touch2 |
40 |
| 5 | rtarget |
3 | ROP | touch3 |
10 |
CI: Code Injection
ROP: Return-Oriented Programming
Part I: Code-Injection Attacks
For the first three phases, your exploit strings will attack ctarget. This program is set up so that the stack positions
will be consistent from one run to the next and so that data on the
stack can be treated as executable code. These features make the program
vulnerable to attacks where the exploit strings contain byte
encodings of executable code.
Level 1
For Phase 1, you will not inject new code. Instead, your exploit string will redirect the program to execute an existing procedure.
Function getbuf is called within ctarget by a function
test, which has the following C code:
void test()
{
int val;
val = getbuf();
printf("No exploit. Getbuf returned 0x%x\n", val);
}
When getbuf executes its return statement (line 5 of getbuf), the program ordinarily resumes execution within function
test (at line 5 of this function).
We want to change this behavior.
Within the file ctarget, there is code for a function touch1 having
the following C representation:
void touch1()
{
vlevel = 1; /* Part of validation protocol */
printf("Touch1!: You called touch1()\n");
validate(1);
exit(0);
}
Your task is to get ctarget to execute the code for touch1
when getbuf executes its return statement, rather than returning
to test. Note that your exploit string may also corrupt parts
of the stack not directly related to this stage, but that will not
cause a problem, since touch1 causes the program to exit
directly.
Some Advice:
-
All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of
ctarget. Useobjdump -dto get this disassembled version. -
The idea is to position a byte representation of the starting address for
touch1so that theretinstruction at the end of the code forgetbufwill transfer control totouch1. -
Be careful about byte ordering, and remember that addresses are 8 bytes.
-
You might want to use
gdbto step the program through the last few instructions ofgetbufto make sure it is doing the right thing. -
The placement of
bufwithin the stack frame forgetbufdepends on the value of the compile-time constantBUFFER_SIZEand on the allocation strategy used bygcc. You will need to examine the disassembled code to determine its position.
Level 2
Phase 2 involves injecting a small amount of code as part of your exploit string.
Within the file ctarget there is code for a function touch2
having the following C representation:
void touch2(unsigned val)
{
vlevel = 2; /* Part of validation protocol */
if (val == cookie) {
printf("Touch2!: You called touch2(0x%.8x)\n", val);
validate(2);
} else {
printf("Misfire: You called touch2(0x%.8x)\n", val);
fail(2);
}
exit(0);
}
Your task is to get ctarget to execute the
code for touch2 rather than returning to test. In this
case, however, you must make it appear to touch2 as if you have
passed your cookie as its argument.
Some Advice:
-
You will want to position a byte representation of the address of your injected code in such a way that
retinstruction at the end of the code forgetbufwill transfer control to it. -
Recall that the first argument to a function is passed in register
rdi. -
Your injected code should set the register to your cookie, and then use a
retinstruction to transfer control to the first instruction intouch2. -
Do not attempt to use
jmporcallinstructions in your exploit code. The encodings of destination addresses for these instructions are difficult to formulate. Useretinstructions for all transfers of control, even when you are not returning from a call. -
See the discussion in Appendix \ref{app:bytecode} on how to use tools to generate the byte-level representations of instruction sequences.
Level 3
Phase 3 also involves a code-injection attack, but this time requires passing a string as the argument.
Within the file ctarget there is code for functions hexmatch and touch3
having the following C representations:
/* Compare string to hex represention of unsigned value */
int hexmatch(unsigned val, char *sval)
{
char cbuf[110];
/* Make position of check string unpredictable */
char *s = cbuf + random() % 100;
sprintf(s, "%.8x", val);
return strncmp(sval, s, 9) == 0;
}
void touch3(char *sval)
{
vlevel = 3; /* Part of validation protocol */
if (hexmatch(cookie, sval)) {
printf("Touch3!: You called touch3(\"%s\")\n", sval);
validate(3);
} else {
printf("Misfire: You called touch3(\"%s\")\n", sval);
fail(3);
}
exit(0);
}
Your task is to get ctarget to execute the
code for touch3 rather than returning to test.
You must make it appear to touch3 as if you have
passed a string representation of your cookie as its argument.
Some Advice:
-
You will need to include a string representation of your cookie in your exploit string. The string should consist of the eight hexadecimal digits (ordered from most to least significant) without a leading “
0x.” -
Recall that a string is represented in C as a sequence of bytes followed by a byte with value 0. Type “
man ascii” on any Linux machine to see the byte representations of the characters you need. -
Your injected code should set register
regto the address of this string. -
When the functions
hexmatchandstrncmpare called, they push data onto the stack, overwriting portions of the memory that held the buffer used bygetbuf. As a result, you will need to be careful where you place the string representation of your cookie. -
hexmatchis a bit difficult to understand. In essence,hexmatch(val, sval)convertsvalinto printable hexadecimal format and compares it with the 8-character stringsval, returning true if the two match. So, for example:hexmatch(0x1234abcd, "1234abcd")returns truehexmatch(0x1, "00000001")returns truehexmatch(0xdcba4321, "1234abcd")returns false, andhexmatch(0x1234ABCD, "1234ABCD")returns false (why?).
Part II: Return-Oriented Programming
Performing code-injection attacks on the program rtarget is much
more difficult than it is for ctarget, because it uses
two techniques to thwart such attacks:
- It uses randomization so that the stack positions differ from one run to another. This makes it impossible to determine where your injected code will be located.
- It marks the section of memory holding the stack as nonexecutable, so even if you could set the program counter to the start of your injected code, the program would fail with a segmentation fault.
Fortunately, clever people have devised strategies for getting useful
things done in a program by executing existing code, rather than
injecting new code. The most general form of this approach is referred to as
return-oriented programming (ROP)
\cite{roemer-2012,schwartz-2011}. The strategy with ROP is to
identify byte sequences within an existing program that consist of one
or more instructions followed by the instruction ret. Such
a segment is referred to as a gadget. Figure \ref{fig:rop}
illustrates how the stack can be set up to execute a sequence of n
gadgets. In this figure, the stack contains a sequence of gadget
addresses. Each gadget consists of a series of instruction
bytes, with the final one being 0xc3, encoding the
ret instruction.
When the program executes a ret instruction starting with this
configuration, it will initiate a chain of gadget executions,
with the ret instruction at the end of each gadget causing the
program to jump to the beginning of the next.

Setting up a sequence of gadgets for execution. Byte
value 0xc3 encodes the ret instruction.
A gadget can make use of code corresponding to assembly-language
statements generated by the compiler, especially ones at the ends of
functions. In practice, there may be some useful gadgets of this form,
but not enough to implement many important operations. For example,
it is highly unlikely that a compiled function would have popq
reg as its last instruction before ret. Fortunately,
with a byte-oriented instruction set, such as x86-64, a gadget can
often be found by extracting patterns from other parts of the
instruction byte sequence.
For example, one version of rtarget contains code generated for the
following C function:
void setval_210(unsigned *p)
{
*p = 3347663060U;
}
The chances of this function being useful for attacking a system seem pretty slim. But, the disassembled machine code for this function shows an interesting byte sequence:
0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7 movl $0xc78948d4,(%rdi)
400f1b: c3 retq
The byte sequence 48 89 c7 encodes the instruction
movq rax, rdi
(See Figure for the encodings of useful movq instructions.)
This sequence is followed by byte value c3, which encodes the
ret instruction.
The function starts at address 0x400f15, and the sequence starts
on the fourth byte of the function. Thus, this code contains a
gadget
that has a starting address of 0x400f18
and which will copy the 64-bit value in register rax
to register rdi.
Your code for rtarget contains a number of functions similar to
the setval_210 function shown above. They all reside
in a region we refer to as
the gadget farm. Your job will be to identify useful gadgets in
the gadget farm and use them to perform attacks similar to those you
did in Phases 2 and 3.
Important: The gadget farm is demarcated by functions
start_farm and end_farm in your copy of rtarget.
Do not attempt to construct gadgets from other portions of the program code.
| Source | Destination D | |||||||
|---|---|---|---|---|---|---|---|---|
| S | %rax |
%rcx |
%rdx |
%rbx |
%rsp |
%rbp |
%rsi |
%rdi |
%rax |
48 89 c0 | 48 89 c1 | 48 89 c2 | 48 89 c3 | 48 89 c4 | 48 89 c5 | 48 89 c6 | 48 89 c7 |
%rcx |
48 89 c8 | 48 89 c9 | 48 89 ca | 48 89 cb | 48 89 cc | 48 89 cd | 48 89 ce | 48 89 cf |
%rdx |
48 89 d0 | 48 89 d1 | 48 89 d2 | 48 89 d3 | 48 89 d4 | 48 89 d5 | 48 89 d6 | 48 89 d7 |
%rbx |
48 89 d8 | 48 89 d9 | 48 89 da | 48 89 db | 48 89 dc | 48 89 dd | 48 89 de | 48 89 df |
%rsp |
48 89 e0 | 48 89 e1 | 48 89 e2 | 48 89 e3 | 48 89 e4 | 48 89 e5 | 48 89 e6 | 48 89 e7 |
%rbp |
48 89 e8 | 48 89 e9 | 48 89 ea | 48 89 eb | 48 89 ec | 48 89 ed | 48 89 ee | 48 89 ef |
%rsi |
48 89 f0 | 48 89 f1 | 48 89 f2 | 48 89 f3 | 48 89 f4 | 48 89 f5 | 48 89 f6 | 48 89 f7 |
%rdi |
48 89 f8 | 48 89 f9 | 48 89 fa | 48 89 fb | 48 89 fc | 48 89 fd | 48 89 fe | 48 89 ff |
Byte encodings of movq (movq S, D`) instructions. (Values in hexadecimal.)
| Register R | ||||||||
|---|---|---|---|---|---|---|---|---|
| Operation | %rax |
%rcx |
%rdx |
%rbx |
%rsp |
%rbp |
%rsi |
%rdi |
| popq R | 58 | 59 | 5a | 5b | 5c | 5d | 5e | 5f |
Byte Encodings of popq Instructions. (Values in hexadecimal.)
| Source | Destination D | |||||||
|---|---|---|---|---|---|---|---|---|
| S | %eax |
%ecx |
%edx |
%ebx |
%esp |
%ebp |
%esi |
%edi |
%eax |
89 c0 | 89 c1 | 89 c2 | 89 c3 | 89 c4 | 89 c5 | 89 c6 | 89 c7 |
%ecx |
89 c8 | 89 c9 | 89 ca | 89 cb | 89 cc | 89 cd | 89 ce | 89 cf |
%edx |
89 d0 | 89 d1 | 89 d2 | 89 d3 | 89 d4 | 89 d5 | 89 d6 | 89 d7 |
%ebx |
89 d8 | 89 d9 | 89 da | 89 db | 89 dc | 89 dd | 89 de | 89 df |
%esp |
89 e0 | 89 e1 | 89 e2 | 89 e3 | 89 e4 | 89 e5 | 89 e6 | 89 e7 |
%ebp |
89 e8 | 89 e9 | 89 ea | 89 eb | 89 ec | 89 ed | 89 ee | 89 ef |
%esi |
89 f0 | 89 f1 | 89 f2 | 89 f3 | 89 f4 | 89 f5 | 89 f6 | 89 f7 |
%edi |
89 f8 | 89 f9 | 89 fa | 89 fb | 89 fc | 89 fd | 89 fe | 89 ff |
| Register R | |||||
|---|---|---|---|---|---|
| Operation | %al |
%cl |
%dl |
%bl |
|
| andb | R, R | 20 c0 |
20 c9 |
20 d2 |
20 db |
| orb | R, R | 08 c0 |
08 c9 |
08 d2 |
08 db |
| cmpb | R, R | 38 c0 |
38 c9 |
38 d2 |
38 db |
| testb | R, R | 84 c0 |
84 c9 |
84 d2 |
84 db |
(When logged in, completion status appears here.)