Thoroughly Stripped

CSAW CTF 2017 Finals (Forensics, 200 pts)

Woo, first CTF writeup. I got the opportunity to participate in the 2017 CSAW CTF finals with Don’t Hack Alone.

Dumped by my core, left to bleed out bytes on the heap, I was stripped of my dignity… The last thing I could do was to let other programs strip me of my null-bytes just so my memory could live on.

We are provided with a core dump. Examining the flavor-text and the dump, we notice that the dump has no null bytes; we conjecture that they have been stripped out.

Next, we examine the hexdump and look for any clues. There are a bunch of ASCII strings, but they look like normal debugging symbols. One thing that jumps out is that there are a couple fairly convincing regular striped patterns that become vertically aligned if you display 20 bytes in each line. Once we do that, we notice the following section. (This dump is from xxb but xxd -c 20 thoroughlyStripped is quite sufficient.)

(00000370) 66ff ffff 5548 89e5 be66 488d 3d16 1820 e8b1 feff | f???UH???fH?=:: ????
(00000384) ff90 5dc3 5548 89e5 be6c 488d 3dfe 1720 e899 feff | ??]?UH???lH?=?: ????
(00000398) ff90 5dc3 5548 89e5 be61 488d 3de6 1720 e881 feff | ??]?UH???aH?=?: ????
(000003ac) ff90 5dc3 5548 89e5 be67 488d 3dce 1720 e869 feff | ??]?UH???gH?=?: ?i??
(000003c0) ff90 5dc3 5548 89e5 be7b 488d 3db6 1720 e851 feff | ??]?UH???{H?=?: ?Q??
(000003d4) ff90 5dc3 5548 89e5 be73 488d 3d9e 1720 e839 feff | ??]?UH???sH?=?: ?9??
(000003e8) ff90 5dc3 5548 89e5 be74 488d 3d86 1720 e821 feff | ??]?UH???tH?=?: ?!??
(000003fc) ff90 5dc3 5548 89e5 be79 488d 3d6e 1720 e809 feff | ??]?UH???yH?=n: ?:??
(00000410) ff90 5dc3 5548 89e5 be5f 488d 3d56 1720 e8f1 fdff | ??]?UH???_H?=V: ????
(00000424) ff90 5dc3 5548 89e5 be69 488d 3d3e 1720 e8d9 fdff | ??]?UH???iH?=>: ????
(00000438) ff90 5dc3 5548 89e5 be6e 488d 3d26 1720 e8c1 fdff | ??]?UH???nH?=&: ????
(0000044c) ff90 5dc3 5548 89e5 be63 488d 3d0e 1720 e8a9 fdff | ??]?UH???cH?=:: ????
(00000460) ff90 5dc3 5548 89e5 be6f 488d 3df6 1620 e891 fdff | ??]?UH???oH?=?: ????
(00000474) ff90 5dc3 5548 89e5 be65 488d 3dde 1620 e879 fdff | ??]?UH???eH?=?: ?y??
(00000488) ff90 5dc3 5548 89e5 be6b 488d 3dc6 1620 e861 fdff | ??]?UH???kH?=?: ?a??
(0000049c) ff90 5dc3 5548 89e5 be64 488d 3dae 1620 e849 fdff | ??]?UH???dH?=?: ?I??
(000004b0) ff90 5dc3 5548 89e5 be7d 488d 3d96 1620 e831 fdff | ??]?UH???}H?=?: ?1??
(000004c4) ff90 5dc3 5548 89e5 e85f feff ffe8 72fe ffff e885 | ??]?UH???_????r?????

The ASCII characters flag{sty_incoekd} can be read down a column. This absolutely looks like the structure of a flag, but it isn’t. Upon closer examination, we realize each character is unique: it seems to be the flag deduplicated, with only the first occurrence of each character preserved.

(We tried for some time to directly guess the flag based on this hypothesis, but didn’t succeed. After getting the flag, I think guessing it this way would have been hard but not entirely impossible.)

One might hope there’s a sequence of indices into these characters somewhere, something that starts as an arithmetic progression but sometimes repeats earlier elements, but none of the other patterns seem to fit and it’s not at all obvious where to find one. So let’s try to understand this pattern better. Disassembling it doesn’t yield great results because null bytes are randomly missing, but it does allow us to recognize 20-byte chunks that have the outline of a function definition and each hold a character:

55                      push   rbp
48 89 e5                mov    rbp,rsp
...
5d                      pop    rbp
c3                      ret

So for instance, the first chunk looks like 554889e5be66488d3d161820e8b1feffff905dc3 (with some null bytes inserted, in the original executable), and gives this garbage assembly:

>>> from pwn import *
>>> s = "554889e5be66488d3d161820e8b1feffff905dc3"
>>> print(disasm(unhex(s), arch="amd64"))
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   be 66 48 8d 3d          mov    esi,0x3d8d4866
   9:   16                      (bad)
   a:   18 20                   sbb    BYTE PTR [rax],ah
   c:   e8 b1 fe ff ff          call   0xfffffffffffffec2
  11:   90                      nop
  12:   5d                      pop    rbp
  13:   c3                      ret

Of course, the addresses assigned by the assembler are complete garbage because we know null bytes have been stripped. Still, it’s not hard to take an educated stab at inserting null bytes to get sensible instructions where we load %esi and %rdi (the registers used for passing the first two arguments in x86-64) with values, the first of which is the ASCII character we noticed earlier, and then call a nearby function, although I didn’t bother doing this during the CTF.

   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   be 66 00 00 00          mov    esi,0x66
   9:   48 8d 3d 16 18 20 00    lea    rdi,[rip+0x201816]        # 0x201826
  10:   e8 b1 fe ff ff          call   0xfffffffffffffec6
  15:   90                      nop
  16:   5d                      pop    rbp
  17:   c3                      ret

More broadly, this is helpful because it reminds us that e8 followed by four bytes is a relative call, and just a little below that pattern we see just a lot of relative calls in a row with nothing in between. This also explains why we couldn’t find a simple arithmetic-progression-with-duplicates: even calling the same function twice will produce different bytecode because the calls will be at different relative offsets. It does not look like any null bytes have been stripped here.

(000004c4) .... .... .... .... e85f feff ffe8 72fe ffff e885 | ??]?UH???_????r?????
(000004d8) feff ffe8 98fe ffff e8ab feff ffe8 befe ffff e8d1 | ????????????????????
(000004ec) feff ffe8 6cfe ffff e8df feff ffe8 f2fe ffff e805 | ????l??????????????:
(00000500) ffff ffe8 18ff ffff e8e3 feff ffe8 26ff ffff e839 | ????:?????????&????9
(00000514) ffff ffe8 2cfe ffff e827 feff ffe8 42ff ffff e84d | ????,????'????B????M
(00000528) feff ffe8 38ff ffff e8bb feff ffe8 46ff ffff e8c9 | ????8?????????F?????
(0000053c) feff ffe8 54ff ffff e85f feff ffe8 62ff ffff

We can parse these offsets and, since each relative call is 5 bytes, add 5 to each subsequent offset to obtain a fixed offset against some point in memory, which it seems pretty reasonable to assume will be the first function that does something with the character f. If we normalize these fixed offsets against their start, we see that all the distances are multiples of 24, so this also confirms that each chunk was originally 24 bytes before the null bytes were removed. Dividing and indexing tells us which characters were used in each of the invoked function calls, which is just the flag, and 200 points.

>>> from pwn import *
>>> deduplicated_flag = "flag{sty_incoekd}"
>>> s = "e85ffeffffe872feffffe885feffffe898feffffe8abfeffffe8befeffffe8d1feffffe86cfeffffe8dffeffffe8f2feffffe805ffffffe818ffffffe8e3feffffe826ffffffe839ffffffe82cfeffffe827feffffe842ffffffe84dfeffffe838ffffffe8bbfeffffe846ffffffe8c9feffffe854ffffffe85ffeffffe862ffffff"
>>> offsets = [u32(unhex(chunk[2:]), sign="signed") for chunk in group(10, s)]
>>> normalized_offsets = [off - offsets[0] + 5*i for i, off in enumerate(offsets)]
>>> assert all(off % 24 == 0 for off in normalized_offsets)
>>> print ''.join(deduplicated_flag[off // 24] for off in normalized_offsets)
flag{stay_in_college_kids}