refrain

0CTF/TCTF 2019 Quals

http://111.186.63.17/perf.data.gz

Environment: Ubuntu 16.04+latex

In this challenge, we get a gzipped file called perf.data and a minimal description of an environment. Googling this reveals that perf.data is a record format of the perf tool, a Linux profiler. Installing perf allows us to read perf.data and see some pretty interactive tables of statistics in our terminal describing the profiling results, from which we can see some libraries and addresses being called, but they don’t reveal much about what’s going on. One hacky way to see more of the underlying data in a more human-readable way (and to see just how much of it there is) is perf report -D, which dumps the raw data in an ASCII format, but this is still not that useful. (One might hope that one could simply grep for the flag in this big text dump, but it’s nowhere to be seen.) Still, from this file, we can definitely read off all the exact library versions that the perf record was run against.

0x6178 [0xa8]: event: 10
.
. ... raw event: size 168 bytes
.  0000:  0a 00 00 00 02 00 a8 00 e7 61 00 00 e7 61 00 00  .........a...a..
.  0010:  00 00 40 00 00 00 00 00 00 10 00 00 00 00 00 00  ..@.............
.  0020:  00 00 00 00 00 00 00 00 fd 00 00 00 00 00 00 00  ................
.  0030:  ad 5e 46 00 00 00 00 00 71 95 13 17 00 00 00 00  .^F.....q.......
.  0040:  05 00 00 00 02 18 00 00 2f 75 73 72 2f 6c 69 62  ......../usr/lib
.  0050:  2f 78 38 36 5f 36 34 2d 6c 69 6e 75 78 2d 67 6e  /x86_64-linux-gn
.  0060:  75 2f 49 6d 61 67 65 4d 61 67 69 63 6b 2d 36 2e  u/ImageMagick-6.
.  0070:  38 2e 39 2f 62 69 6e 2d 51 31 36 2f 63 6f 6e 76  8.9/bin-Q16/conv
.  0080:  65 72 74 00 00 00 00 00 e7 61 00 00 e7 61 00 00  ert......a...a..
.  0090:  43 be 7a 60 88 a8 00 00 00 00 00 00 00 00 00 00  C.z`............
.  00a0:  15 00 00 00 00 00 00 00                          ........        

(Better places you may be able to get this from include perf buildid-list. Also, perf script >/dev/null will warn about missing libraries on stderr, which will be useful. More on that later.)

Another useful command is perf report --header-only, which gives, among other stuff:

# cmdline : /usr/lib/linux-hwe-tools-4.10.0-42/perf record -e intel_pt// convert -font Courier text:- image.png

This tells us the exact command that was profiled with perf: it was convert -font Courier text:- image.png, an ImageMagick command that reads text from stdin (presumably the flag) and renders it to a PNG image.

As for the perf invocation, the intel_pt// bit refers to Intel Processor Trace, a CPU feature that perf exposes as an event source and that tracks, among other things, whether each conditional branch was taken. It doesn’t track much more than that: much of the data is really just an encoded sequence of Ts (“taken”) and Ns (“not taken”), which you can see in perf report -D. But if you have the exact same executable and libraries that the command used, this is enough to perfectly reconstruct the control flow, since you can trace the assembly and know what the next branch instruction is at all times. I did this challenge on a laptop running Ubuntu 18.04 and quickly concluded that I didn’t have the same libraries. Fortunately, I had a 16.04 VM that mostly did, so I could get better perf reports in the VM. I also played with the perf script command, which produces gigabytes of output but reports, to the best of its ability, the names and addresses involved in every conditional branch, in millions of lines like this, which can be grepped through:

         convert 32608 [002] 1659210.049834:          1     branches:      7f794df893e9 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so) =>     7f794df8a250 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so)
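To illustrate why a bare stream of taken/not-taken bits is enough to recover a path through known code, here is a toy replay sketch. The control-flow graph, the block names, and the replay function are all made up for illustration; real Intel PT decoding works on actual machine code and has to handle more packet types than this.

```python
def replay(blocks, entry, tnt):
    """Walk a toy control-flow graph using a string of 'T'/'N' bits.

    blocks maps a block name to one of:
      (taken_target, fallthrough_target)  for a conditional branch,
      "target"                            for an unconditional jump,
      None                                for an exit block.
    Only conditional branches consume a bit, as in the trace.
    """
    bits = iter(tnt)
    current = entry
    path = [current]
    while blocks[current] is not None:
        successor = blocks[current]
        if isinstance(successor, tuple):
            taken, fallthrough = successor
            current = taken if next(bits) == "T" else fallthrough
        else:
            current = successor
        path.append(current)
    return path


# Hypothetical loop: "head" tests a condition, "body" always jumps back.
CFG = {"head": ("body", "exit"), "body": "head", "exit": None}

print(replay(CFG, "head", "TTN"))
```

Two bit strings of different length or content yield different paths through the same graph, which is the property the rest of this writeup relies on.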

Now, we need to figure out a place where this information theoretically allows us to reconstruct the text drawn by convert. This is not that easy. The trace only shows us conditional branches, not anything about the value of any data that’s being computed or passed around, and a lot of the time different characters or different pixels aren’t going to cause different control flow; they’ll just be passed around like any other character or pixel value. After a while digging through the ImageMagick source and stepping through some test executions of convert in gdb, I homed in on the calls to the FreeType library, in particular FT_Glyph_To_Bitmap, as the most likely place where different characters would lead to different control flow, although I wasn’t certain until I actually wrote the code that it would work. The idea is that different glyphs will have different numbers of strokes and lead to different numbers of pixels being drawn and such, which will change the control flow.

Even with this idea, though, it’s incredibly hard to actually reason through the control flow for every glyph just to reconstruct each character. This could involve reasoning about subtle differences among hundreds of branches across thousands of lines of assembly. Instead, we’d much rather let the computer do the work for us by running the same convert command against the same libraries on known plaintexts and compare the branching patterns we record against those.

Unfortunately, although I could read the perf report on my VM, I couldn’t get Intel PT recording to work in it, which isn’t that surprising since it has to interact with the processor on a very low level to work. So I ended up finding and forcibly downgrading libfreetype6 to 2.6.1-0.1ubuntu2.3 on my host laptop, the exact same version that was in my VM and that the challenge was recorded against. This was enough to make things like perf script find the symbols it wanted, and to let us make reference recordings that had the same branching behavior per glyph to the given recording for at least the time it spends in libfreetype6.

ASLR means that the addresses being branched between won’t be identical from run to run, but it will still preserve addresses mod 2^12 (mappings are only shifted by whole 4 KiB pages), so you can get pretty solid fingerprints out of just taking the last three hex digits of all addresses involved.
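As a small sketch of that fingerprinting idea, the following reduces one perf script branch line to its two page offsets plus symbol names. The regular expression is an assumption based on the sample line shown earlier, and the function names are made up for this example.

```python
import re

# Assumed line shape: "... branches: ADDR SYM (DSO) => ADDR SYM (DSO)"
BRANCH_RE = re.compile(
    r"branches:\s+([0-9a-f]+)\s+(\S+)\s+\(\S+\)"
    r"\s+=>\s+([0-9a-f]+)\s+(\S+)\s+\(\S+\)"
)
PAGE_MASK = (1 << 12) - 1  # low 12 bits survive page-aligned ASLR shifts


def page_offset(addr_hex):
    """Return the address mod 2^12 as three hex digits."""
    return format(int(addr_hex, 16) & PAGE_MASK, "03x")


def fingerprint(line):
    """Return (src_offset, src_symbol, dst_offset, dst_symbol), or None."""
    m = BRANCH_RE.search(line)
    if m is None:
        return None
    src, src_sym, dst, dst_sym = m.groups()
    return (page_offset(src), src_sym, page_offset(dst), dst_sym)


SAMPLE = (
    "convert 32608 [002] 1659210.049834: 1 branches: "
    "7f794df893e9 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so) => "
    "7f794df8a250 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so)"
)
print(fingerprint(SAMPLE))  # ('3e9', 'strcmp', '250', 'strcmp')
```

Shifting both addresses by any multiple of 0x1000 leaves the result unchanged, which is why fingerprints from different runs can be compared directly.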

After a lot of exploratory grepping, I figured out a conditional branch I liked, which occurred at an address ending in 0x132 in FT_Glyph_To_Bitmap. I grepped for this line and a thousand lines after it out of perf script (since I assumed that grep would be able to get through the millions of lines more efficiently than a casually written Python script) and then postprocessed with a Python script to extract some hashes of the branching patterns, which could be compared against hashes produced the same way from the given perf.data. Trying this out against some very simple perf.data recordings I made confirmed that the same letters seemed to be giving the same fingerprints (although the entire text seemed to be rendered four times, with a few fixed calls before and after the repetitions, but this was not hard to ignore).

Just to give an example, here’s the result of recording the conversion of flag{aaabbbcccddd} and postprocessing, with lines cut off at the right because all the lines are thousands of characters long. The lines look identical except for the hashes at the start because they don’t diverge until hundreds of characters in, but the hash lets us see identical and different branching behavior easily. In particular, you can see feb2cf, 01c5bc, 20a3ce, and 115279 each repeat three times in a row, suggesting they correspond to the renderings of glyphs a, b, c, and d respectively. Then you can confirm that feb2cf also appears a few lines earlier, corresponding to the a of flag. The whole thing repeats four times, surrounded and delimited by 75a768 and with a single extra 1c418a at the start, whose significance I’m not sure of, but since those hashes appear in the same positions in the fingerprints from the challenge recording we don’t need to worry about them.

1c418a 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc

I then recorded a reference perf.data like this (the digits are tripled just so, as above, it’s easier to notice the triplets of repeating hashes and slightly error-correct positions):

echo 'flag{abcdefghijklmnopqrstuvwxyz000111222333444555666777888999}' | sudo perf record -e intel_pt// convert -font Courier text:- image.png

The following pipeline then produces a fingerprint from the next few lines after each branch from 0x132 FT_Glyph_To_Bitmap:

perf script -i perf.data | grep "132 FT_Glyph_To_Bitmap" -A 1000 | python3 post1000.py > pout

The post1000.py script used in the last step to produce all of the above text dumps is an extremely hacky snippet which, for each branch from 0x132 FT_Glyph_To_Bitmap, extracts the mod-2^12 addresses along with some capital letters for the next couple branches that land wholly in libfreetype, with the hope that human inspection will be able to recover some information if something goes wrong, and then hashes the result for easy comparison. Nothing went wrong, so the rest of the line didn’t matter. (We don’t expect branches that go outside libfreetype to be deterministic functions of the letter or glyph being drawn. For example, if libfreetype has to malloc any memory, the control flow in malloc can depend chaotically on various allocations that happened earlier on different glyphs or different parts of the processing altogether. For the same reason, we don’t expect recording the 1000 branches after every branch from 0x132 FT_Glyph_To_Bitmap to stop at the exact same place in libfreetype control flow, so we only hash a somewhat arbitrary prefix.)
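The original post1000.py is not reproduced here. What follows is only a rough sketch of the kind of postprocessing described, not the actual script: the regular expression, the choice of MD5 truncated to six hex digits, the PREFIX_LEN value, and the use of only the source side of each branch for tokens are all assumptions, so its hashes will not match the ones shown above.

```python
import hashlib
import re
import sys

# Assumed line shape: "... branches: ADDR SYM (DSO) => ADDR SYM (DSO)"
BRANCH_RE = re.compile(
    r"branches:\s+([0-9a-f]+)\s+(\S+)\s+\((\S+)\)"
    r"\s+=>\s+([0-9a-f]+)\s+(\S+)\s+\((\S+)\)"
)
PREFIX_LEN = 100  # hypothetical: number of in-library branches hashed per glyph


def token(addr, symbol):
    """Last three hex digits of the address plus the symbol's capital letters."""
    name = symbol.split("+")[0]  # drop a "+0x..." offset if one is printed
    return addr[-3:] + "".join(c for c in name if c.isupper())


def fingerprints(lines, prefix_len=PREFIX_LEN):
    """Return one (hash, tokens) pair per trigger branch found in lines."""
    records, current = [], None
    for line in lines:
        m = BRANCH_RE.search(line)
        if m is None:
            continue  # skips grep's "--" separators and other noise
        src, src_sym, src_dso, dst, dst_sym, dst_dso = m.groups()
        if src.endswith("132") and src_sym.split("+")[0] == "FT_Glyph_To_Bitmap":
            current = [token(src, src_sym)]  # a new glyph record starts here
            records.append(current)
        elif (current is not None and len(current) < prefix_len
              and "libfreetype" in src_dso and "libfreetype" in dst_dso):
            current.append(token(src, src_sym))  # keep in-library branches only
    return [(hashlib.md5(" ".join(r).encode()).hexdigest()[:6], r)
            for r in records]


def main():
    """Read perf script output on stdin, print one fingerprint per line."""
    for digest, tokens in fingerprints(sys.stdin):
        print(digest, " ".join(tokens))
```

Calling main() at the end of the file would let this sketch stand in for post1000.py in the pipeline above. Branches that touch anything outside libfreetype are skipped rather than hashed, following the reasoning about malloc.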

After this point I didn’t bother scripting the extraction of the flag, and just spent a few minutes manually comparing the hashes produced from the challenge perf.data against the hashes produced from the known plaintext perf.data and notating the flag’s characters one by one in Vim. This produces the flag:

flag{1df9e1d99ff7ea50bbe782492430b223}
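The manual comparison could also be scripted. As a minimal sketch, assuming the hashes for one rendering pass have already been separated from the delimiter hashes, the code below builds a lookup table from a known plaintext and decodes an unknown hash sequence. The function names are made up for this example; the hash values are the ones from the flag{aaabbbcccddd} sample output above, where the same hash appears in the positions of both braces, so the sketch reports such ambiguous matches instead of guessing.

```python
from collections import defaultdict


def build_table(known_text, known_hashes):
    """Map each hash to the set of known characters that produced it."""
    if len(known_text) != len(known_hashes):
        raise ValueError("need exactly one hash per known character")
    table = defaultdict(set)
    for ch, h in zip(known_text, known_hashes):
        table[h].add(ch)
    return table


def decode(hashes, table):
    """Translate hashes to text: '?' if unknown, '[xy]' if ambiguous."""
    out = []
    for h in hashes:
        candidates = table.get(h)
        if not candidates:
            out.append("?")
        elif len(candidates) == 1:
            out.append(next(iter(candidates)))
        else:
            out.append("[" + "".join(sorted(candidates)) + "]")
    return "".join(out)


# One pass of the sample recording, delimiters (75a768, 1c418a) removed.
KNOWN_TEXT = "flag{aaabbbcccddd}"
KNOWN_HASHES = (
    ["43dffc", "92c2d1", "feb2cf", "6db238", "215a7d"]
    + ["feb2cf"] * 3 + ["01c5bc"] * 3 + ["20a3ce"] * 3 + ["115279"] * 3
    + ["215a7d"]
)

table = build_table(KNOWN_TEXT, KNOWN_HASHES)
print(decode(["115279", "feb2cf", "115279"], table))  # dad
```

For the real challenge the table would be built from the longer reference plaintext instead, so that every lowercase letter and digit has an entry.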
