http://111.186.63.17/perf.data.gz
Environment: Ubuntu 16.04+latex
In this challenge, we get a gzipped file called perf.data and a minimal
description of an environment. Googling this reveals that perf.data is
the recording format of the perf tool, a Linux profiler. Installing perf
allows us to read perf.data and see some pretty interactive tables of
statistics in our terminal describing the profiling results, from which
we can see some libraries and addresses being called, but they don’t
reveal much about what’s going on. One hacky way to see more of the
underlying data in a more human-readable way (and to see just how much
of it there is) is perf report -D, which dumps the raw data in an ASCII
format, but this is still not that useful. (One might hope that one
could simply grep for the flag in this big text dump, but it’s nowhere
to be seen.) Still, from this dump, we can definitely read off all the
exact library versions that the perf record was run against:
0x6178 [0xa8]: event: 10
.
. ... raw event: size 168 bytes
. 0000: 0a 00 00 00 02 00 a8 00 e7 61 00 00 e7 61 00 00 .........a...a..
. 0010: 00 00 40 00 00 00 00 00 00 10 00 00 00 00 00 00 ..@.............
. 0020: 00 00 00 00 00 00 00 00 fd 00 00 00 00 00 00 00 ................
. 0030: ad 5e 46 00 00 00 00 00 71 95 13 17 00 00 00 00 .^F.....q.......
. 0040: 05 00 00 00 02 18 00 00 2f 75 73 72 2f 6c 69 62 ......../usr/lib
. 0050: 2f 78 38 36 5f 36 34 2d 6c 69 6e 75 78 2d 67 6e /x86_64-linux-gn
. 0060: 75 2f 49 6d 61 67 65 4d 61 67 69 63 6b 2d 36 2e u/ImageMagick-6.
. 0070: 38 2e 39 2f 62 69 6e 2d 51 31 36 2f 63 6f 6e 76 8.9/bin-Q16/conv
. 0080: 65 72 74 00 00 00 00 00 e7 61 00 00 e7 61 00 00 ert......a...a..
. 0090: 43 be 7a 60 88 a8 00 00 00 00 00 00 00 00 00 00 C.z`............
. 00a0: 15 00 00 00 00 00 00 00 ........
(A better place to get this information may be perf buildid-list. Also,
perf script >/dev/null will warn about missing libraries on stderr,
which will be useful. More on that later.)
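For reference, event type 10 in this dump is a PERF_RECORD_MMAP2 record, and the mapped file’s path can be pulled out of the raw bytes directly. The following is a minimal sketch, assuming the field layout documented for PERF_RECORD_MMAP2 in the perf_event_open man page; the function name parse_mmap2 and the dictionary keys are made up for illustration, and the bytes are copied from the dump above, truncated after the filename.

```python
import struct

# Fixed-size part of the record: perf_event_header (type, misc, size)
# followed by the MMAP2 fields up to, but not including, the filename.
# This layout is an assumption taken from the perf_event_open man page.
FIXED = struct.Struct("<IHHIIQQQIIQQII")
FIELDS = ("type", "misc", "size", "pid", "tid", "addr", "len", "pgoff",
          "maj", "min", "ino", "ino_generation", "prot", "flags")


def parse_mmap2(raw):
    """Decode the fixed fields and the NUL-terminated filename."""
    rec = dict(zip(FIELDS, FIXED.unpack_from(raw, 0)))
    name = raw[FIXED.size:]
    rec["filename"] = name.split(b"\x00", 1)[0].decode("utf-8")
    return rec


# Bytes 0x00..0x87 of the raw event shown in the dump.
RAW = bytes.fromhex("""
0a 00 00 00 02 00 a8 00 e7 61 00 00 e7 61 00 00
00 00 40 00 00 00 00 00 00 10 00 00 00 00 00 00
00 00 00 00 00 00 00 00 fd 00 00 00 00 00 00 00
ad 5e 46 00 00 00 00 00 71 95 13 17 00 00 00 00
05 00 00 00 02 18 00 00 2f 75 73 72 2f 6c 69 62
2f 78 38 36 5f 36 34 2d 6c 69 6e 75 78 2d 67 6e
75 2f 49 6d 61 67 65 4d 61 67 69 63 6b 2d 36 2e
38 2e 39 2f 62 69 6e 2d 51 31 36 2f 63 6f 6e 76
65 72 74 00 00 00 00 00
""")

rec = parse_mmap2(RAW)
print(rec["type"], rec["size"], hex(rec["addr"]), rec["filename"])
```

In practice perf buildid-list or perf script is the easier route; this only shows that the path really is sitting in the record.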
Another useful command is perf report --header-only, which gives, among
other stuff:
# cmdline : /usr/lib/linux-hwe-tools-4.10.0-42/perf record -e intel_pt// convert -font Courier text:- image.png
This tells us the exact command that was profiled with perf: it was
convert -font Courier text:- image.png, an ImageMagick command that
reads text from stdin (presumably the flag) and renders it to a PNG
image.
As for the perf invocation, the intel_pt// bit refers to Intel Processor
Trace, a CPU tracing feature exposed as a perf event source that
records, among other things, whether each conditional branch was taken.
It doesn’t track much more than that: much of the data is really just an
encoded sequence of Ts (“taken”) and Ns (“not taken”), which you can see
in perf report -D. But if you have the exact same executable and
libraries that the command used, this is enough to perfectly reconstruct
the control flow, since you can trace the assembly and know what the
next branch instruction is at all times. I did this challenge on a
laptop running Ubuntu 18.04 and quickly concluded that I didn’t have the
same libraries. Fortunately, I had a 16.04 VM that mostly did, so I
could get better perf reports in the VM. I also played with the
perf script command, which produces gigabytes of output, but reports the
names and addresses involved in every conditional branch to the best of
its ability, as millions of lines like this one, which can be grepped
through:
convert 32608 [002] 1659210.049834: 1 branches: 7f794df893e9 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so) => 7f794df8a250 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so)
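Since everything that follows amounts to picking fields out of lines like this one, here is a minimal sketch of parsing a single branch line. The regular expression and the names BRANCH_RE and parse_branch are my own, written against the sample line above; real perf script output can differ in spacing and may print symbols with an offset such as strcmp+0x19, so treat the pattern as an assumption rather than a complete grammar.

```python
import re

# Matches "... branches: <addr> <symbol> (<dso>) => <addr> <symbol> (<dso>)".
# Written against the single sample line; not a full perf script parser.
BRANCH_RE = re.compile(
    r"branches:\s+(?P<src_addr>[0-9a-f]+)\s+(?P<src_sym>\S+)\s+\((?P<src_dso>[^)]*)\)"
    r"\s+=>\s+(?P<dst_addr>[0-9a-f]+)\s+(?P<dst_sym>\S+)\s+\((?P<dst_dso>[^)]*)\)"
)


def parse_branch(line):
    """Return a dict of source/target fields, or None if the line does not match."""
    m = BRANCH_RE.search(line)
    return m.groupdict() if m else None


sample = (
    "convert 32608 [002] 1659210.049834: 1 branches: "
    "7f794df893e9 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so) => "
    "7f794df8a250 strcmp (/lib/x86_64-linux-gnu/ld-2.27.so)"
)
branch = parse_branch(sample)
print(branch["src_addr"], branch["src_sym"], "->", branch["dst_addr"], branch["dst_sym"])
```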
Now, we need to figure out a place where this information theoretically
allows us to reconstruct the text drawn by convert. This is not that
easy. The trace only shows us conditional branches, not anything about
the value of any data that’s being computed or passed around, and a lot
of the time different characters or different pixels aren’t going to
cause different control flow; they’ll just be passed around like any
other character or pixel value. After a while digging through the
ImageMagick source and stepping through some test executions of convert
in gdb, I homed in on the calls to the FreeType library, in particular
FT_Glyph_To_Bitmap, as the most likely place where different characters
would lead to different control flow, although I wasn’t certain until I
actually wrote the code that it would work. The idea is that different
glyphs will have different numbers of strokes and lead to different
numbers of pixels being drawn and such, which will change the control
flow.
Even with this idea, though, it’s incredibly hard to actually reason
through the control flow for every glyph just to reconstruct each
character. This could involve reasoning about subtle differences among
hundreds of branches across thousands of lines of assembly. Instead,
we’d much rather let the computer do the work for us by running the same
convert command against the same libraries on known plaintexts and
comparing the branching patterns we record against those in the
challenge recording.
Unfortunately, although I could read the perf report on my VM, I
couldn’t get Intel PT recording to work in it, which isn’t that
surprising since it has to interact with the processor on a very low
level to work. So I ended up finding and forcibly downgrading
libfreetype6 to 2.6.1-0.1ubuntu2.3 on my host laptop, the exact same
version that was in my VM and that the challenge was recorded against.
This was enough to make things like perf script find the symbols it
wanted, and to let us make reference recordings that had the same
per-glyph branching behavior as the given recording, at least for the
time it spends in libfreetype6.
ASLR means that the addresses being branched between won’t be identical
from run to run, but because libraries are mapped at page-aligned
addresses and pages are 4 KiB, it still preserves addresses mod 2^12, so
you can get pretty solid fingerprints out of just taking the last three
hex digits of all addresses involved.
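As a small illustration of why the last three hex digits survive ASLR, the sketch below adds the same offset to two page-aligned load addresses and reduces both mod 2^12. The base addresses, the library offset and the helper name page_offset are invented for the example; only the 4 KiB page size is assumed about the system.

```python
PAGE_SIZE = 1 << 12  # 4096 bytes, the assumed page size


def page_offset(addr_hex):
    """Return the low 12 bits of a hex address as three hex digits."""
    return "%03x" % (int(addr_hex, 16) % PAGE_SIZE)


# Hypothetical values: one offset inside a library, two different
# page-aligned load addresses as ASLR might pick on two separate runs.
offset_in_library = 0x1C132
base_run1 = 0x7F794DF6D000
base_run2 = 0x7F1203A41000

addr_run1 = "%x" % (base_run1 + offset_in_library)
addr_run2 = "%x" % (base_run2 + offset_in_library)

# The full addresses differ, but the page offsets agree.
print(addr_run1, page_offset(addr_run1))
print(addr_run2, page_offset(addr_run2))
```

Both lines end in 132 even though the full addresses differ, which is why a pattern like "132 FT_Glyph_To_Bitmap" can be grepped for in any recording.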
After a lot of exploratory grepping, I figured out a conditional branch
I liked, which occurred at an address ending in 0x132 in
FT_Glyph_To_Bitmap (written 0x132 FT_Glyph_To_Bitmap below). I grepped
for this line and a thousand lines after it out of perf script (since I
assumed that grep would be able to get through the millions of lines
more efficiently than a casually written Python script) and then
postprocessed with a Python script to extract some hashes of the
branching patterns, which could be compared against hashes produced the
same way from the given perf.data. Trying this out against some very
simple perf.data recordings I made confirmed that the same letters
seemed to be giving the same fingerprints. (The entire text seemed to be
rendered four times, with a few fixed calls before and after each
repetition, but this was not hard to ignore.)
Just to give an example, here’s the result of recording the conversion
of flag{aaabbbcccddd} and postprocessing, with lines cut off at the
right because all the lines are thousands of characters long. The lines
look identical except for the hashes at the start because they don’t
diverge until hundreds of characters in, but the hash lets us see
identical and different branching behavior easily. In particular, you
can see that feb2cf, 01c5bc, 20a3ce, and 115279 each repeat three times,
suggesting they correspond to the renderings of glyphs a, b, c, and d
respectively. Then you can confirm that feb2cf also appears just a few
lines earlier, corresponding to the a of flag. The whole thing repeats
four times, surrounded and delimited by 75a768 and with a single extra
1c418a at the start, whose significance I’m not sure of, but since those
hashes appear in the same positions in the fingerprints from the
challenge recording we don’t need to worry about them.
1c418a 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
43dffc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
92c2d1 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
6db238 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
feb2cf 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
01c5bc 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
20a3ce 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
115279 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
215a7d 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
75a768 132FTGTB 833FTRGI 8b2FTRGI 845FTRGI 875FTRGI f9d fd8 fcc
I then recorded a reference perf.data like this (the digits are tripled
just so, as above, it’s easier to notice the triplets of repeating
hashes and slightly error-correct positions):
echo 'flag{abcdefghijklmnopqrstuvwxyz000111222333444555666777888999}' | sudo perf record -e intel_pt// convert -font Courier text:- image.png
The following pipeline then produces a fingerprint from the next few
lines after each branch from 0x132 FT_Glyph_To_Bitmap:
perf script -i perf.data | grep "132 FT_Glyph_To_Bitmap" -A 1000 | python3 post1000.py > pout
The post1000.py script used in the last step to produce all of the above
text dumps is the following extremely hacky snippet. For each branch
from 0x132 FT_Glyph_To_Bitmap, it extracts the mod-2^12 source addresses,
along with the capital letters of the source symbol name, for the
following branches whose source and target both lie in libfreetype, in
the hope that human inspection will be able to recover some information
if something goes wrong, and then hashes the result for easy comparison.
Nothing went wrong, so the rest of the line didn’t matter. (We don’t
expect branches that go outside libfreetype to be deterministic
functions of the letter or glyph being drawn. For example, if
libfreetype has to malloc any memory, the control flow in malloc can
depend chaotically on various allocations that happened earlier on
different glyphs or different parts of the processing altogether. For
the same reason, we don’t expect recording the 1000 branches after every
branch from 0x132 FT_Glyph_To_Bitmap to stop at the exact same place in
libfreetype control flow, so we only hash a somewhat arbitrary prefix.)
import sys
import hashlib


def show(buf):
    # Print a 6-hex-digit hash of the first 1000 characters, then the tokens.
    s = ' '.join(buf)
    print(hashlib.sha256(s[:1000].encode('utf-8')).hexdigest()[:6] + ' ' + s)


line_buf = []

for line in sys.stdin:
    if "branches:" in line:
        _, rest = line.split("branches:", 1)
        # A new branch from 0x132 FT_Glyph_To_Bitmap starts a new fingerprint.
        if "132 FT_Glyph_To_Bitmap" in rest and line_buf:
            show(line_buf)
            line_buf = []
        # Only keep branches whose source and target are both in libfreetype.
        if rest.count("libfreetype") >= 2:
            tok1, tok2, *_ = rest.split()
            line_buf.append(tok1[-3:] + ''.join(c for c in tok2 if c.isupper()))

if line_buf:
    show(line_buf)
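To make the point about hashing only a prefix concrete, here is a small sketch that reuses the same hashing expression as the script. The token lists are invented; fingerprint is a hypothetical helper name, not part of post1000.py. Two recordings that agree on the first 1000 characters but stop at different places afterwards still get the same short hash.

```python
import hashlib


def fingerprint(tokens, prefix_len=1000):
    """Hash only the first prefix_len characters of the joined tokens."""
    s = " ".join(tokens)
    return hashlib.sha256(s[:prefix_len].encode("utf-8")).hexdigest()[:6]


# Invented data: 200 tokens of 8 characters, 1799 characters once joined,
# so the first 1000 characters are the same in both runs below.
common = ["132FTGTB", "833FTRGI"] * 100
run_a = common + ["f9d", "fd8"]          # recording stopped here in one run
run_b = common + ["a10", "b20", "c30"]   # and somewhere else in another

print(fingerprint(run_a), fingerprint(run_b))
print(fingerprint(run_a) == fingerprint(run_b))
```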
After this point I didn’t bother scripting the extraction of the flag,
and just spent a few minutes manually comparing the hashes produced from
the challenge perf.data against the hashes produced from the
known-plaintext perf.data and noting down the flag’s characters one by
one in Vim. This produces the flag:
flag{1df9e1d99ff7ea50bbe782492430b223}
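I did the comparison by hand, but it could be scripted roughly as follows. This is a sketch under assumptions read off the example dump above: one hash per character, and each rendering pass bracketed by the delimiter hash 75a768. The function names are made up, the reference hashes are the ones from the flag{aaabbbcccddd} example, and the "challenge" sequence is invented for the demonstration (it is not the real challenge data).

```python
DELIMITER = "75a768"  # hash seen before and after each rendering pass


def first_block(hashes, delimiter=DELIMITER):
    """Return the hashes between the first two delimiter occurrences."""
    start = hashes.index(delimiter) + 1
    end = hashes.index(delimiter, start)
    return hashes[start:end]


def build_table(plaintext, ref_hashes):
    """Map each reference hash to the set of characters that produced it."""
    block = first_block(ref_hashes)
    if len(block) != len(plaintext):
        raise ValueError("expected one hash per plaintext character")
    table = {}
    for ch, h in zip(plaintext, block):
        table.setdefault(h, set()).add(ch)
    return table


def decode(hashes, table):
    """Translate hashes to characters; '?' if unknown, [..] if ambiguous."""
    out = []
    for h in first_block(hashes):
        cands = table.get(h)
        if cands is None:
            out.append("?")
        elif len(cands) == 1:
            out.append(next(iter(cands)))
        else:
            out.append("[" + "".join(sorted(cands)) + "]")
    return "".join(out)


# Reference hashes from the flag{aaabbbcccddd} example (first pass only).
ref = (["1c418a", "75a768", "43dffc", "92c2d1", "feb2cf", "6db238", "215a7d"]
       + ["feb2cf"] * 3 + ["01c5bc"] * 3 + ["20a3ce"] * 3 + ["115279"] * 3
       + ["215a7d", "75a768"])
table = build_table("flag{aaabbbcccddd}", ref)

# Invented sequence standing in for hashes from another recording.
unknown = ["1c418a", "75a768", "01c5bc", "feb2cf", "115279", "215a7d", "75a768"]
print(decode(unknown, table))
```

Note that in the example dump { and } share the hash 215a7d, so a script like this has to report ambiguous candidates rather than pick one.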