TI-1337 Silver Edition

DiceCTF 2022 (Misc, 299 pts)

Last weekend Galhacktic Trendsetters sort of spontaneously decided to do DiceCTF 2022, months or years after most of us had done another CTF. It was a lot of fun and we placed 6th!

Back in the day the silver edition was the top of the line Texas Instruments calculator, but now the security is looking a little obsolete. Can you break it?

It’s yet another Python jail. We input a string and, after it makes it through a gauntlet of checks and processing, it gets exec’d.

More precisely, the gauntlet does the following:

  1. Check if our input is longer than 1337 characters; if so, error out. This is not a harsh limit.
  2. Compile our code and disassemble it using the dis module. If any instruction is_jump_target, which generally means any kind of control flow, error out. Delete any of the banned operations MAKE_FUNCTION, CALL_FUNCTION, CALL_FUNCTION_KW, and CALL_FUNCTION_EX from the assembled bytecode (this happens in two steps: first, by overwriting it with -1s, and second, by filtering those out near the end).
  3. Check our code for any names with double underscores, and replace any such names in the code object with $INVALID$, which basically guarantees your code won’t run.
  4. Execute the modified code object in an environment with a gift builtin and nothing else.

(The challenge also comes with a Dockerfile, which says that the flag is in a file with a random filename. This didn’t end up being particularly relevant though.)

I knew very little about Python bytecode and code objects before this challenge. Bytecode is also an internal and unstable feature of the interpreter, so there’s not much documentation. Still, the dis module has enough documentation of bytecode instructions to get us by, and some of the special attributes of code objects and such, which also appear in the challenge, are lightly documented in the inspect module. Finally, it’s not too hard to figure out some of the details with a little experimentation and object introspection in a Python interpreter. Let’s do that.

Diving into Python code objects

This goes into a bit more depth than necessary, but I thought it was interesting, and I couldn’t find a tutorial on Python code objects I thought was a good fit for our use case in ten minutes of searching, so I decided to write it up.

When you compile some Python source code via compile, you get a code object. As a trivial example, you could open an interpreter and run the following to get a code object co:

The second parameter to compile is a filename and isn’t important. The third parameter is a mode, either "eval" or "exec".

We can evaluate co:

(We could exec it too, but the return value doesn’t come out of exec, so we can’t see our code object in action. I’ll be using the eval mode below in a few examples, but most of the ideas are the same between the two modes.)

We can get a somewhat human-readable representation of this code object’s instructions with dis.get_instructions:

Note that you can call dis.dis for a prettier, less verbose printout (and also most of the dis functions are pretty smart and work with raw source code strings just as well as compiled code objects):

However, I found the verbose printout more educational and exclusively used it while doing the challenge, so I’ll stick with it here too.

Python’s bytecode is for a stack-based virtual machine of sorts. This example just has two instructions: the first pushes 3 into the stack, and the second returns the top value of the stack. Our code was so simple that the addition was optimized away, so this example wasn’t that illustrative. However, one thing to know already is that the bytecode instruction doesn’t itself directly contain the constant 3. Instead, the code object separately stores a tuple of all constants used by the code, and the bytecode just contains a reference to it. The list of constants used by a code object is in the attribute co_consts:

Though we cannot directly modify constants on a code object, we can use the method replace to get a new code object that’s identical except for the list of constants.

Disassembling this new code object shows what appear to be different instructions, which have the same opcodes but different argvals:

However, the two code objects have exactly the same bytecode.

The details of how bytecode is structured is fairly unimportant, but if you’re curious, as of Python 3.6, all bytecode instructions are two bytes: one byte for the opcode and one byte for the argument. Many opcodes ignore their argument, as in the example above with RETURN_VALUE. Opcodes can also take multi-byte arguments, up to four, by prepending additional “instructions” that have a special opcode, EXTENDED_ARG.

So if we look at the numeric values of each byte in this code object’s bytecode, we can read off the opcodes and arguments:

The bytecode has two instructions: opcode 100 with argument 0, followed by opcode 83 with argument 0. This matches the opcode and arg fields of the Instructions we got from iterating over dis.get_instructions(co). The argvals differ because the co_consts associated with the code objects differ, not because the bytecodes differ.

Let’s try the same thing for the code x + y:

This code object has no associated constants. However, co.co_names is a tuple of variable names that the code uses: ('x', 'y'). The LOAD_NAME opcodes take arguments that reference variable names in co_names, just like the LOAD_CONST opcodes earlier take arguments that reference constants in co_consts.

It’s worth noting that the same tuple is used for things like attribute accesses and method calls:

This example is also remarkable in another way: the opcode we’re using to call x.y.z is CALL_METHOD, which is not any of the banned opcodes!

A more natural way to discover this fact is probably by scrolling through the section of the dis module documentation on Python bytecode instructions and looking for names that contain the word “CALL”. This will be very important later.

Let’s try something a bit more adventurous and compile a lambda expression, lambda x, y: x + y:

I was a bit surprised to see an empty co_names and no BINARY_ADD opcode anywhere, but the pretty-printed argument values for the opcodes quickly reveal what’s going on. The MAKE_FUNCTION opcode takes two arguments on the stack, a code object and a function name, and produces a function. That code object used by the function has already been compiled and placed as a constant attached to the outer code object:

If we examine and disassemble that code object, we see the names and BINARY_ADD opcode we’re expecting:

Just kidding, the parameter names are in co_varnames instead of co_names.

As of time of writing, I don’t think the inspect module documentation is entirely correct here. The first hit for co_names is a StackOverflow question, which points to the source code docstring as a more accurate source. There is also already an issue in the bug tracker.

Anyway, if we try to compile a lambda that references names that aren’t arguments, they do appear in the inner code object’s co_names, and still don’t appear in the outer code object:

Reorienting to the goal

Now that we understand a bit more about Python bytecode and code objects, let’s think more about what we want to end up achieving. It’s well-known that, even if you pass an empty dictionary as builtins to exec or eval, it’s still insecure; some useful references include Ned Batchelder’s Eval really is dangeous and Gynvael Coldwind’s Python “sandbox” escape.

Unfortunately, all these jailbreaks start with member accesses along the lines of ().__class__.__bases__[0].__subclasses__(), which contain names with double underscores that the challenge replaces in the code object with $INVALID$, immediately breaking them. One alternate avenue that seemed promising to me at first was doing something with the func_globals attribute of gift (or some other function), as suggested in, for example, 2013 PlaidCTF writeup by wapiflapi; unfortunately, that attribute was renamed to __globals__ in Python 3.0, so it fails in the same way. Another idea is that it’s fine if strings contain double underscores, and the gift function lets us setattr once with string literals that could contain double underscores. However, it’s really unclear what attribute we can set that will help us make progress towards one of the above jailbreaks; we really want getattr. So it’s also not immediately apparent if this is useful. Overall, I was not able to find any direct jailbreaks of this form that didn’t use any names containing double underscores.

However, we saw from our earlier experimentation with code objects that, if we make a lambda and called it, names used by the lambda only appear in the co_names field of an inner code object, so the challenge won’t notice them. Even an expression like (lambda: ().__class__.__bases__[0].__subclasses__())() would sail through the name check. The same goes for any opcodes and control flow used by that lambda. So if we can figure out how to make and call a single lambda, we’re basically done with the challenge.

Of course, we can do neither of those things, at least not directly, because MAKE_FUNCTION and all the CALL_FUNCTION opcodes are banned. Let’s resolve the latter obstacle first.

Calling a function without calling a function

We already saw that CALL_METHOD is not banned, which means that if we stick our lambda as an attribute onto something else, we can call it via that attribute without using any banned opcodes. Most built-in Python objects do not let you freely set attributes on them, but fortunately, one type of built-in Python object for which this does work is functions themselves. In sum, we could make and call a lambda without any CALL_FUNCTION opcodes with code like so:

This disassembles to:

Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval=<code object <lambda> at 0x7fd18bc0e450, file "<math>", line 1>, argrepr='<code object <lambda> at 0x7fd18bc0e450, file "<math>", line 1>', offset=0, starts_line=1, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval='<lambda>', argrepr="'<lambda>'", offset=2, starts_line=None, is_jump_target=False)
Instruction(opname='MAKE_FUNCTION', opcode=132, arg=0, argval=0, argrepr='', offset=4, starts_line=None, is_jump_target=False)
Instruction(opname='STORE_NAME', opcode=90, arg=0, argval='x', argrepr='x', offset=6, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='x', argrepr='x', offset=8, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='x', argrepr='x', offset=10, starts_line=None, is_jump_target=False)
Instruction(opname='STORE_ATTR', opcode=95, arg=1, argval='f', argrepr='f', offset=12, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='x', argrepr='x', offset=14, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_METHOD', opcode=160, arg=1, argval='f', argrepr='f', offset=16, starts_line=None, is_jump_target=False)
Instruction(opname='CALL_METHOD', opcode=161, arg=0, argval=0, argrepr='', offset=18, starts_line=None, is_jump_target=False)
Instruction(opname='POP_TOP', opcode=1, arg=None, argval=None, argrepr='', offset=20, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=2, argval=None, argrepr='None', offset=22, starts_line=None, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=24, starts_line=None, is_jump_target=False)

This sequence of opcodes would be the same if I replaced the trivial lambda with any other lambda, and although it contains MAKE_FUNCTION, it doesn’t contain any of the banned CALL opcodes. Also note that we can use this strategy to call the gift function provided to us by the challenge code. We can verify that this strategy works by inputting code such as the following:

This causes the server to respond with x = bar, as expected. This makes sense because the instructions corresponding to this code do not contain any banned opcodes.

Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='gift', argrepr='gift', offset=0, starts_line=1, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='gift', argrepr='gift', offset=2, starts_line=None, is_jump_target=False)
Instruction(opname='STORE_ATTR', opcode=95, arg=1, argval='f', argrepr='f', offset=4, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='gift', argrepr='gift', offset=6, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_METHOD', opcode=160, arg=1, argval='f', argrepr='f', offset=8, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='gift', argrepr='gift', offset=10, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=0, argval='foo', argrepr="'foo'", offset=12, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval='bar', argrepr="'bar'", offset=14, starts_line=None, is_jump_target=False)
Instruction(opname='CALL_METHOD', opcode=161, arg=3, argval=3, argrepr='', offset=16, starts_line=None, is_jump_target=False)
Instruction(opname='POP_TOP', opcode=1, arg=None, argval=None, argrepr='', offset=18, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_NAME', opcode=101, arg=0, argval='gift', argrepr='gift', offset=20, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_ATTR', opcode=106, arg=2, argval='foo', argrepr='foo', offset=22, starts_line=None, is_jump_target=False)
Instruction(opname='STORE_NAME', opcode=90, arg=3, argval='x', argrepr='x', offset=24, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=2, argval=None, argrepr='None', offset=26, starts_line=None, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=28, starts_line=None, is_jump_target=False)

The bigger obstacle is still the lack of MAKE_FUNCTION opcodes, which we’ll think about next.

Making a function without making a function

First, note that the MAKE_FUNCTION opcode wasn’t “banned” in the same way that control flow was banned. When the challenge code sees a MAKE_FUNCTION opcode, it doesn’t bail out, it deletes that instruction from the bytecode and carries on. Imagine what would happen in the above code if this happened: the STORE_NAME instruction would store the function name "<lambda>" instead of the lambda into x, and the bytecode would limp along afterwards with an extra code object floating on the stack. If we input:

The server responds:

To be absolutely clear, this is literally the string "<lambda>" and not the string representation of an actual lambda, which whould look something like <function <lambda> at 0x7fe204c37310> instead.

With a little setup, we could easily imagine putting that code object into a variable and then doing other things with it. My experience with this step might have been unusual (compare e.g. the Organizers’ writeup1); but honestly, I didn’t think too hard about this; the first thing I tried worked:

Intuitively, I expected that when this code functioned normally, it would get 0 and a lambda onto the stack and then put the top two elements of the stack into x and y. However, if the lambda never gets made then its “ingredients” would be on top of the stack and would go into x and y instead. And it does seem to do that. The server’s response to this code is:

Once you get this result, it’s not really important to understand why, since you can just use x in the rest of the exploit. But for the curious, let’s disassemble it and take a closer look:

We can tabulate what happens when each instruction is executed. The parentheses indicate what constant from co_consts or name from co_names are referenced by the argument. The stack is listed from bottom to top.

Instruction Effect Stack
LOAD_CONST 0 (0) Push 0 onto stack 0
LOAD_CONST 1 (code object) Push code object 0, code object
LOAD_CONST 2 ("<lambda>") Push string "<lambda>" 0, code object, "<lambda>"
MAKE_FUNCTION Pop code object and name; push lambda function 0, lambda
ROT_TWO Swap 0 and lambda on stack lambda, 0
STORE_NAME 0 (x) Put 0 into x lambda
STORE_NAME 1 (y) Put lambda into y (empty)
LOAD_CONST 3 (None) Push None onto stack None
RETURN_VALUE Return None from this code (empty)

Two asides here:

  1. Why does this stack juggling occur? Why doesn’t Python just do one of pushing/constructing the values and storing them into x and y in the opposite order? I didn’t investigate this in detail, but my guess is that it’s necessary in general to keep the execution order of expressions and assignments as all left-to-right as one might expect. It also sort of motivates why the same evaluation order of subexpressions is deliberately unspecified in C/C++ and similar languages.
  2. Also note that the fact that all this code compiles to pushing two things onto the stack and then directly popping them into two variables is not actually that intuitive if you think about it more. An expression like a, b = x, y is “supposed to” make a tuple (x, y) and then unpack the tuple into the variables a and b, since an assignment like a = x, y works and puts a tuple into a. If you disassemble a = x, y, you’ll see a BUILD_TUPLE instruction; conversely, if you disassemble a, b = x, you’ll see an UNPACK_SEQUENCE instruction. Fortunately for my code, Python optimized out the tuple packing and unpacking, just like it optimized out 1 + 2 = 3 in the first example we looked at.

If we delete the MAKE_FUNCTION instruction and let the rest of the opcodes do what they do, the effects become:

Instruction Effect Stack
LOAD_CONST 0 (0) Push 0 onto stack 0
LOAD_CONST 1 (code object) Push code object 0, code object
LOAD_CONST 2 ("<lambda>") Push string "<lambda>" 0, code object, "<lambda>"
ROT_TWO Swap code object and lambda on stack 0, "<lambda>", code object
STORE_NAME 0 (x) Put code object into x 0, "<lambda>"
STORE_NAME 1 (y) Put "<lambda>" into y 0
LOAD_CONST 3 (None) Push None onto stack 0, None
RETURN_VALUE Return None from this code 0

In any case, we now know how to get a code object of our design into a variable. Unfortunately, but unsurprisingly, we can’t directly call code objects as if they’re functions. If we try

we get this error:

If you want a code object to run, you can’t run it directly; you have to call a function that has a code object inside it. Can we make that happen? Wait, how exactly is the code object run by a function associated with it?

We can discover this (yet again) the inspect module docs, or (yet again) StackOverflow:

functions have the function attribute __code__ (also known as func_code in older versions) from which you can obtain the function’s code object

The million-dollar question now is: is this attribute writable? Let’s run a trivial example to check:

It is, which means we do have a way to get our custom, exploit-containing code object to execute. If we set the __code__ attribute of an existing function to our code object, and then call that function, we can get our code object to run! And it turns out that gift is both an exiting object and the exact tool we need to set a __code__ attribute.

The final exploit

The exploit is even easier than I expected at first: it’s possible to call built-ins in the body of the lambda/code object we’re making, so we can just __import__ what we want instead of having to hunt for a subclass of object. This makes sense because we’re sort of replacing the code in the body of gift with our own code and then running it, and it makes sense that we would be able to call __import__ from the body of gift, which doesn’t execute in the “jail”. So we can just get the flag with an exploit like:

This pops a shell, so we cd .. and cat flag* and we’re done!

dice{i_sh0uldve_upgr4ded_to_th3_color_edit10n}

  1. The team is called Organizers; they are not the (lowercase-o) organizers of DiceCTF.

(note: the commenting setup here is experimental and I may not check my comments often; if you want to tell me something instead of the world, email me!)