Posts

One Step Forward to Tomorrow

2022-12-31 (1074 words) filed under Life

One day I’m going to run out of the energy to find barely adequate allusions for the titles and thematic music videos for the openings of these end-of-year posts, and they’ll just be called “2095 in Review” or whatever. Or maybe I’ll just stop making them. But not today.

Good song. Good animation. Incredibly out of place on its YouTube channel, in the most inspiring chaotic good way.

I closed out last year by saying that I wanted to accomplish a “big milestone” this year. I actually had a specific milestone in mind that I did not actually achieve and will not reveal, but I made good progress towards it, and a lot of other things happened, enough that I think I’ll count that as achieved.

The big thing is that I left my job at Zoom to have some time for myself and family… though not before helping to give feedback on a draft internet standard, publish a cryptography research paper (on which I’m the “first author”, strictly due to the vagaries of the English alphabet), and launch end-to-end encrypted email. It was a productive year! I feel like I should have more to say about all this, but it’s hard to think of anything that I didn’t already write about last year and also doesn’t require a blockbuster-length list of prerequisites. However, if you ever want to hear about the difficulties of actually getting end-to-end encryption into production in excruciating detail, invite me to a cocktail party with a lot of whiteboards.

Introduction to Code Golf and Golflangs

2022-12-22 (9290 words) filed under CS

Code golf is the recreational activity¹ of trying to write programs that are as short as possible.² Golfed programs still have to be correct, but brevity is prioritized above every other concern — e.g., robustness, performance, or legibility — which usually leads to really interesting code.

I think code golf is a lot of fun (although I think a lot of things are fun, so it’s one of those hobbies that I get really into roughly one month every year and then completely forget about for the remaining eleven). I wanted to write an introduction because I don’t know of any good general introductions to code golf, particularly ones that try to be language-agnostic and that cover the fascinating world of programming languages designed specifically for code golf, which I’ll call golflangs for short. But more on that later.

Note: If you are the kind of person who prefers to just dive in and try golfing some code without guidance, you should skip to the code golf sites section.

A simple example

Of course, there’s a reason most code golf tutorials focus on a single language: most code golf techniques are language-specific. The Code Golf & Coding Challenges (CGCC) StackExchange community has a list of some golfing tips that apply to most languages, but there are far more tricks in just about any language-specific list, and most of the intrigue lies in knowing the language you’re golfing well. So to provide a taste of the code golf experience, let’s golf a simple problem, Anarchy Golf’s Factorial, in Python.

In this problem, we have to read a series of positive integers from standard input, one per line, and output the factorial of each, also one per line. Here’s a stab at a simple, direct implementation with no golfing at all:³

def factorial(n):
    if n <= 1: return 1
    return n * factorial(n-1)
try:
    while True:
        print(factorial(int(input())))
except EOFError:
    pass

Flexbox Fun Facts

2022-09-23 (1458 words) filed under CS

This post is brought to you by “I am procrastinating other stuff by doing some long overdue maintenance on my blog”. Mainly, I finally replaced the old float-based layout from the random Hugo theme I forked, which I had been keeping just because it wasn’t broken, with flexbox, so that I could more easily tweak some other things. If things look broken, you may need to force-refresh or clear your cache, and on the off chance things look mostly the same but you feel like something about the layout feels subtly different, that’s what’s up.

While making these changes, I ended up digging through the flexbox spec to debug an issue and learned some interesting things. (This and other links in this post are permalinks to the November 2018 spec, which I believe is the most recent official version as of time of writing, but it’s nearly three years and there have been quite a few changes in the “editor’s draft”. Also, this post is not a flexbox tutorial and will not make sense if you are already familiar with flexbox.)

🅿️🅿️🅿️ordle

PlaidCTF 2022 (350 points)

2022-07-04 (1890 words) filed under CS

Don’t you hate it when CTFs happen faster than you can write them up? This is probably the only PlaidCTF challenge I get to, unfortunately.¹

Web is out, retro is in. Play your favorite word game from the comfort of your terminal!

It’s a terminal Wordle client!

Screenshot of a terminal Wordle client. The puzzle has been solved with the answer COZEY.

I only solved the first half of this challenge. The two halves seem to be unrelated though. (Nobody solved the second half during the CTF.) The challenge was quite big code-wise, with more than a dozen files, so it’s hard to replicate the experience in a post like this, but here’s an attempt.

Mask

ångstromCTF 2022 (200 points)

2022-07-04 (2519 words) filed under CS

Don’t forget to wear your mask…

nc challs.actf.co 31501

If I had a nickel for every CTF challenge I’ve done that involves understanding the internal structure of a QR code, I would have two nickels. Which isn’t a lot, etc etc. That previous challenge probably helped me get first blood on this.

The source code is wonderfully short:

import io, qrcode, string

flag_contents = [REDACTED]
assert all(i in string.ascii_lowercase + '_' for i in flag_contents)

flag = b"actf{" + flag_contents.encode() + b"}"
print("flag is %d characters" % len(flag))

qr = qrcode.QRCode(version=1, error_correction = qrcode.constants.ERROR_CORRECT_L, box_size=1, border=0)
while 1:
    try:
        inp = bytes.fromhex(input("give input (in hex): "))
        assert len(inp) == len(flag)
    except:
        print("bad input, exiting")
        break
    qr.clear()
    qr.add_data(bytes([i^j for i,j in zip(inp, flag)]))
    f = io.StringIO()
    qr.print_ascii(out=f)
    f.seek(0)
    print('\n'.join(i[:11] for i in f))

Kevin Higgs

ångstromCTF 2022 (210 points)

2022-07-04 (1366 words) filed under CS

Now that kmh is gone, clam’s been going through pickle withdrawal. To help him cope, he wrote his own pickle pyjail. It’s nothing like kmh’s, but maybe it’s enough.

Language jails are rapidly becoming one of my CTF areas of expertise. Not sure how I feel about that.

#!/usr/local/bin/python3

import pickle
import io
import sys

module = type(__builtins__)
empty = module("empty")
empty.empty = empty
sys.modules["empty"] = empty


class SafeUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "empty" and name.count(".") <= 1:
            return super().find_class(module, name)
        raise pickle.UnpicklingError("e-legal")


lepickle = bytes.fromhex(input("Enter hex-encoded pickle: "))
if len(lepickle) > 400:
    print("your pickle is too large for my taste >:(")
else:
    SafeUnpickler(io.BytesIO(lepickle)).load()

pickle is a Python object serialization format. As the docs page loudly proclaims, it is not secure. Roughly the simplest possible code to pop a shell (adapted from David Hamann, who constructs a more realistic RCE) looks like:

CaaSio PSE

ångstromCTF 2022 (250 points)

2022-07-04 (773 words) filed under CS

It’s clam’s newest javascript Calculator-as-a-Service: the CaaSio Please Stop Edition! no but actually please stop I hate jsjails js isn’t a good language stop putting one in every ctf I don’t want to look at another jsjail because if I do I might vomit from how much I hate js and js quirks aren’t even cool or funny or quirky they’re just painful because why would you design a language like this ahhhhhhhhhhhhhhhhhhhhh

It’s just a JavaScript eval jail.

#!/usr/local/bin/node

// flag in ./flag.txt

const vm = require("vm");
const readline = require("readline");

const interface = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
});

interface.question(
    "Welcome to CaaSio: Please Stop Edition! Enter your calculation:\n",
    function (input) {
        interface.close();
        if (
            input.length < 215 &&
            /^[\x20-\x7e]+$/.test(input) &&
            !/[.\[\]{}\s;`'"\\_<>?:]/.test(input) &&
            !input.toLowerCase().includes("import")
        ) {
            try {
                const val = vm.runInNewContext(input, {});
                console.log("Result:");
                console.log(val);
                console.log(
                    "See, isn't the calculator so much nicer when you're not trying to hack it?"
                );
            } catch (e) {
                console.log("your tried");
            }
        } else {
            console.log(
                "Third time really is the charm! I've finally created an unhackable system!"
            );
        }
    }
);

Interpreting Some Toy Neural Networks

2022-05-05 (9848 words) filed under CS

I participated in the AGI Safety Fundamentals program recently. The program concludes with a flexible final project, with the default suggestion of “a piece of writing, roughly the length and scope of a typical blog post”, so naturally, I deleted all but the last two words and here we are.

When I previously considered machine learning as a field of study, I came away with an impression that most effort and computation power was going into training bigger, more powerful models; whereas the inner workings of the models themselves, not to mention questions like why certain architectures or design choices work better than others, remained inscrutable and understudied. This impression always bothered me, and it definitely influenced me away from going into AI as a career. Of course, there are important, objective safety concerns around developing and designing models we don’t understand, many of which we discussed in the program; but my discomfort is mostly a completely unrelated nagging feeling I get whenever I’m relying on things I don’t understand.

After the program and all the concurrent developments in AI (including AlphaCode, OpenAI’s math olympiad solver ¹, SayCan, and, of course, DALL-E 2), I still had this impression about the field at a very high level, but I also became more familiar with the subfield of interpretability — designs and tools that allow us to understand and explain decisions by ML systems, rather than treating them as black-boxed mappings from inputs to outputs — and confirmed that enough people study it to make it a thing. One quote from a post on the views of Chris Olah, noted interpretability researcher, captured my feeling particularly eloquently:

interpretability is very aligned with traditional scientific virtues—which can be quite motivating for many people—even if it isn’t very aligned with the present paradigm of machine learning.

I found the whole post insightful, and it happens that the bits before that in the passage were also relevant to me. I don’t have access to lots of compute!

Inspired by that post and by a desire to actually write some code (which I figured might help me understand the inner workings of modern ML systems in a different sense), and after abandoning a few other project ideas that were far too ambitious, I decided to go through some parts of the fast.ai tutorial and riff on it to see how much progress I could make interpreting the models, and to write up the process in a blog post. I tried to capture my experience holistically, bugs and all, to serve as a data point for what it might feel like to start ML engineering (for the rare individuals with a background and inclinations just like mine²), and maybe entertain more experienced practitioners or influence their future tutorial recommendations. A much lower-priority goal was trying to produce “my version of the tutorial”, which would draw more liberally from an undergraduate math education³ and dive more deeply into technical details.

TI-1337 Silver Edition

DiceCTF 2022 (Misc, 299 pts)

2022-02-14 (3740 words) filed under CS

Last weekend Galhacktic Trendsetters sort of spontaneously decided to do DiceCTF 2022, months or years after most of us had done another CTF. It was a lot of fun and we placed 6th!

Back in the day the silver edition was the top of the line Texas Instruments calculator, but now the security is looking a little obsolete. Can you break it?

It’s yet another Python jail. We input a string and, after it makes it through a gauntlet of checks and processing, it gets exec’d.

#!/usr/bin/env python3
import dis
import sys

banned = ["MAKE_FUNCTION", "CALL_FUNCTION", "CALL_FUNCTION_KW", "CALL_FUNCTION_EX"]

used_gift = False

def gift(target, name, value):
    global used_gift
    if used_gift: sys.exit(1)
    used_gift = True
    setattr(target, name, value)

print("Welcome to the TI-1337 Silver Edition. Enter your calculations below:")

math = input("> ")
if len(math) > 1337:
    print("Nobody needs that much math!")
    sys.exit(1)
code = compile(math, "<math>", "exec")

bytecode = list(code.co_code)
instructions = list(dis.get_instructions(code))
for i, inst in enumerate(instructions):
    if inst.is_jump_target:
        print("Math doesn't need control flow!")
        sys.exit(1)
    nextoffset = instructions[i+1].offset if i+1 < len(instructions) else len(bytecode)
    if inst.opname in banned:
        bytecode[inst.offset:instructions[i+1].offset] = [-1]*(instructions[i+1].offset-inst.offset)

names = list(code.co_names)
for i, name in enumerate(code.co_names):
    if "__" in name: names[i] = "$INVALID$"

code = code.replace(co_code=bytes(b for b in bytecode if b >= 0), co_names=tuple(names), co_stacksize=2**20)
v = {}
exec(code, {"__builtins__": {"gift": gift}}, v)
if v: print("\n".join(f"{name} = {val}" for name, val in v.items()))
else: print("No results stored.")

More precisely, the gauntlet does the following:

blazingfast

DiceCTF 2022 (Web, 140 pts)

2022-02-14 (1404 words) filed under CS

Last weekend Galhacktic Trendsetters sort of spontaneously decided to do DiceCTF 2022, months or years after most of us had done another CTF. It was a lot of fun and we placed 6th!

I made a blazing fast MoCkInG CaSe converter!

blazingfast.mc.ax

We’re presented with a website that converts text to AlTeRnAtInG CaSe. The core converter is written in WASM, and also checks that its input doesn’t have any of the characters <>&". The JavaScript wrapper takes an input from the URL, converts it to uppercase, feeds it to the converter, and if the check passes, injects the output into an innerHTML. The goal is to compose a URL that, when visited by an admin bot, leaks the flag from localStorage.

The converter is compiled from this C code:

int length, ptr = 0;
char buf[1000];

void init(int size) {
    length = size;
    ptr = 0;
}

char read() {
    return buf[ptr++];
}

void write(char c) {
    buf[ptr++] = c;
}

int mock() {
    for (int i = 0; i < length; i ++) {
        if (i % 2 == 1 && buf[i] >= 65 && buf[i] <= 90) {
            buf[i] += 32;
        }

        if (buf[i] == '<' || buf[i] == '>' || buf[i] == '&' || buf[i] == '"') {
            return 1;
        }
    }

    ptr = 0;

    return 0;
}