Everything

SQL Selects the Hard FP Way

This post is motivated by reasons very similar to the ones that motivated my React and Redux “tutorial”. Again, it should be more accurately but less informatively titled “How I wish SQL SELECTs were explained to me”. Again, it does not imply that this method of explanation is suitable for anybody else. One difference is that this time, I mostly only wanted to learn about SQL SELECTs to the extent it would help me perform and optimize queries in Django’s ORM, but to prevent this post from languishing forever in my drafts folder, that material has been sectioned off into a possible future post, because I figured out what I wanted, ran out of steam, and am now trying to learn TLA⁺. Just me things.

Background

The SQL standard is confusing and almost never completely implemented; there are huge inconsistencies between SQL implementations. I will focus on SQLite because it’s popular and easy to play with, but generally try to stay away from unpopular or nonstandard features. SQLite’s SELECT documentation is good reading for one particular SQL implementation.

A SQL database is a place where you store and query a bunch of data that’s organized into tables. A table is a homogeneous list of rows. A row is a heterogeneous tuple of values of various simple data types. The data types supported depend on the SQL implementation; typical examples are integers and strings of various sizes, floating point numbers, and dates/datetimes. All of these types can be nullable; NULL is a SQL value that can appear just about anywhere. (Like many of the other SQL features, NULL is handled somewhat inconsistently across SQL implementations, but as a first-order approximation it’s closer to a floating-point NaN than, say, Java’s “null”. We’ll talk more about it later.) However, note that you can’t have a variable-size list of other things in a row. And just to make sure it’s clear, all the rows in a given table must have the same data types in the same order.

A “column” is just what you’d intuitively expect it to be: it’s the homogeneous list of all values in a particular position in each row of a table, which all have the same data type. One thing I haven’t mentioned yet is that table columns all have names. This is true both for tables stored in the database and for the ephemeral tables that are the output of queries.

Since I’ll also be referring to more complex types like lists and tuples often seen in conventional programming languages, I’ll call these simple data types “scalar types” and values of those types “scalars”. This is not SQL terminology; documentation usually just calls these “data types”. Here’s SQLite’s page on data types.

To play along, install SQLite and run it. You should get dropped into a connection to an ephemeral in-memory database, which is plenty enough for our purposes. Make a table and mutter some magic incantations to make the output a little prettier for us:

Blogging Advice For People Exactly Like Me

Did you know that it’s harder to become an MIT admissions blogger as an MIT student than it is to get into MIT as an applicant? It was true my year, in which 18,306 students applied and 1,467 were admitted (8.0%),1 whereas 69 students applied for 5 blogging spots (7.2%).2 Anecdotally it might also be true for future years.

I was among the 92.8% who got rejected in the latter process. Although I obviously would have preferred things go the other way, I can’t say I was surprised, firstly because, objectively, the odds were against me (as they were for every other individual applicant); secondly because, to the extent I can make educated guesses about the criteria the folks at MIT Admissions would have chosen bloggers by, I would have been close to the worst possible candidate;3 thirdly because my application probably wasn’t very good.4

I didn’t dwell on it; I just thought to myself some vague consoling thoughts and moved on. No matter what I missed out on, at least I retained complete freedom: to choose what to write about, when to post it, and how to format and typeset it, down to the very last box-shadow. Right? But, although I mostly successfully avoided thinking about it, there really was a lot to like about being an admissions blogger! I liked writing — or perhaps, I liked being a person who has written a lot more, and having a commitment to blog regularly would be a way to force myself to become that person. I liked the idea of getting to share things with thousands of readers, or less euphemistically I liked the thought of being, if ever so slightly, famous.5 I liked the idea of having a sketched portrait and being part of official events with “Blogger” in the title and all that jazz. Collectively these things just felt cool.

The thing is, though, that there were things I could do to try to get those things for myself, and I didn’t do them. I know how to force myself to blog regularly, which is just by announcing publicly to nobody in particular that I’ll blog regularly (it’s worked effectively at least twice). I know many places I could promote my blog and try to get more readers. I can buy a sketched portrait.6 It’s not that hard.

2021 MIT Mystery Hunt

Note: if you are viewing this shortly after it’s published and somehow don’t want to be spoiled on Mystery Hunt, make sure this spoiler formatting shows up: this text should be spoilered; if it doesn’t, try shift-refreshing. (There are Correct Ways to fix this, but I’m too lazy to do them. Sorry.)


Well, we did it.

✈✈✈ Galactic Trendsetters ✈✈✈ ran an MIT Mystery Hunt.

Somehow, it slipped my mind until trying to write the 2020 year-end post that I’d probably want to write a post on all this. In fact, there was one major project that I started in 2019, but that I didn’t think of mentioning in my 2019 year-end post because it wasn’t ready to be announced at that time, and that I almost forgot that I had never mentioned. But this post is a pretty good place to announce it.

Planning Mystery Hunt is a massive year-long endeavor. I didn’t have any leadership or otherwise high-responsibility roles, which made sense because I was busy writing a masters thesis for the first four months of Mystery Hunt planning. Because of that, and because I know many many other people on our team have written and will be writing blog posts (Rahul’s post, Nathan’s post, CJ’s post, maybe more to come?) I will focus on the things I did specifically. (So there is minimal discussion of the theme, overall organization, or big decisions like our COVID-19 response; I think the linked posts cover these well already.)

Puzzle-Writing Software

One part I did largely have “ownership” of, and that I sank a lot of time into, was maintaining our software for writing puzzles — the website where authors submitted puzzle ideas and drafts, testsolvers tested puzzles, and editors tracked and discussed the statuses of all the puzzles. This role was largely a continuation of me owning the same component for Galactic Puzzle Hunt since 2018, which itself grew out of the comparative advantage of having worked on it a little when writing with Random Fish for the 2015 MIT Mystery Hunt. There is some life lesson about specialization or pigeonholing to be learned here. But, to start at the beginning:

Puzzletron is a piece of PHP software used for organizing puzzle writing for puzzlehunts. The first commit on GitHub says it was imported from Metaphysical Plant in 2011; it’s likely older. There are active commits each year until January 2018, and AMA responses from Setec (2019) and Left Out (2020) mentioning it, so I believe most if not all Mystery Hunt writing teams used and improved Puzzletron and passed it down over those years.

…And I Feel Fine

It’s a strange and darkly funny story — Tim Minchin wrote the album this is from, Apart Together, well before the pandemic hit and social distancing became the norm, and I assume I am not alone in finding that the song resonates unusually strongly as a result.1 By Jove, it resonates.

Nobody needs me to say that it’s been a rough year. People have been complaining that each of the last few years were terrible, and looking forward to the next one, and being disappointed — as if years were coherent bundles of quality, and there was any reason to expect discontinuities in how things are going to occur around January 1st — and as if there were additionally any reason to expect such discontinuities, if they did exist, to be positive ones.

Seriously, do you remember when we thought 2015–2018 were bad?

And yet… I feel like overall, 2020 went quite a bit better than expectations for me. Which maybe means it’s astronomically better than the average person’s 2020. I had a long draft for this post that slowly accumulated words over the year as usual, but a lot of the ramblings I’d usually include now seem unusually vapid, and a lot of the deeper trends and experiences I might normally reflect on are things I don’t think I’ve really gone through or thought about for long enough to achieve closure on. This is partly due to the pandemic scrambling a lot of plans and partly because last January, nearly a full year ago, ✈✈✈ Galactic Trendsetters ✈✈✈ won Mystery Hunt and so we’re writing the 2021 hunt. The ramifications are still being felt and will accelerate until it actually happens two weeks from now, and that’s all I’ll say about it here.

Advent of Code: How to Leaderboard

Advent of Code (briefly, “AoC”) is a series of 25 festive programming puzzles1 released daily December 1–25. Each puzzle has two parts, which use the same text input and are related; to solve a part, you submit the right output corresponding to the input on the website.

If you’re reading this, I suspect there’s a good chance you knew that already, but in case you’re new to Advent of Code, let me try to briefly explain why I like Advent of Code, from the perspective of somebody who’s spent a lot of their life so far doing programming competitions.2

  • The event has a fantastic community surrounding it. I’m the most familiar with the subreddit, which is full of helpful people, interesting discussions, non-programming community games, and the occasional wonderfully, spectacularly overengineered solution to a puzzle; but I know there are also many smaller chatrooms and subcommunities focused on, say, specific timezones or programming languages.
  • Another aspect is the unique two-part format of each puzzle. Even though they use the same input, you don’t get to see the second part until after you’ve solved the first one, a feature that Eric Wastl (AoC’s creator) has taken full advantage of in designing puzzles. The second part is often a surprising twist on the first part, which keeps you on your toes and challenges you to keep your code moderately general or refactorable in a way that I think almost no other programming challenges do. This sometimes even happens between days in a calendar, when a puzzle turns out to be about some model of computation you implemented two or five or ten days ago — hope you kept your code and remember how it works!
  • Finally, AoC also has some non-rigorous puzzles that force you to use your intuition and “human intelligence”, either by interpreting the problem statement heuristically or writing code to let you explore the input. There are quite a few puzzles where it’s infeasible to write code that handles every step of obtaining the output from the input. The result is that Advent of Code can feature quite a few challenges that I’ve found particularly compelling because I think they simply could not be posed on any other contest platform.3

These are the things that make AoC stand out to me, but it also does a lot of other things well — the challenges are fun, approachable, and varied even aside from their interrelations; there is a long, dramatic story tying everything together (although it’s an Excuse Plot if there ever was such a thing); and, although this is obviously subjective, I find the website’s minimalist-adjacent, terminal-esque aesthetic really charming (there is a lot of detail in 2019’s calendar… after you solve everything). I’ve only done the last two years of Advent of Code, but it really seems like a one-of-a-kind event to me.

Anyway, one particular feature Advent of Code has is a leaderboard, which you can get on by being one of the first 100 people worldwide to solve each puzzle. The competition is fierce — every year, thousands of people compete to get on the leaderboard. Near the start of AoC 2019, mcpower (reddit discussion) and Kevin Yap (reddit discussion) wrote some articles about how to do this, both of which are worth reading. I also thought about writing such an article and started a draft, but I didn’t get it anywhere close to publishable before AoC had concluded, at which point I assumed few people would be interested. But here it is now.

C++ Rvalue References: The Unnecessarily Detailed Guide

Move semantics, perfect forwarding, and... everything else

By a strange quirk of fate, I have started writing C++ for a living.

Learning C++ was about as complicated as I think I expected it to be. By line count, I’ve written a lot of C++ for programming competitions, but I knew that I had only ever used a small cross-section of the language: basic control flow and variables, STL containers and algorithms, structs on which you mechanically define bool operator<(const T& other) const so STL algorithms can order them, and the very occasional macro or templated helper function. There were many features I wasn’t even aware existed.

In the process of learning C++ professionally, one rabbit hole I fell into quickly was C++11’s defining feature, the rvalue reference, and how it can be used to implement move semantics and perfect forwarding. By poring over a copy of the widely recommended book Effective Modern C++, by Scott Meyers, and a few dozen StackOverflow answers and blog posts, I roughly understood it after a few days, but still had a sort of blind-men-feeling-the-elephant feeling. I was confused about what lay under some of the abstractions I had been using, unsure of the full shape of the pitfalls that some of the guides had pointed out to me, and generally uncomfortable that there were still many small variations of the code I had seen that I couldn’t predict the behavior of. It took many more days to work myself out of there, and I wished I had had a guide that explained rvalue references and their applications to a bit more depth than what might be necessary for day-to-day use. So here’s my attempt to explain rvalue references in my own fundamental I-want-to-know-how-things-work-no-really style.

(If this vision doesn’t resonate with you, there are many other posts explaining rvalue references out there that you might prefer. Feel free to just skim the executive summary and/or check out some of the linked articles in the Background section.)

Executive Summary

I got… pretty carried away when writing this post, and a lot of it is just for my own understanding, which may or may not be useful to readers. Here’s a much more concise rundown (assuming you know basic C++ already):

45/101

Not a very completionist run. I graded myself pretty strictly though — both sides of every “and” need to count; “all” means literally all; fuzzy actions and phrases require full psychological commitment to qualify.

"101 Thiings To Do Before You Graduate" poster with 45 things crossed off

Signed Modulo

One thing many mathematically-inclined programmers run into when implementing mathematical algorithms, particularly number-theoretic ones, is that the modulo operation doesn’t behave how they expect or prefer.

In many languages, this operator is denoted %. Concretely, one might prefer that, if the second argument is positive, then the modulo operation would always give a nonnegative result. Under this behavior, the expression (-5) % 3 would evaluate to 1 rather than -2. This is a lot more useful for number theory because then for positive integers n, the % n operation actually maps integers to exactly the n canonical representatives for the residue classes. As a result, \(a \equiv b \mod n\) if and only if a % n == b % n. You can also do things like index into a length-n array with a % n and know that the index will be in-bounds. Finally, there are also optimization opportunities: modding by a power of 2 becomes equivalent to a simple bitwise AND, which is really fast on modern computers.

A few programming languages, notably Python, do implement % this way. However, the majority of languages today, including pretty much everything remotely descended from C, do not; instead, (-5) % 3 is -2. This post attempts to track down why.


The first thing to note is that there is a more important number-theoretic identity we’d like to have:

\[\texttt{a} = (\texttt{a / b}) \cdot \texttt{b} + (\texttt{a \% b}). \tag{1}\]

In words, the integer division and modulo operators should give a breakdown of a into a sum of some copies of b plus a remainder. Note that this equation also implies that specifying the rounding behavior of division is equivalent to specifying the sign behavior of the modulo operation, which will come up repeatedly later.

It’s also very uncontroversial that that remainder should have no copies of b, positive or negative, left over, which gives the constraint:

\[|\texttt{a \% b}| < |\texttt{b}|. \tag{2}\]

Every programming language I can think of satisfies these two constraints.1 So far so good. However, these two constraints don’t uniquely determine the values of a % b when a isn’t divisible by b; there are two possible values for a % b, one positive and one negative. Concretely, we could express \(-5\) as either \((-1) \cdot 3 + (-2)\) or \((-2) \cdot 3 + 1\), so (-5) % 3 could be -2 or 1.

It’s still mostly uncontroversial that, if a and b are both positive, then a % b should be nonnegative as well; we could call this constraint (3).2 However, if a is negative and b is positive, programming languages start to diverge in their behavior. Why?

2020 MIT Mystery Hunt

Fifth year with ✈✈✈ Galactic Trendsetters ✈✈✈ (previously: 2019, 2018, 2017, 2016; writing with Random in 2015).

We won!

But apparently I’m still really busy, so I’ll probably just focus on a few highlights of things I personally experienced and get this post out the door.

The theme in a few sentences: As suggested by the invitation sent out to teams, there was a wedding. It turned out to actually be a real wedding between two prolific puzzle writers on Left Out, the organizing team. To the surprise of everybody who expected either one, three, or maybe five subversions of this announcement, the wedding actually succeeded (a COIN that would prevent all love in the universe was briefly reported missing, but then immediately found by the bride). Instead, we joined the newlyweds on a trip to an amusement park called Penny Park, which we learned was struggling and closing this weekend. It was up to help its regain its popularity and stay open by solving puzzles. The Mystery Hunt subreddit has lots of other cool links and discussions, including this animated bar chart, so I won’t spend more time here.

I’m pretty sure the best story I personally experienced was how we solved the metameta, the biggest puzzle that stood between us and winning hunt, and the moment at which we went, holy smokes we might have won Mystery Hunt! We got all the pennies required for the final metameta at Sunday 5:48:09am, and it took us until 12:47:30pm, just under seven hours later, to solve it…

Startup Lag

This is a weird song choice — I have not even watched the movie. But there is a story, and there is a thematic correspondence.

The story is that I was interning remotely at a coworking space over the summer. One night, I attended a karaoke event hosted there, the kind where adult human beings socialize and where I didn’t know anybody else, and I sang this song. Afterwards, another attendee told me that her kid (yeah, you know, people in my reference class have children) loved Moana and was really excited about my performance.

The thematic correspondence is less obvious and harder for me to describe. I’m going much less further this year than I could be, and am less sure about next year than I expected to be at this point for reasons I’m not ready to share yet (this seems to be happening more and more on this blog, but there’s not much I can do about it — so it goes). But it really is the case that there are some things I can’t deny about myself, some attractor states that my values and way of thinking keep dragging me towards.

Frivolous examples: I went through another online Dominion phase and at least two Protobowl phases, the highlight of which is learning a good deal about Émile Durkheim and then buzzing on him the next day. I did Advent of Code again, with the same golfing setup as last year, a foray into making an auxiliary over-the-top leaderboard in Svelte, and (surprisingly to myself) getting first. I have a shiny Charizard with Blast Burn now.

Shiny Charizard in Pokémon Go