2023 MIT Mystery Hunt

My seventh year doing Mystery Hunt with ✈✈✈ Galactic Trendsetters ✈✈✈, and after a hiatus it was in person again! This also makes it my first in-person Mystery Hunt as an alumnus, where I flew in and stayed at a hotel. How time marches on… I appreciated getting to see everybody on Galactic, as well as quite a few internet puzzlers at the location where all the cool people always go, Flour Bakery and Cafe.

Campus hadn’t changed too much. There were more card readers, but also fancy kiosks where ID cards could be printed on demand (via the official 1.2/5★-rated app). I set aside a little time before kickoff to try to locate a working kiosk to print my ID, but the two kiosks I found west of Mass Ave, in W20 and W35, were both out of order; only much later did I print a card in 16. But I am a card-carrying alumnus now. Galactic had two classrooms in 4-2 and lots of masks and tests. One of my teammates brought their dog. It was a fun time.

As is typical nowadays, the hunt announcement and kickoff began with a museum facade theme. However, the twist was handled a bit differently: kickoff had an additional diegetic level. Normally the story is followed by an out-of-character talk about health/safety and policies, but this year that talk, while still in a different universe from the museum, was intertwined with an introduction to MATE, the AI who had ostensibly been writing all the puzzles. Over the course of the hunt, rather than discovering a possibly predictable secret plan or betrayal by MATE, we found ourselves on its side because (in the outer fictional diegetic level) teammate had shut off some other “overly creative” AIs and overworked MATE.

Some comments on the plot, website, art, and design. Firstly, I thought the messing with diegetic levels was neat, as I always do. Secondly, I was blown away by the immersiveness of the website. Every single round had a stunning artistic design, the puzzle factory’s point-and-click exploration was executed perfectly and brimming with detail, and the AIs and their rounds each had a ton of personality. Most solvers probably had high expectations for this Mystery Hunt given the last few Teammate Hunts’ websites, and teammate rose to the challenge. However, one aspect I wasn’t as excited by was the use of live teamwide multiple-choice dialogues to introduce a bunch of major plot points. Although this arguably makes more sense in-universe, and although I know many other solvers who really enjoyed it, to me it felt like a kind of interaction I could have with the rest of Galactic nearly every day over voice chat and stream while sitting at home. I totally understand if future teams continue with this; it still seemed generally well-received, and I think teams should do whatever is necessary to give themselves less work to handle while hunt is running. But I think the in-person nature of Mystery Hunt and the possibility of more direct interactions between teams and the huntrunners are some of its big differentiators, and I was sad that they seemed to have been deemphasized this year. Finally, I have to mention that the super high-level plot concept of AIs taking over puzzle writing felt a little “too real” for me to fully enjoy… there were a few moments during kickoff, however brief, where my primary emotion was existential dread rather than amusement. It’s a bit of an awkward point to end this discussion on, but these minor complaints aside, I’d say this Hunt’s presentation was brilliant.

The obvious fact about this hunt is that it ran long. Most Mystery Hunts aim to end solidly within Sunday, but the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team to Be Named Later were first to solve the “endgame trigger” at 4:13am Monday and found the coin at 7:23am. Galactic “finished” at 5:07am on Monday and did the endgame at 8am. All this happened in spite of teammate actually having tried to write a shorter hunt and having fewer puzzles than past years — the average puzzle was just far too hard. As solvers we couldn’t really tell how far through the hunt we were for most of it, so we had no idea whether our rate of progression through the hunt was on track; but it appears that as early as Friday afternoon teammate realized that things were going slower than planned. It became increasingly clear to everybody what was happening on Sunday, as teammate handed out a lot of free answers and issued more and more “errata” to make puzzles easier.

This is the first “long hunt” I experienced in person, and some part of me just thinks that getting that experience is neat. (The last long hunt was Manic Sages’ 2013 hunt, which I technically did do — remotely, with Random — but since I had no past experiences to compare against and was also mostly vacationing with family, I wasn’t around enough to experience anything unusual about the hunt length.) But before I talk more about that, non-spoilery highlights:

I think my favorite puzzle is Terminal. I loved the base mechanic already, and it’s one of those puzzles that feel completely impossible right up until the moment when you solve the damn thing and emerge victorious.
Interpretive Art was much simpler, but easily the most entertaining puzzle. Behind every clue is the reward of not only an answer, but also an opportunity to say that answer aloud and bask in the responses of your teammates.
Finally, Win a Game of Bingo is another interactive puzzle that also feels impossible until you solve it, except that there’s more than one “valley of impossibility” in the process. Plus, the last step is very funny. I’d rank Terminal above it mostly just because I think that puzzle’s grunt work is a little more fun than this one’s; I’m fortunate to have teammates who are more willing to do grunt work than I am.

The obligatory hunt length discussion

Okay let’s beat the dead horse a little.

It’s very in vogue to discuss how Mystery Hunt is getting too long, yadda yadda, but personally, I have full faith in teammate’s intentions and think that hunt length estimation is somewhere between really hard and impossible. It’s a miracle to me that as many Mystery Hunts end as close to the desired time as they do. I doubt anybody outside teammate is in a position to really diagnose why the hunt ran so much longer than they expected and what could be done about it in the future (and I hope they manage to).

How would you estimate the length of a hunt before it runs? The obvious strategy: During testsolving, ask people to record how long they spent testsolving, then use that to calculate an average number of person-hours needed to solve each puzzle. Sum those up, then find where that total intersects a graph of the cumulative number of person-hours you’d expect a top team to have put in over time, taking into account meals and sleep and so on.
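To make the naive strategy concrete, here is a toy Python sketch of it. Everything in it is hypothetical: the `Puzzle` record, the idea of feeding in one number per hour for how many people on a top team are actively puzzling (already net of meals and sleep), and all the names are my own illustration, not anything teammate actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Puzzle:
    # Hypothetical record: one person-hours figure per testsolve of this puzzle.
    name: str
    testsolve_person_hours: List[float]


def total_person_hours(puzzles: List[Puzzle]) -> float:
    """Average each puzzle's testsolve reports, then sum over all puzzles."""
    return sum(
        sum(p.testsolve_person_hours) / len(p.testsolve_person_hours)
        for p in puzzles
    )


def estimated_finish_hour(
    puzzles: List[Puzzle], active_solvers_by_hour: List[float]
) -> Optional[int]:
    """Return the first hour (1-indexed) by whose end the team's cumulative
    person-hours cover the total needed, or None if they never do.

    active_solvers_by_hour is a hypothetical input: how many people on a top
    team you expect to be puzzling during each hour, after meals and sleep.
    """
    needed = total_person_hours(puzzles)
    cumulative = 0.0
    for hour, solvers in enumerate(active_solvers_by_hour, start=1):
        cumulative += solvers  # each active solver contributes one person-hour
        if cumulative >= needed:
            return hour
    return None


# Made-up numbers: 18 person-hours of puzzles, a team of 5 puzzling every hour.
puzzles = [Puzzle("A", [10.0, 14.0]), Puzzle("B", [6.0])]
print(estimated_finish_hour(puzzles, [5, 5, 5, 5]))  # prints 4
```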

The result of this will be a number that I’d guess is… basically completely meaningless. Why? Obviously person-hours are not fungible — some puzzles are more parallelizable than others; some solvers are more experienced than others. But the estimate will be off for puzzlehunts for many additional reasons:

If, as is often the case, your team consists of people with broadly similar skills and interests, then testsolvers from your team will be better at solving puzzles written by your team than actual solvers, and your testsolves will underestimate the difficulty. (It’s usually really hard to find testsolvers outside your team to calibrate against. After all, people want to do Mystery Hunt.)
On the other hand, it might be that the solvers on your team who would be best at a certain niche puzzle have all been “used up” as either authors of the puzzle or past testsolvers, whereas teams in the actual hunt can put their most suited solvers on those puzzles. In that case your testsolves might overestimate the difficulty.
In an actual hunt, solvers may be more warmed-up and have better resources. But they also may be more tired.
Individual ahas in individual puzzles can introduce a lot of variance, especially if they’re in an impactful puzzle — a meta, a feeder that’s a likely break-in to a meta, or just something with a special role in the unlock structure (like this year’s Loading Puzzle).
Person-hours can be “wasted” on puzzles that don’t get solved before the meta. Conversely, puzzles can be skipped by backsolving or just solving the meta without them, or made easier by meta constraints.
Finally, a lot about a puzzle can change between a testsolve and the actually running puzzlehunt. Sometimes the presentation of the puzzle in the final hunt introduces or removes red herrings. For example, from our 2021 Mystery Hunt, ⊥IW.giga played out very differently from testsolves due to Twins having an image that people tried to forward-solve (not to mention A Routine Matter being both particularly difficult and a particularly useful feeder). Sometimes some external resource that helps with the puzzle appears, disappears, changes, or becomes much easier/harder to Google between the testsolve and the hunt; I can give examples from my earliest years of puzzlehunting, CiSRA 2012 1B (the link is to the puzzles page but the explanation is in the solution’s notes), all the way to last year, GPH 2022’s intro round.

I can only speculate here, but I’d guess this year the first of these points had the largest effect, especially since Galactic, who I think is likely the most similar team to teammate in terms of skills and interests, was still not trying to win. In general, though, I think it’s unclear what the net effect of all these is, or even that effect’s direction. It’s easy to say in hindsight that this year’s puzzles were too hard, but I have no idea what a writing team should do to estimate how much harder their hunt is than desired.

Still, it may be instructive to ponder: which side do you want the error bars on your estimate to be on? More bluntly, if you had to choose between the coin being found at, say, either Saturday 6am or Monday 6am, which would you choose? When I first thought about this, my selfish choice as an avid solver on a top team who can somewhat reliably finish Mystery Hunt was a mild preference for the latter, but it was close. What I think is much clearer is that the majority of hunt participants would prefer the former scenario. More teams get to see most of the hunt and potentially finish it and experience the endgame. Puzzle authors and interaction designers get to show their puzzles and interactions to more solvers. Everybody can go to sleep on Sunday. The downsides would mostly just be felt by top teams who might have one more day than expected with no puzzles to do, but puzzlehunters are clever and I’m sure they can find ways to entertain themselves. (More flippantly: by some accounts, the last time this happened, ✈✈✈ Galactic Trendsetters ✈✈✈ lived up to our name and revolutionized the online puzzlehunt space :)

In the end, the most concretely actionable advice I remember receiving is that huntrunners should leave a lot of knobs they can use to adjust the hunt speed within a healthy margin while the hunt is live. (From that perspective, you could say that the fact that teammate was able to course-correct the hunt length as much as they did makes it a resounding success!)

Thoughts on hunt structure

Although the hunt’s length might be its most memorable aspect, I think I was actually more surprised by the lack of an obvious “intro round” in this year’s hunt, especially because teammate listed giving small teams a good experience among their goals during wrap-up. There may not be more to explain beyond calibration being off across all puzzles, as it looks like all the museum puzzles were intended to be easy. But on Saturday, as we solved various important-looking metas, we constantly half-joked that we had just finished the intro round and unlocked the main round. When we finally solved Reactivation, that may not have been a terrible description. Even so, I think we made the joke a few more times on Sunday. Maybe everything before Reactivation was Act I, the AI rounds were Act II, and there was an entire Act III yet to unlock! Who could say any more?

During post-hunt discussion, I did learn about perspectives I hadn’t thought about before. I had mostly been thinking of intro rounds as a way to give the smallest teams something to accomplish during hunt weekend, but there’s a wide gap between those small teams and the top few. If you’re somewhere in the middle, say a team who can comfortably finish a small intro round similar to those in the last few Mystery Hunts in a few hours but who can’t finish the whole hunt in the weekend, you might spend most of the hunt solving puzzles without ever receiving any further narrative payoff or closure. In contrast, this year some of those teams were able to reach Reactivation right at the end of the weekend and enjoyed that experience more. An off-the-cuff conclusion is that the way to satisfy as many people as possible is to put major plot points \(1/2^n\) of the way through the hunt for as many positive integers \(n\) as you can.

Another benefit of easy puzzles I hadn’t thought about much, and in particular of spreading them through the hunt rather than putting them all in early rounds, is that they give casual or drop-in solvers on strong teams something to work on. Even super competitive teams often have solvers who are mostly there for the social aspect and just want to puzzle for an hour or two in the middle; it’s nice if they get to fully solve a puzzle in that time. This is something I felt was missing this year, even though I was on a team that saw the full hunt.

Finally, one curious aspect of this year’s hunt is that the creative rounds felt pretty backloaded. The incredible Hall of Innovation was buried under five rounds of fairly standard puzzles and metas, and of course the craziest rounds were all at the end. In comparison, when Galactic ran the 2021 hunt, I’m fairly confident we consciously put Infinite Corridor, one of our “coolest” rounds, early to maximize how many teams could see it and solve it. I’m probably the wrong person to analyze this in depth since I was on a team that experienced all of that anyway, but I’m curious how small to medium-sized teams felt about the round order.

Spoilery recap

This is as usual, though since I feel like more people will want to post-solve puzzles than usual, I have double-spoilered a few cluephrases that come late in puzzles as well as the entirety of the discussion of Win a Game of Bingo.

Museum puzzles I worked on and enjoyed, but don’t have much to say about, include You’re Telling Me, Natural Transformation, and Scicabulary. I have already mentioned Interpretive Art as one of the highlights. Swarming Collage was fun.

There were some puzzles I worked on and didn’t forward solve:

Apples Plus Bananas: We spent a few hours on this before backsolving it. On one hand, I do think it’s considerably harder than you’d expect for literally the first puzzle unlocked in the hunt, but on the other hand, I also know teams that got the initial insight in a few minutes. My current belief is that ✈✈✈ Galactic Trendsetters ✈✈✈ just struggled due to having a lot of people (myself included) who, upon reading the flavortext, will think, “The word ‘variety’ must be cluing ‘algebraic variety’. There is no other possible explanation.” At some point my leading theory was that the puzzle was about the infamous “95% of people cannot solve this!” math riddle.

In attempt two or three I watched the first Google result for the puzzle title and got it stuck in my head. I had never heard the song before and was very confused as to why an arbitrary phonics gimmick was being applied to an arbitrary children’s song, but it turns out this is actually just how this song goes. As a bonus, I briefly convinced some people around me that the puzzle was about vowel sounds.

It wasn’t until attempt four or five, long after we had backsolved it and were just looking at puzzles for fun, that I finally figured out what was happening. I extracted two letters, verified that they matched the answer, and decided I understood the puzzle to my satisfaction. (I was actually wrong — coincidentally, the two earliest fruits give the same answer if you take first letters instead of indexing — but ehhh I’m sure we would have figured it out.)
G|R|E|A|T W|H|A|L|E S|O|N|G: I spent a lot of time transcribing Morse and then failing to solve easy clues that were just customized enough that you can’t solve them by Googling with the word “crossword”. Fortunately my teammates are smarter than that. I think we figured out how dashes and word breaks crossed in the crossword, but we never understood it the way the puzzle intended and weren’t able to get the grid to work out, and our lack of understanding means we wouldn’t have been able to extract anyway. Given that every crossing in this crossword has a 1/3 chance of working just by coincidence, it still feels underchecked to me; I think the puzzle would have been better and still constructible with a much more densely checked American-style crossword grid and ordered clues (and presumably without the initial transcription or data-munging step).

When Hall of Innovation first unlocked, I looked at it with some people and made some early big-picture discoveries, but then left as it seemed like it was being overwhelmed with solvers. I don’t remember exactly what I did instead but I think I helped solve Quality Assurance. I also participated in the proctored Think Fast interaction. The most entertaining moment was when we developed an elaborate system for remembering Level 6’s letters in preparation for anagramming them, then the letters were revealed and I “saw” the pangram and blurted it out. I can’t really explain this experience and have not been able to replicate it with some other random anagrams served up by the puzzle, but I’m sure Galactic’s long-running tradition of perfecting the NYT Spelling Bee helped, plus our word (ABEFGILNORUV → UNFORGIVABLE) had easy prefixes and suffixes.

We solved the first three museum metas between 6pm and 11pm Friday, but I didn’t meaningfully participate in any of their solves. Instead I spent the last few hours of Friday night in a group finishing the grid for Gears and then trying to extract, to no avail. Eventually it got backsolved. Though I understand the desire to make the final extraction thematic to the answer, I think this puzzle was hampered by having two or three more steps than it needed, as well as cluephrases that were too hard to interpret. We got to DIAGONALIZE CHARS, which, as far as I can tell, solvers had to follow twice to do the puzzle as intended: first interpreting both words in a less intuitive way, and only then in the ways we had actually convinced ourselves of. My opinion is that the fresh and factory-thematic take on a Rows Garden would have been enough to make a solid puzzle even if finished off with some trivial highlighted-blanks extraction.

On Saturday, in stark contrast to Friday, I barely remember any feeders I worked on. I think Quandle was a fun, relaxing puzzle to throw effort at in between other puzzles, but otherwise I was on full meta duty. I contributed only the finishing touches of grunt work on Artistic Vision to read Braille after we had gotten all the puzzle answers, which we solved around noon. The first meta I actually pushed on was A Conspiracy Network — even with a bunch of clues, I had to scroll through the Wikipedia list of hash functions several times to convince myself that there were semantic matches for both EDDY and PHARAOH. In hindsight, it still boggles my mind how anybody could have come up with this meta, in the sense of choosing all its components and deciding to put them into the same meta. Is there a relation between conspiracies and hash functions we never discovered? It also feels like one of those puzzles where Galactic’s similarity to teammate might have been an advantage. We solved both that meta and Hall of Innovation around 3:30pm, followed shortly after by The Blueprint.

Afterwards, I also helped with bits and pieces of MATE’s META, which was a cute metameta that we solved around 4:30pm. I don’t remember what I was working on between that and when we solved Reactivation and The Junk Pile, both around 10pm, but I never even looked at those puzzles during the hunt.

The last puzzle I remember working on Saturday was The Legend, the first Wyrm meta. Somebody had retrieved a bunch of pretty laser-cut triangles and dumped them on a table, above which the meta answers were scrawled on a blackboard; as nobody was doing anything with the tiles, I started idly putting them together to spell CERBERUS in a straight line. After cramming a few other answers into the configuration in increasingly tenuous paths, I apparently convinced the others around me that this was the right thing to do, even before I had convinced myself (this has happened before, as recently as last year’s endgame puzzle…). Eventually the collective extent of conviction was high enough that we started over rigorously, making deductions that certain triangles had to be “hinged together”, in a way that felt like it was exactly the right amount of constrained, until we had a complete arrangement — part of me expected the configuration to start glowing and spin into the air when we put the last triangle in its spot. On that triumphant note, I went to sleep.

On Sunday morning, I remembered from the previous night that people had been making steady progress on Terminal, so I was a bit surprised to see it not even close to done, but I was also happy that I’d still get to work on it. I broke into a few clues by typing “dragon” into the text box, as one does, and clawing my way to phrases like “story from reptile” and “electronic reptile”. After some more teamwork I saw the puzzle past the finish line.

I think I spent a lot of Sunday afternoon generally bouncing around and don’t remember all the puzzles I touched briefly. I think I helped locate one of the last missing quotes on Invisible just by guessing and searching, soon after which the puzzle was solved, and made some very minor contributions to the last few steps of the Boötes meta as I watched people finish it off. I solved one subpuzzle in Cure the Werewolf’s Woe, tried to start extracting, and vaguely recognized the HIRSUTER pun, but didn’t have enough to parse the rest of the cluephrase. I started Run the Gamut before handing it off to the (many) other Galactic members who were more familiar with the subject.

The last puzzle I significantly helped successfully forward-solve was Win a Game of Bingo, which I will double-spoiler:

The last puzzle I remember working on was Sea Bass. After looking at the very stuck sheet, I somehow dredged up solresol from my memory — I didn’t even remember what it was, just that it was the name of something, but once I looked it up, it was pretty clear it was the right idea. Piecing together cryptics in a conlang I knew nothing about was very fun; we unfortunately did not forward-solve it, although the extraction made sense in hindsight. It was a rookie mistake, really: somebody told me the crossword grid was a bass guitar fretboard and I became really anchored to that interpretation.

In the end I was asleep when we solved most of the AI metas, but woke up in time to go on our 8am runaround. I do not know how teammate found the energy to run their endgame for as many teams as they did despite the time, and want to say that I don’t expect this of future writers in the slightest; but it was a great way to conclude the weekend.

And that makes another year! Thanks, as always, to teammate for putting together such a great hunt, and I’m looking forward to seeing what TTBNL has to offer next year. May the deities of hunt length calibration smile upon you.