2023 MIT Mystery Hunt

My seventh year doing Mystery Hunt with ✈✈✈ Galactic Trendsetters ✈✈✈, and after a hiatus it was in person again! This also makes it my first in-person Mystery Hunt as an alumnus, where I flew in and stayed at a hotel. How time marches on… I appreciated getting to see everybody on Galactic, as well as quite a few internet puzzlers at the location where all the cool people always go, Flour Bakery and Cafe.

Campus hadn’t changed too much. There were more card readers, but also fancy kiosks where ID cards could be printed on demand (via the official 1.2/5★-rated app). I set aside a little time before kickoff to try to locate a working kiosk to print my ID, but the two kiosks I found west of Mass Ave, in W20 and W35, were both out of order; only much later did I print a card in 16. But I am a card-carrying alumnus now. Galactic had two classrooms in 4-2 and lots of masks and tests. One of my teammates brought their dog. It was a fun time.

As is typical nowadays, the hunt announcement and kickoff began with a facade theme of a museum. However, the twist was handled a bit differently: kickoff had an additional diegetic level. Normally the story is followed by an out-of-character talk about health/safety and policies, but this year that talk, while still in a different universe from the museum, was intertwined with an introduction to MATE, the AI who had ostensibly been writing all the puzzles. Over the course of the hunt, instead of discovering a possibly predictable secret plan or betrayal by MATE, we found ourselves on its side because (in the outer fictional diegetic level) teammate had shut off some other “overly creative” AIs and overworked MATE.

Some comments on the plot, website, art, and design. Firstly, I thought the messing with diegetic levels was neat, as I always do. Secondly, I was blown away by the immersiveness of the website. Every single round had a stunning artistic design, the puzzle factory’s point-and-click exploration was executed perfectly and brimming with detail, and the AIs and their rounds each had a ton of personality. Most solvers probably had high expectations for this Mystery Hunt given the last few Teammate Hunts’ websites, and teammate rose to the challenge. However, one aspect I wasn’t as excited by was the live teamwide multiple-choice dialogues used to introduce a bunch of major plot points. Although this arguably makes more sense in-universe, and although I know many other solvers who really enjoyed it, to me it felt like a kind of interaction I could have with the rest of Galactic nearly every day over voice chat and stream while sitting at home. I totally understand if future teams continue with this: it still seemed generally well-received, and I think teams should do whatever is necessary to give themselves less work to handle while hunt is running. But I think the in-person nature of Mystery Hunt and the possibility of more direct interactions between teams and the huntrunners are some of its big differentiators, and I was sad that they seemed to have been deemphasized this year. Finally, I have to mention that the super high-level plot concept of AIs taking over puzzle writing felt a little “too real” for me to fully enjoy… there were a few moments during kickoff, however brief, where my primary emotion was existential dread rather than amusement. A bit of an awkward point to end this discussion on, but these minor complaints aside I’d say this Hunt’s presentation was brilliant.

The obvious fact about this hunt is that it ran long. Most Mystery Hunts aim to end solidly within Sunday, but the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team Formerly Known as the Team to Be Named Later were first to solve the “endgame trigger” at 4:13am Monday and found the coin at 7:23am. Galactic “finished” at 5:07am on Monday and did the endgame at 8am. All this happened in spite of teammate actually having tried to write a shorter hunt and having fewer puzzles than past years — the average puzzle was just far too hard. As solvers we couldn’t really tell how far through the hunt we were for most of it, so we had no idea whether our rate of progression through the hunt was on track; but it appears that as early as Friday afternoon teammate realized that things were going slower than planned. It became increasingly clear to everybody what was happening on Sunday, as teammate handed out a lot of free answers and issued more and more “errata” to make puzzles easier.

This is the first “long hunt” I experienced in person, and some part of me just thinks that getting that experience is neat. (The last long hunt was Manic Sages’ 2013 hunt, which I technically did do — remotely, with Random — but since I had no past experiences to compare against and was also mostly vacationing with family, I wasn’t around enough to experience anything unusual about the hunt length.) But before I talk more about that, non-spoilery highlights:

  • I think my favorite puzzle is Terminal. I loved the base mechanic already, and it’s one of those puzzles that feel completely impossible right up until the moment when you solve the damn thing and emerge victorious.
  • Interpretive Art was much simpler, but easily the most entertaining puzzle. Behind every clue is the reward of not only an answer, but also an opportunity to say that answer aloud and bask in the responses of your teammates.
  • Finally, Win a Game of Bingo is another interactive puzzle that also feels impossible until you solve it, except that there’s more than one “valley of impossibility” in the process. Plus, the last step is very funny. I’d rank Terminal above it mostly just because I think that puzzle’s grunt work is a little more fun than this one’s; I’m fortunate to have teammates who are more willing to do grunt work than I am.

The obligatory hunt length discussion

Okay let’s beat the dead horse a little.

It’s very in vogue to discuss that Mystery Hunt is getting too long, yadda yadda, but personally, I have full faith in teammate’s intentions and think that hunt length estimation is somewhere between really hard and impossible. It’s a miracle to me that as many Mystery Hunts end as close to the desired time as they do. I doubt anybody outside teammate is in a position to really diagnose why the hunt ran so much longer than they expected and what could be done about it in the future (and I hope teammate does).

How would you estimate the length of a hunt before it runs? The obvious strategy: During testsolving, ask people to record how long they spent testsolving, then use that to calculate an average number of person-hours needed to solve each puzzle. Sum those up and intersect it with a graph of the cumulative number of person-hours you’d expect a top team to have, taking into account meals and sleep and so on.
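To make that concrete, here is a minimal Python sketch of the naive estimate. Everything in it is made up for illustration: the function name, the per-puzzle averages, and the shape of the capacity curve are my own assumptions, not anything teammate (or any other writing team) is known to have used.

```python
from itertools import accumulate


def estimate_finish_hour(puzzle_person_hours, capacity_per_hour):
    """Return the first hour (1-based) at which the team's cumulative
    person-hours reach the total the puzzles are estimated to need,
    or None if the capacity curve never gets there."""
    needed = sum(puzzle_person_hours)
    for hour, available in enumerate(accumulate(capacity_per_hour), start=1):
        if available >= needed:
            return hour
    return None


# Hypothetical numbers, purely illustrative.
testsolve_hours = [30, 50, 40]   # average person-hours per puzzle from testsolving
one_day = [10] * 16 + [2] * 8    # 10 solvers while awake, 2 overnight
print(estimate_finish_hour(testsolve_hours, one_day * 3))
```

The "intersection" is just the first hour where the running total of available person-hours crosses the summed estimate. Meals and sleep are modeled only by lowering the hourly capacity.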

The result of this will be a number that I’d guess is… basically completely meaningless. Why? Obviously person-hours are not fungible — some puzzles are more parallelizable than others; some solvers are more experienced than others. But the estimate will be off for puzzlehunts for many additional reasons:

  • If, as is often the case, your team consists of people with broadly similar skills and interests, then testsolvers from your team will be better at solving puzzles written by your team than actual solvers, and your testsolves will underestimate the difficulty. (It’s usually really hard to find testsolvers outside your team to calibrate against. After all, people want to do Mystery Hunt.)
  • On the other hand, it might be that the solvers on your team who would be best at a certain niche puzzle have all been “used up” as either authors of the puzzle or past testsolvers, whereas teams in the actual hunt can put their most suited solvers on those puzzles. In that case your testsolves might overestimate the difficulty.
  • In an actual hunt, solvers may be more warmed-up and have better resources. But they also may be more tired.
  • Individual ahas in individual puzzles can introduce a lot of variance, especially if they’re in an impactful puzzle — a meta, a feeder that’s a likely break-in to a meta, or just something with a special role in the unlock structure (like this year’s Loading Puzzle).
  • Person-hours can be “wasted” on puzzles that don’t get solved before the meta. Conversely, puzzles can be skipped by backsolving or just solving the meta without them, or made easier by meta constraints.
  • Finally, a lot about a puzzle can change between a testsolve and the puzzlehunt as it actually runs. Sometimes the presentation of the puzzle in the final hunt introduces or removes red herrings. For example, from our 2021 Mystery Hunt, ⊥IW.giga played out very differently from testsolves due to Twins having an image that people tried to forward-solve (not to mention A Routine Matter being both particularly difficult and a particularly useful feeder). Sometimes some external resource that helps with the puzzle appears, disappears, changes, or becomes much easier/harder to Google between the testsolve and the hunt; I can give examples ranging from my earliest years of puzzlehunting, CiSRA 2012 1B (the link is to the puzzles page but the explanation is in the solution’s notes), all the way to last year, GPH 2022’s intro round.

I can only speculate here, but I’d guess this year the first bullet point probably had the largest effect, not least because Galactic, who I think is likely the most similar team to teammate in terms of skills and interests, was still not trying to win. In general, though, I think it’s unclear what the net effect of all these is, or even that effect’s direction. It’s easy to say that this year’s puzzles were too hard in hindsight, but I have no idea what a writing team should do to estimate how much harder their hunt is than desired.

Still, it may be instructive to ponder: which side do you want the error bars on your estimate to be on? More bluntly, if you had to choose between the coin being found at, say, either Saturday 6am or Monday 6am, which would you choose? I think when I first thought about this, my selfish choice as an avid solver on a top team who can somewhat reliably finish Mystery Hunt would be a mild preference for the latter, but it’s close. And what I think is much clearer is that the majority of hunt participants would prefer the former scenario. More teams get to see most of the hunt and potentially finish it and experience the endgame. Puzzle authors and interaction designers get to show their puzzles and interactions to more solvers. Everybody can go to sleep on Sunday. The downsides would mostly just be felt by top teams who might have one more day than expected with no puzzles to do, but puzzlehunters are clever and I’m sure they can find ways to entertain themselves. (More flippantly: by some accounts, the last time this happened, ✈✈✈ Galactic Trendsetters ✈✈✈ lived up to our name and revolutionized the online puzzlehunt space :)

In the end, the most concretely actionable advice I remember receiving is that huntrunners should leave a lot of knobs they can use to adjust the hunt speed within a healthy margin while the hunt is live. (From that perspective, you could say that the fact that teammate was able to course-correct the hunt length as much as they did makes it a resounding success!)

Thoughts on hunt structure

Although the hunt’s length might be its most memorable aspect, I think I was actually more surprised by the lack of an obvious “intro round” in this year’s hunt. This is especially so because teammate listed giving small teams a good experience among their goals during wrap-up. There may not be more to explain beyond calibration being off across all puzzles, as it looks like all the museum puzzles were intended to be easy. But on Saturday, as we solved various important-looking metas, we constantly half-joked that we had just finished the intro round and unlocked the main round. When we finally solved Reactivation, that may not have been a terrible description. Even so, I think we made the joke a few more times on Sunday. Maybe everything before Reactivation was Act I, the AI rounds were Act II, and there was an entire Act III yet to unlock! Who could say anymore?

During post-hunt discussion, I did learn about perspectives I hadn’t thought about before. I had mostly been thinking of intro rounds as a way to give the smallest teams something to accomplish during hunt weekend, but there’s a wide gap between those small teams and the top few. If you’re somewhere in the middle, say a team who can comfortably finish a small intro round similar to those in the last few Mystery Hunts in a few hours but who can’t finish the whole hunt in the weekend, you might spend most of the hunt solving puzzles without ever receiving any further narrative payoff or closure. In contrast, this year some of those teams were able to reach Reactivation right at the end of the weekend and enjoyed that experience more. An off-the-cuff conclusion is that the way to satisfy as many people as possible is to put major plot points \(1/2^n\) of the way through the hunt for as many positive integers \(n\) as you can.

Another benefit of easy puzzles I hadn’t thought about much, and in particular of spreading them through the hunt rather than putting them all in early rounds, is that they provide something for casual or drop-in solvers on strong teams to work on. Even super competitive teams often have solvers who are mostly there for the social aspect and just want to puzzle for an hour or two in the middle; it’s nice if they get to fully solve a puzzle in that time. I felt this was missing this year, despite being on a team that saw the full hunt.

Finally, one curious aspect of this year’s hunt is that the creative rounds felt pretty backloaded. The incredible Hall of Innovation was buried under five rounds of fairly standard puzzles and metas, and of course the craziest rounds were all at the end. In comparison, when Galactic ran the 2021 hunt I’m fairly confident we consciously put Infinite Corridor, one of our “coolest” rounds, early to maximize how many teams could see it and solve it. I’m probably the wrong person to analyze this in depth since I was on a team that experienced all of that anyway, but I’m curious how small to medium-sized teams felt about the round order.

Spoilery recap

This is as usual, though since I feel like more people will want to post-solve puzzles than usual, I have double-spoilered a few cluephrases that come late in puzzles as well as the entirety of the discussion of Win a Game of Bingo.

And that makes another year! Thanks, as always, to teammate for putting together such a great hunt, and I’m looking forward to seeing what TTBNL has to offer next year. May the deities of hunt length calibration smile upon you.

(note: the commenting setup here is experimental and I may not check my comments often; if you want to tell me something instead of the world, email me!)