Variance

It’s olympiad season. Taiwan placed 18th in the IMO rankings¹. The next day there are news stories about how it’s our “third worst performance in history”, and commenters drawing causal arrows from changes in Taiwan’s standardized tests and curriculum to this result, and the Ministry of Education saying they’d review their procedures or something.

What.

Did you forget our performance last year? Do you think our olympiad training system is completely overhauled on an annual basis, or has even a tangential relationship with the overall education system?? Do you think people can be turned towards or away from mathematical olympiads because of standardized tests?? Do you think these contestants’ training plans are so transient that they’d be completely overthrown by one year of progress towards the implementation of faraway edicts about math topics for the layman???? If a measurement gives you a result of 3rd place one year and 18th place the next — both of which are more than 1.5 standard deviations from the mean² — how can your first (or second, or third) thought possibly be “we performed 15 places worse so something needs to be changed” instead of “oh wait this is not a very precise metric, maybe we shouldn’t read too much into it”?

Yes, I’m ranting here, but even though these things are so clear to me, I know they’re not obvious to most people at all, because most people don’t understand just how huge the variance in mathematics competitions is. Or in any similar competition, really. (And I know I’m likely preaching to the choir here by writing on my blog, and a very understaffed choir to boot. But I have to do this. Remember? Streak?)

It’s like… I’m trying to come up with a suitably crazy analogy, but it’s not easy… it’s like, you decide one day to look at the front pages of The New York Times and of Al Jazeera. You don’t even read any articles, you just look at the front page layout and graphics and text, and you notice that Al Jazeera has a video on its front page and the NYT doesn’t. You don’t even come back the next day to check again, you just conclude based on that one day’s whim that the New York Times needs more variety in its journalism, and you write and publish an article to that effect.

Where do I even begin.

We are looking at the performance of six people chosen from a country with about 1,500,000 people aged 15 to 19 (source ³). The people on the team change every year. In fact this is truer for us than for a lot of other countries: many contestants (including me) choose not to participate three or more times even if they would be eligible and could easily get a spot on the team. I’m not actually sure what everybody’s reasons are, but I can say now that it is actually somewhat of a custom and not just coincidence. This pretty much ensures that the set of contestants we get in some years is going to be luckier than in others. This need not have anything to do with any part of the country’s education system. You can’t get around the component of chance.
The variance increases even more when you consider that we are measuring our performance with a ranking between countries, not some absolute unit of problem-solving ability. Even if the skill level of Taiwan’s contestants is precisely constant — even if we somehow sent clones of the same six students to solve the same six problems every year and they got the exact same scores each year — our ranking would still fluctuate depending on how good other countries’ teams happened to be at those problems.
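To make the clone thought experiment concrete, here is a rough simulation sketch. Everything numeric in it is made up for illustration: the fixed team total of 150, the 20 rival countries, their typical totals, and the 15-point year-to-year swing are all assumptions, not real IMO data. The only thing it is meant to show is that a team with an identical score every year can still land on very different ranks.

```python
import random

def rank_of_fixed_team(our_score, rival_means, rival_sd, rng):
    """Return our 1-based rank in a year where only the rivals' scores vary."""
    rival_scores = [rng.gauss(mean, rival_sd) for mean in rival_means]
    # Rank = 1 + number of rivals that scored strictly higher than us.
    return 1 + sum(score > our_score for score in rival_scores)

def simulate_ranks(years=1000, seed=0):
    """Simulate many years with our score held perfectly constant."""
    rng = random.Random(seed)
    our_score = 150.0  # assumed: the "clones" score exactly this every year
    # Assumed field: 20 rivals whose typical totals straddle ours.
    rival_means = [120.0 + 3.0 * i for i in range(20)]
    rival_sd = 15.0    # assumed year-to-year swing in each rival's total
    return [rank_of_fixed_team(our_score, rival_means, rival_sd, rng)
            for _ in range(years)]

if __name__ == "__main__":
    ranks = simulate_ranks()
    print("best rank:", min(ranks), "worst rank:", max(ranks))
```

Running it prints the best and worst rank the unchanging team reaches over the simulated years. How wide that gap is depends entirely on the assumed numbers, so treat it as a picture of the mechanism rather than an estimate.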
And speaking of the problems: okay, seriously. This is not the SAT or PISA or whatever, where they norm the test every year and try to cover each subject proportionally or anything. This is an olympiad competition with six problems, and any way you slice it, the number of potential olympiad topics is way more than six. Some years there will be combinatorial geometry or a really weird inequality. Other years there won’t. Some people will happen to be good at these types of problems, even if they might not be as good at many other types that just didn’t happen to get chosen for the paper, and so perform above expectations. In fact, some people may well happen to be good at these types of problems but not the particular problem that came up, and then choose to spend all their time on the particular problem because they expect to be able to solve it and lose out on points elsewhere. The decision of how to allocate one’s time between problems is often far from obvious and just as often hugely influential on one’s score.
If you know my olympiad experience you’ll know that this is kind of a personal rant now, so here is one more example of a strategic choice: when faced with a geometry problem, you can choose to bash it or to try to do it synthetically. Often, bashing requires fewer insights but more careful work, and it carries the risk that if you don’t manage to finish the bash, or if you make an error early on, you may not get any points at all, whereas synthetic partial results are more likely to be rewarded by the marking scheme. The strategic considerations behind such a choice, and how accurate they turn out to be, have practically no bearing on mathematical ability, but the points can cause a huge jump in the rankings just the same.
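The problem-selection lottery (six problems, far more than six topics) can be sketched the same way. The topic list and the per-topic expected points below are invented for one hypothetical contestant, and real papers are certainly not built by drawing six distinct topics uniformly at random. That is a simplifying assumption. The only real fact used is that each problem is worth 7 points.

```python
import random

# Hypothetical contestant: made-up expected points (out of 7) per topic.
EXPECTED_POINTS = {
    "algebra": 7,
    "functional equations": 6,
    "inequalities": 2,
    "number theory": 7,
    "synthetic geometry": 5,
    "combinatorial geometry": 1,
    "graph theory": 6,
    "combinatorial games": 3,
    "polynomials": 7,
    "sequences": 4,
}

def paper_score(topics):
    """Total expected points for a paper made of the given topics."""
    return sum(EXPECTED_POINTS[topic] for topic in topics)

def simulate_papers(papers=1000, seed=0):
    """Score the same contestant on many randomly assembled six-problem papers."""
    rng = random.Random(seed)
    pool = sorted(EXPECTED_POINTS)  # fixed order so a given seed is reproducible
    # Assumption: each paper draws six distinct topics uniformly at random.
    return [paper_score(rng.sample(pool, 6)) for _ in range(papers)]

if __name__ == "__main__":
    scores = simulate_papers()
    print("lowest paper:", min(scores), "highest paper:", max(scores))
```

Same contestant, same ability, and the total still moves around purely because of which topics made it onto the paper. This doesn't even touch the time-allocation and bash-versus-synthetic gambles, which add more noise on top.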
And finally, great, even if our system for creating IMO-chart-topping students isn’t really optimal, why do you reporters and parents care? This is very, very different from optimizing an education system for creating a baseline level of literacy or marketable skills, or for boosting economic productivity, or pretty much all the metrics that society cares about and that justify having a general education system in the first place. Even if you are concerned about nurturing the top mathematicians and scientists to serve as pioneers or visionaries for the rest of the country (and I suppose there are good reasons to be thus concerned), there is still a significant difference between this and olympiad performance, and I suspect that beyond a certain point (which we’re far from, but I can’t say the same for every single country in the IMO), optimizing for these reportable victories will actually tend to destroy the students’ future prospects by inducing burnout or overemphasizing particular rigid forms of problem solving. I’m not saying you can’t have all of these things, or that one can only be done at the expense of the others, or that I wouldn’t like to see a better system for creating IMO-chart-topping students. But except for the present and former contestants and their close friends and coaches, this ranking really shouldn’t matter to you because it is an extremely imprecise signal for most of the things that affect society at large. And we certainly don’t need your officious diagnoses of the problem and wildly off-base suggestions for fixing it.
And lastly, this is quite a tangent, but Jesus zarking Babelfishes, MIT did not give me a full scholarship. I’ve already agonized about this enough on this blog, so suffice it to say: where the hell do you get your facts, reporters?

I suppose this is one of those incidents that supports what is apparently known as Knoll’s Law of Media Accuracy: “Everything you read in the newspapers is absolutely true except for the rare story of which you happen to have firsthand knowledge.” Something important to keep in mind.

(edit: clarified purpose of item 3, added extra points I wanted to make to item 4)