Quixotic Reimagining of Standardized Tests (Part 1)

Life update: I got my driver’s license from the place where I learned to drive. Then I drove home from there with my mom, and it was zarking terrifying.

Also, WordPress says it has protected my blog from 38 spam comments.

Early in the morning tomorrow, I have a small surgical operation, so I can’t sleep too late. (Well, it ended up being pretty late anyway. Darn.) Therefore I think I’m going to do something unprecedented on this blog for the daily posting streak: I’m going to post an incomplete non-expository post.

Yes, the only purpose of the title is to get initials that are four consecutive letters of the alphabet.


One of the more argumentative post sequences on my blog involved ranting against standardized tests.

My very first stab was probably the silly satire directed at the test everybody has to take that takes up two hours per day of an entire week. Once college became a thing in my life, I wrote a humblebrag rant after I took the SAT and then a summary post after I snagged this subject for an English class research paper and finished said paper.

It should be plenty clear that I am not ranting against this part of the system because it’s disadvantageous to me.

But it should also be said that I’ve read some convincing arguments for using standardized tests more in college admissions (Pinker, then Aaronson). Despite the imperfections of tests, they argue, the alternatives are likely to be less fair and more easily gamed. The fear that selecting only high test-scorers will yield a class of one-dimensional boring thinkers is unfounded. And the idea that standardized tests “reduce a human being to a number” may be uncomfortable for some, but it makes no sense to prioritize avoiding a vague feeling of discomfort over trusting reliable social science studies. Neither article, you will note, advocates selecting all of one’s college admits based on highest score. Just a certain unspecified proportion, one that’s probably a lot larger than it is today.

And although I wish the first article linked its studies, I mostly agree with their arguments. So this puts me in a tricky position. These positions I’ve expressed seem hard to reconcile! So, after arguing about all this with a friend who told me things like

I think you fail to understand how anti-intellectual american society is

(comments on this statement are also welcome) I think some clarifications and updates on how I feel are in order.

Firstly, in my community, I think the perception that standardized tests matter is much stronger than it is in the States. If not, at least the actions people take to have their children do well on standardized tests are more drastic. I know students who start doing standardized test prep by early ninth grade and classmates who spent full weeks of mornings and afternoons at summer study programs for the same purpose. These are the people my rants were primarily intended for. (Link from the future: It’s kind of a bravery debate; the majority attitude in my community towards standardized tests may well not be representative of the broader college-attending community’s attitude.)

Secondly, my primary beef with standardized tests is argued not from the perspective of the colleges but from the perspective of the students. I don’t have strong beliefs about whether standardized tests assess people well. Okay, the research paper I said I wrote for school was argued from the colleges’ perspective and claimed in its thesis statement that the SAT “should be de-emphasized relative to other predictors for predicting success”. But, well, it was a school research paper assignment, and I had to pick a topic for which I could cite lots of sources. You will also note that “de-emphasized” and “relative” are pretty waffly words. That was, in fact, already a compromise with the teacher; my earlier theses were even wafflier. I don’t stand very firmly behind that thesis statement. Even in the version I ended up putting on my blog, I exaggerated my position a bit. I’m not statistics-literate enough to know whether an “improvement of 0.08 correlation” deserves to be described by the adjective “marginal”.

This was my stubborn belief-persistent tweet in response to the SAT overhaul:

(the tweet has since been deleted but here’s its content)
College Board Shakes Up SAT: http://t.co/Jn4rOLz5Fe Standardized tests are still bad but this does look like an improvement.

— Brian Chen (@betaveros) March 6, 2014

I’d like to recant the statement “Standardized tests are still bad”. It’s a great overgeneralization. The driving test I just took was pretty standardized, and studying for it and taking it left me a nervous wreck, but I would not advocate de-emphasizing it relative to other predictors of driving skill. I would not like to drive in a society where the other drivers do not have a standardized understanding of traffic rules and signal light customs and what all the weirdly colored arrow signs mean.

There’s also this parallel: many of my beloved math competitions or olympiads could be argued to be kind of like standardized tests. The IMO has a detailed appeal chain to ensure each paper is graded equally and a crude norming process in the proportions of medals it awards so that their values are approximately constant from year to year, right? (Of course the small number of problems inevitably makes luck play a huge role, but still.)

Of course, driving tests and math competitions are very different in structure and function from aptitude/achievement tests that are supposed to test college-readiness. The first test is very specific and narrowly tailored to test one skill — driving — and passing the test is required for certification to perform one narrow task — driving. Math competitions are also specific and target a narrow audience of people who are already interested in mathematics, and nobody is requiring that these competitions be taken to do anything (although presumably it does look good on one’s college applications), so (I hope) most contestants are contestants because they enjoy it. Meanwhile, the standardized tests for college mash together tests for various nebulous areas of skills as part of a gateway to college, where you might choose to study anything from physics to politics or poetry, and the degree to which you enjoy or require the skills you’ve been tested on will vary wildly. Standardization is a lot more acceptable to me without the business of “judging a fish by its ability to climb a tree”.

I’ll come back to this point, but the key takeaway is that standardization does not necessarily lead to unfair or arguably imprudent comparisons.

My far more strongly held belief was expressed with everything in the second half of the blog post: it sucks that students have to prepare for the SAT in particular, because practicing for the types of questions they pose is so soul-suckingly unproductive.

For example, when I read Aaronson’s (probably rhetorical) question:

…spots at the top universities are so coveted, and so much rarer than the demand, that no matter what you use as your admissions criterion, that thing will instantly get fetishized and turned into a commodity by students, parents, and companies eager to profit from their anxiety. If it’s grades, you’ll get a grades fetish; if sports, you’ll get a sports fetish; if community involvement, you’ll get soup kitchens sprouting up for the sole purpose of giving ambitious 17-year-olds something to write about in their application essays. […] So, given that reality, why not at least make the fetishized criterion one that’s uniform, explicit, predictively valid, relatively hard to game, and relevant to universities’ core intellectual mission?

this is what I wanted to reply: Since the people who will fetishize admissions criteria are guaranteed to do so, why not pick a criterion that aligns the goal of self-actualization with that of getting into a good college, for those people who don’t want a top college spot that badly? Measuring certain forms of extracurriculars, if it could be done fairly, lets students who want to get into a “good school” without selling their soul in the process know that they can pursue the activities they like without sacrificing college-quality points. Measuring a standardized test — or, at least, measuring the lifeless SAT — means those students have to allocate time and effort between preparing for it and preparing for excellence in whatever they’re interested in, with no way of achieving both.

But then I realized this response pretty much ignores all the beneficial qualities of standardized tests that the actual question listed, instead simply aiming to make the students’ life decisions easier. So this is probably not very convincing to admissions people, to put it mildly.

There are still several ways out. If standardized tests measure inherent aptitude that can’t be improved by studying, and everybody could be convinced of this, that would resolve this issue too. I think it would be somewhat cruel to match students to certain tiers of colleges without offering them any hope via ways to get ahead, but at least they could rest assured and spend the spare time of their middle- and high-school years on themselves. Unfortunately, even if standardized testing really isn’t amenable to coaching (my beliefs about this statement are quite weak), I am reasonably confident that social science studies could not produce evidence that’s strong enough to convince most people (in the society that I know and grew up in) not to have their child take those SAT or ACT classes advertised by the local test prep company. This problem would get much worse if colleges ever emphasized standardized test scores more than they do now.

So my conclusion is that standardized tests, and studying for said tests, will both still have to be things. But I don’t think my two goals are irreconcilable. If I had to design and implement a standardized test from scratch, one that I might hope could satisfy both institutions seeking objective metrics and students seeking a study experience that wasn’t completely awful, what would I imagine?

Part 2 will pick up here when I feel like it.
