Last week the New York Times published an article on a possible Obama effect on test scores of black test takers. It was unusual for a major newspaper to publish a story on a social science study before that study had been peer reviewed, let alone published. But when you hear that so-and-so reported their results at some national conference, that isn’t really peer reviewed either. The conference organizers have often only seen a 200-word description of what the researchers thought they would present. So although unusual, it’s not entirely out of line to try to get a head start on a story like this, and the Times did circulate the study to some academics to get professional opinions.
Let me say at the outset that I hope the central result is true. The authors claim that they gave a short academic aptitude type test to black and white test-takers. When they administered the test last summer, they noted a difference between average scores for blacks and whites. However, after (now) President Obama had received his party’s nomination and given his acceptance speech, the difference in scores disappeared. The theory is that Obama’s rise has had a positive motivating influence on test taking performance.
The story has legs because there is a well-documented body of research on test performance, and how it can be affected by contextual cues. You can start with the cultural beliefs about aptitude tests in general. If there is a belief among one target group that the tests always show underperformance, then that belief can have a self-fulfilling aspect. Researchers have experimentally manipulated that contextual cue by describing tests differently to participants before they take them. Researchers have also manipulated the race and gender of the test administrator and done a variety of clever tricks to see to what extent performance can be affected by context. One enterprising team actually had women of Asian heritage take a math test, randomly dividing them into one group who answered a questionnaire designed to get them to think of their female identity, while the other group answered questions about their Asian identity. Guess what? One group underperformed relative to the other, and because the study was conducted as a randomized experiment, the authors are allowed to infer that their contextual manipulation caused the differences in performance.
So I’m sympathetic to the study described in the Times, and I fully appreciate the research tradition it comes from. That said, there are a couple of warning flags about the study. First, it is unclear from the Times piece whether there was any reference at all to Obama before the participants took the test. If not, then the story must be that if there was a difference in performance over time it was because Obama was “in the air”. That’s true enough – he certainly was in the air. The country was electrified. But most studies on test taking performance try to make the contextual cue more closely connected to the test taking event. Lots of things happened from last summer to now…millions of jobs were lost, the stock market tanked, Tom Brady was injured, and the seasons changed.
But the more worrisome concern is the quality of the data. Based on the Times article, it seems like there were four tests, and at each occasion there were maybe 20 black participants. Furthermore, the age range of the participants spanned around 50 years. I don’t want to make your eyes swirl with statistical mumbo jumbo…but let me throw out these two points. First, the degree of sampling variability from occasion to occasion would be huge. Would you trust the results of an opinion poll that gathered a group of 20 participants? So why trust the results of a test taken by 20 people? Second, it’s all the more problematic that the researchers are trying to prove a lack of difference. With such a small sample size, and such wide variability of participants in age and occupation, it becomes very difficult to prove that a difference exists. But as I have to remind my PhD students every day – failing to prove that there is a difference is not the same thing as proving that there is no difference. Their eyes swirl at me too.
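For anyone who wants to see the arithmetic behind the poll comparison, here is a rough back-of-the-envelope sketch. It is not the study's analysis. The function names are mine, and several inputs are assumptions: a 50/50 poll split (the worst case), 20 people in each comparison group (the Times piece only suggests about 20 black participants per occasion), scores measured in standard deviation units, and the usual normal-approximation 95% multiplier of 1.96.

```python
import math

def poll_margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a poll proportion.

    n: number of respondents; p: assumed true proportion (0.5 is the worst case).
    """
    return z * math.sqrt(p * (1 - p) / n)

def mean_difference_half_width(n_a, n_b, sd=1.0, z=1.96):
    """Approximate 95% half-width for a difference between two group means.

    Assumes both groups share the same standard deviation `sd`
    (sd=1.0 expresses the result in standard deviation units).
    """
    return z * sd * math.sqrt(1 / n_a + 1 / n_b)

# Hypothetical inputs: 20 respondents, and 20 test takers per group.
print(round(poll_margin_of_error(20), 2))            # about 0.22
print(round(mean_difference_half_width(20, 20), 2))  # about 0.62
```

Under these assumptions, a poll of 20 people carries a margin of error of roughly plus or minus 22 percentage points, and a comparison of two groups of 20 cannot distinguish a true gap of zero from a gap of around six tenths of a standard deviation. That is why a "disappeared" difference in a sample this size says so little.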
Come to think of it, it makes you wonder why everyone is looking at the data in this particular way. The story is that on the one testing occasion before Obama’s meteoric rise, there was a black-white difference, and then it disappeared over the next three testing occasions. The implicit reasoning is that something has happened. But why privilege the summer result so much? Why not ask “What was happening last summer that made a black-white difference show up?” Why assume that that result is somehow “true” and that it has recently “disappeared”?
At any rate, more data is already in hand. There have been several administrations of the SAT during the election run, and even one since President Obama’s inauguration. Let’s take a look at the national trend based on millions of scores. I’d be very happy if there is something to write about. I personally expect that there will be something to write about over time, but I also believe that the evidence is going to take some time to develop. Let’s hope the New York Times is still paying attention then, and not just trying to front-run another study that has barely been mailed out for review.