Posted by Eric Loken on Wed, Jan 28, 2009 @ 01:40 PM
Last week the New York Times published an article on a possible Obama
effect on the test scores of black test takers. It is unusual for a
major newspaper to publish a story on a social science study before that study
has been published, let alone reviewed. But when you hear that so-and-so
reported their results at some national conference, that isn't really peer
reviewed either. The conference organizers often have seen only a 200-word
description of what the researchers thought they would present. So
although it is unusual, it's not entirely out of line to try to get a jump on
a story like this, and the Times did circulate the study to some academics to
get professional opinions.
Let me say at the outset that I hope the central result is true. The authors
claim that they gave a short academic aptitude test to black and white
test-takers. When they administered the test last summer, they
noted a difference between average scores for blacks and whites. However,
after (now) President Obama had received his party's nomination and given
his acceptance speech, the difference in scores disappeared. The theory
is that Obama's rise has had a positive motivating influence on test taking
performance.
The story has legs because there is a well-documented body of research on test performance
and how it can be affected by contextual cues. You can start with the
cultural beliefs about aptitude tests in general. If members of one group
believe that the tests always show their group underperforming, then that
belief can become self-fulfilling. Researchers have
experimentally manipulated that contextual cue by describing tests differently
to participants before they take them. Researchers have also manipulated
the race and gender of the test administrator and done a variety of clever
tricks to see to what extent performance can be affected by context. One
enterprising team had women of Asian heritage take a math test after randomly
dividing them into two groups: one answered a questionnaire designed to get them
to think about their female identity, while the other answered questions
about their Asian identity. Guess what? One group underperformed
relative to the other, and because the study was conducted as a randomized
experiment, the authors could legitimately infer that their contextual manipulation
caused the differences in performance.
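For readers who want to see what "randomly dividing" buys you, here is a minimal Python sketch of random assignment. The participant pool, the `prior_skill` trait, and the helper names `randomly_assign` and `average` are hypothetical illustrations, not anything from the study. The point is that a random split gives both groups the same expected background, so any single split may be a little lopsided, but there is no systematic imbalance for a later difference to be blamed on.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle a copy of the pool and split it into two groups.

    With an odd-sized pool the second group gets the extra person.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def average(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical pool of 40 participants, each with a pre-existing trait
# (standardized prior math skill) that the experimenter does not control.
pool_rng = random.Random(0)
pool = [{"id": i, "prior_skill": pool_rng.gauss(0, 1)} for i in range(40)]

# Any single split can leave some chance imbalance between the groups...
group_a, group_b = randomly_assign(pool, seed=1)
print(len(group_a), len(group_b))

# ...but averaged over many random splits, the gap in prior skill is near zero.
gaps = []
for s in range(2000):
    a, b = randomly_assign(pool, seed=s)
    gaps.append(average(p["prior_skill"] for p in a)
                - average(p["prior_skill"] for p in b))
print(round(average(gaps), 3))
```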
So I'm sympathetic to the study described in the Times, and I fully appreciate
the research tradition it comes from. That said, there are a couple of
warning flags about the study. First, it is unclear from the Times piece
whether there was any reference at all to Obama before the participants took
the test. If not, then the story must be that any change
in performance over time happened because Obama was "in the air." That's true
enough: he certainly was in the air. The country was electrified.
But most studies on test-taking performance try to connect the contextual cue more
closely to the test-taking event. Lots of things happened
between last summer and now: millions of jobs were lost, the stock market tanked,
Tom Brady was injured, and the seasons changed.
But the more worrisome concern is the quality of the data. Based on the Times
article, it seems there were four testing occasions, and at each occasion there were
maybe 20 black participants. Furthermore, the age range of the
participants was around 50 years. I don't want to make your eyes
swirl with statistical mumbo jumbo, but let me throw out these two points.
The degree of sampling variability from occasion to occasion would be
huge. Would you trust the results of an opinion poll that surveyed only
20 people? So why trust the results of a test taken by 20
people? It's all the more problematic that the researchers are trying to
prove a lack of difference. With such a small sample size, and such
wide variability of participants in age and occupation, it becomes very
difficult to prove that a difference exists. But as I have to
remind my PhD students every day, failing to prove that there is a difference
is not the same thing as proving that there is no difference. Their eyes swirl at me too.
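To put rough numbers on those two points, here is a short Python sketch using the standard normal approximation. The scenario is hypothetical: scores are in standard-deviation units, each group has 20 people, and the assumed true gap of half a standard deviation is an illustrative value, not a figure from the study. The function names are my own.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def poll_margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = STD_NORMAL.inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

def two_group_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    effect_size is the true gap in standard-deviation units; both groups
    are assumed to have SD = 1 and n_per_group members.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n_per_group)  # SE of the difference in means
    shift = effect_size / se
    # Probability the test statistic lands in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

print(f"{poll_margin_of_error(20):.3f}")  # → 0.219
print(f"{two_group_power(0.5, 20):.2f}")  # → 0.35
```

Under those assumptions, a 20-person poll carries a margin of error of roughly plus or minus 22 percentage points, and a real half-standard-deviation gap would go undetected about 65% of the time. Failing to find a difference with numbers like that tells you very little.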
Come to think of it, it makes you wonder why everyone is looking at the data in
this particular way. The story is that on the one testing occasion before
Obama's meteoric rise, there was a black-white difference, and then it
disappeared over the next three testing occasions. The implicit
reasoning is that something has happened. But why privilege the summer
result so much? Why not ask "What was happening last summer that made a
black-white difference show up?" Why assume that that result is somehow
"true" and that it has recently "disappeared"?
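One way to see why a single odd occasion deserves suspicion: if there were no true gap at all and each of several occasions were tested separately at the usual 5% level, the chance that at least one occasion shows a "significant" gap by luck alone grows quickly. The Python sketch below computes that chance exactly, assuming independent occasions, and checks it with a small simulation. The group size of 20, the known standard deviation of 1, and the function names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

def chance_of_spurious_gap(occasions, alpha=0.05):
    """P(at least one of `occasions` independent tests rejects) when no true gap exists."""
    return 1 - (1 - alpha) ** occasions

def simulate_spurious_gap(occasions=4, n=20, alpha=0.05, trials=5000, seed=0):
    """Monte Carlo check: both groups are drawn from the same N(0, 1) population."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n)  # SE of a difference in means when SD = 1 is known
    hits = 0
    for _ in range(trials):
        for _ in range(occasions):
            mean_a = sum(rng.gauss(0, 1) for _ in range(n)) / n
            mean_b = sum(rng.gauss(0, 1) for _ in range(n)) / n
            if abs(mean_a - mean_b) / se > z_crit:
                hits += 1
                break  # one spurious "gap" is enough for this trial
    return hits / trials

print(f"{chance_of_spurious_gap(4):.3f}")  # → 0.185
print(f"{simulate_spurious_gap():.3f}")
```

With four occasions, roughly one study in five would show a gap on some occasion even if nothing real were going on. That doesn't prove the summer gap was a fluke, but it is a reason not to treat it as the "true" baseline.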
At any rate, more data is already in hand. There have been several
administrations of the SAT during the election campaign, and even one since
President Obama's inauguration. Let's take a look at the national trend
based on millions of scores. I'd be very happy if there turns out to be something to
write about. I personally expect that there will be something to write about
over time, but I also believe that the evidence is going to take some time to
develop. Let's hope the New York Times is still paying attention then,
and not just trying to front-run another study that has barely been mailed out
for review.
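As a footnote on why millions of scores change the picture, here is one more Python sketch. Under the same normal approximation and standardized-score assumption as before, the smallest true gap a two-group comparison can reliably detect (80% power, two-sided 5% test) shrinks with the square root of the group size. The group sizes below are illustrative, not actual SAT administration counts, and the function name is my own.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Approximate smallest true gap (in SD units) that a two-sided z-test
    detects with the given power; ignores the negligible opposite tail."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

for n in (20, 1_000, 1_000_000):
    print(f"n = {n:>9,}: detectable gap about {minimum_detectable_effect(n):.3f} SD")
```

With 20 people per group, only a gap of nearly nine-tenths of a standard deviation would reliably show up; with a million per group, gaps of a few thousandths of a standard deviation would.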