Criteria’s Employee Testing Blog

Don't Ask, Don't Tell: The New Rules of the SAT and College Admissions


Today's blog post is by Dr. Howard Wainer, who is the Distinguished Research Scientist at the National Board of Medical Examiners, as well as Professor of Statistics at the Wharton School of the University of Pennsylvania. Dr. Wainer received his Ph.D. from Princeton University, has won numerous scholarly awards, and spent 21 years as Principal Research Scientist in the Research Statistics Group at the Educational Testing Service. He is also, as far as we know, the only member of Criteria's Scientific Advisory Board to have swum the English Channel.

 

On September 22, 2008, the New York Times carried the first of three articles about a report, commissioned by the National Association for College Admission Counseling, that was critical of the current college admission exams, the SAT and the ACT. The commission was chaired by William R. Fitzsimmons, the dean of admissions and financial aid at Harvard.

The report was reasonably wide-ranging and drew many conclusions while offering alternatives. Although well-meaning, many of the suggestions only make sense if you say them fast.

Among their conclusions were:

  1. Schools should consider making their admissions "SAT optional," that is, allowing applicants to submit their SAT/ACT scores if they wish without making them mandatory. The commission cites the success that pioneering schools with this policy have had in the past as proof of concept.
  2. Schools should consider eliminating the SAT/ACT altogether and substituting achievement tests. The commission cites the unfair effect of coaching as the motivation. Its members were not naive enough to suggest that, because there is no coaching for achievement tests now, none would be offered if those tests became higher stakes. Rather, they argued that such coaching would be related to schooling and hence more beneficial to education than coaching that focuses on test-taking skills.
  3. The use of the PSAT with a rigid qualification cut-score for scholarship programs such as the Merit Scholarships should be halted immediately.

I will not attempt to discuss all three of these here, just the first one. If there is sufficient interest in this topic, this entry will be followed by others.

Has the admissions process been hampered in schools that have instituted an SAT optional policy?

The first reasonably competitive school to institute such a policy was Bowdoin College, in 1969. Bowdoin is a small, highly competitive liberal arts college in Brunswick, Maine. A shade under 400 students a year elect to matriculate at Bowdoin, and roughly a quarter of them choose not to submit their SAT scores. In Table 1 is a summary of the classes at Bowdoin and five other institutions whose entering freshman class had approximately the same average SAT score. At the other five institutions the students who didn't submit SAT scores used ACT scores instead. 

                                All Students  Submitted SAT Scores  Did Not Submit
Institution                                N             N    Mean               N
Northwestern University                1,654         1,505    1347             149
Bowdoin College                          379           273    1323             106
Carnegie Mellon University             1,132         1,039    1319              93
Barnard College                          419           399    1297              20
Georgia Institute of Technology        1,667         1,498    1294             169
Colby College                            463           403    1286              60
Means and Totals                       5,714         5,117    1316             597
Table 1: Six Colleges/Universities with similar observed mean SAT scores for the entering class of 1999.

To know how Bowdoin's SAT policy is working we will need to know two things. First, how did the students who didn't submit SAT scores do at Bowdoin in comparison to those students who did submit them? And second, would the non-submitters' performance at Bowdoin have been predicted by their SAT scores?

The first question is easily answered by looking at their first year grades at Bowdoin. These are shown in Figure 1 below.

Figure 1: Bowdoin students who did not send their SAT scores performed worse in their first year courses than those who did submit them.

We see that non-SAT submitters did about a standard deviation worse than students who did submit SAT scores. And so, we can conclude that if the admissions office were using other variables to make up for the missing SAT scores, those variables did not contain enough information to prevent them from admitting a group that was academically weaker than the rest of the class.

But would their SAT scores have provided information missing from other submitted information? Ordinarily this would be a question that is impossible to answer, for these students did not submit their SAT scores. However, all of these students actually took the SAT, and through a special data-gathering effort at the Educational Testing Service we find that the students who didn't submit their scores behaved sensibly. Realizing that their lower-than-average scores would not help their chances at Bowdoin, they chose not to submit them. Below (Figure 2) is the distribution of SAT scores for those who submitted them as well as those who did not.

Figure 2: Those students who don't submit SAT scores to Bowdoin score about 120 points lower than those who do submit their scores.

As it turns out, the SAT scores of the students who did not submit them would have accurately predicted their lower performance at Bowdoin. In fact, the correlation between grades and SAT scores was 12% higher for those who didn't submit them than for those who did.

So not having this information does not improve the academic performance of Bowdoin's entering class — on the contrary it diminishes it. Why would a school opt for such a policy? Why is less information preferred to more? There are surely many answers to this, but one is seen in an augmented version of Table 1 (below).

                                          All Students  Submitted SAT Scores        Did Not Submit
Institution                                N      Mean           N      Mean           N      Mean
Northwestern University                1,654      1338       1,505      1347         149      1250
Bowdoin College                          379      1288         273      1323         106      1201
Carnegie Mellon University             1,132      1312       1,039      1319          93      1242
Barnard College                          419      1293         399      1297          20      1213
Georgia Institute of Technology        1,667      1288       1,498      1294         169      1241
Colby College                            463      1278         403      1286          60      1226
Means and Totals                       5,714      1307       5,117      1316         597      1234

We see that if all of the students in Bowdoin's entering class had their SAT scores included, the average SAT at Bowdoin would sink from 1323 to 1288, and instead of being second among these six schools Bowdoin would have been tied for next to last. Since mean SAT scores are a key component in school rankings, a school can game those rankings by allowing its lowest-scoring students to be left out of the average. I believe that Bowdoin's adoption of this policy pre-dates US News and World Report's rankings, so that was unlikely to have been their motivation, but I cannot say the same for schools that have chosen such a policy more recently.
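The drop in the average can be checked from the augmented table itself. What follows is a minimal Python sketch, not part of the original analysis: it pools each school's submitter and non-submitter means, weighting by group size, and compares the ranking on the reported (submitters-only) mean with the ranking on the all-students mean. The helper names `pooled_mean` and `rank_of` are made up for this illustration. Because the table's means are rounded, the pooled values agree with the published "All Students" column only to within about a point.

```python
# Rows copied from the augmented table:
# (institution, n_submitted, mean_submitted, n_not_submitted, mean_not_submitted)
ROWS = [
    ("Northwestern University", 1505, 1347, 149, 1250),
    ("Bowdoin College", 273, 1323, 106, 1201),
    ("Carnegie Mellon University", 1039, 1319, 93, 1242),
    ("Barnard College", 399, 1297, 20, 1213),
    ("Georgia Institute of Technology", 1498, 1294, 169, 1241),
    ("Colby College", 403, 1286, 60, 1226),
]


def pooled_mean(n_a, mean_a, n_b, mean_b):
    """Mean of two groups combined, weighting each group mean by its size."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


def rank_of(name, scores):
    """1-based position of `name` when schools are sorted by score, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(name) + 1


# Mean a school would report if only submitted scores count.
reported = {name: m_sub for name, _, m_sub, _, _ in ROWS}
# Mean over the whole entering class, recomputed from the rounded table entries.
everyone = {
    name: pooled_mean(n_sub, m_sub, n_no, m_no)
    for name, n_sub, m_sub, n_no, m_no in ROWS
}

for name, *_ in ROWS:
    print(
        f"{name:32s} reported {reported[name]:5d} (rank {rank_of(name, reported)})"
        f"   all students {everyone[name]:7.1f} (rank {rank_of(name, everyone)})"
    )
```

In this recomputation Bowdoin and Georgia Tech land less than a point apart, which is consistent with the tie shown in the table; the ordering between those two should not be read as meaningful given the rounding.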

Is it Time to Dump Job Interviews?


Last week the New York Times published an interview with the authors of Sway, a new book that documents a series of psychological forces that lead people to disregard logic and act irrationally. One chapter in Sway deals with the phenomenon of the job interview, and describes the "first date" format of job interview that is so ubiquitous in America today. Most job interviews resemble first dates because employers utilize an unstructured "get to know the candidate" approach in which the interviewers try to establish a rapport with the interviewee, discover common interests, and form an impression as to whether the candidate will be a "good fit" at the company. The problem with "first date" interviews is that asking candidates to "describe themselves" or assess their "strengths and weaknesses" too often leads to canned answers that don't reveal much about future performance.

The authors argue that managers consistently overestimate their ability to form objective opinions based on interviews, and argue that structured interviews are much better predictors of future performance because they focus on relevant, objective data. The fact that unstructured interviews aren't a very good way of gathering objective data on candidates isn't news to anyone who is familiar with research in this area. What's surprising is that the authors actually significantly understate their case when they conclude that:

"As counterintuitive as it sounds, you don't need interviews at all. Research shows that an aptitude test predicts performance just as well as a structured interview."

Actually, what most research shows is that employment aptitude tests are far better predictors than are structured interviews. (The authors refer often to a meta-analysis, but don't reference which meta-analysis they are using in the Notes. The most comprehensive meta-analysis of employee selection techniques is the ubiquitously cited Hunter and Schmidt from the 1990s, which shows that aptitude tests are better predictors than are structured interviews.)

Employment tests predict performance more accurately than interviews do precisely because they yield objective, relevant data about a candidate's problem-solving ability, critical thinking skills, and job-relevant personality traits. In short, they provide the kind of data that is not susceptible to "Sway" -- the authors' catch-all phrase used to describe the effect of irrational impulses on human behavior. Although I think "Sway" simplifies the issues somewhat, the authors' basic point about job interviews is very sound. After all, if you wouldn't marry someone based on hitting it off on a first date, should you be hiring based on an interview?

The Wonderlic as a Predictor of Performance in the NFL


This Saturday is the NFL draft, which means that NFL scouts have spent the past months going over 40-yard dash times and college game tapes, and fans have debated which prospect would be the best fit for their team. It also means it's time for media and fans to recycle the usual punchlines about the folly of using an aptitude test like the Wonderlic on NFL prospects. Football, more than any other American team sport, is about physicality, and the idea that performance on an aptitude test could have much to do with success on the football field seems absurd. Skeptics point out that a low Wonderlic score didn't prevent Dan Marino from becoming one of the most prolific passers in history, or Vince Young from making the Pro Bowl in his rookie year. When Criteria works with customers to gather evidence for the validity of our employment tests at their organization, we sometimes hear similar anecdotes. I've often heard HR managers express concern that "one of our best performers did poorly on the test." (Criteria has an aptitude test, the CCAT, that is similar to the Wonderlic.) Such reactions are understandable, but the measure of a test's predictive validity can't be judged from one test score--the only meaningful way to measure a test's ability to predict productivity is to study the correlations between test scores and job performance across a broad sample of people. Based on this standard, the Wonderlic may be a better predictor of performance in the NFL than you might think.

Two business professors from the University of Louisville recently did such a study with NFL data. They correlated test scores with performance measures and concluded that there was no association between test scores and performance in the NFL. If there is no association between the two, why is the Wonderlic used on NFL prospects? The study was critical of the selection measures used by the NFL.

This is the kind of study we often conduct for our clients, but we also point out that you have to be careful when evaluating how well a selection measure predicts performance. Success criteria must be chosen appropriately, and the sample has to be appropriate. I have concerns with exactly these issues in the Louisville study.

As a performance measure, the authors use average salary in a player's first three years as one of the "success metrics," but any football fan knows that a player's salary in his first years in the league is a function of draft order, not performance in the league, since he hasn't played any games when he signs a contract. The authors also use draft order as a "success measure." Both draft order and first-year salary are meaningful measures of a player's success only from the point of view of the player--they reflect the collective wisdom about a player's future prospects. To owners and fans, on-field performance after entering the NFL is a much more meaningful measure of productivity.

The second problem with the study is that the authors include everyone in the performance evaluation, even if they never had a chance to perform. They found data on 68 quarterbacks drafted between 1999 and 2004, and included them all in the analysis comparing test scores to "success." The problem is that many of these QBs saw no or limited action in the NFL. So what does it mean to assess their performance when they didn't get to perform?

We tried a similar study, using data from NFL.com and other websites on QBs drafted between 2000 and 2004. (We didn't use data from before 2000 because the data on players' scores is unreliable and incomplete.) The simplest way to measure the predictive validity of an employment test is to compare test scores to one or more metrics used to measure productivity in a given job. We chose QBs because that position requires the decision-making and problem-solving skills that aptitude tests are supposed to measure, and as productivity metrics we chose passing yards and number of TDs thrown in the first four years (four years is the average length of an NFL player's career). Passing yards and TDs thrown aren't perfect metrics (did you know Joey Harrington threw for more yards in his first four years than did Tom Brady, who didn't start until his second year?). You can check out the data we used here: there were 68 QBs drafted from 2000 to 2004, but we eliminated the 5 QBs for whom we couldn't find Wonderlic scores, as well as two others who ended up playing other positions (Ronald Curry) or other sports (Drew Henson).

The data is very interesting. If you look at the data for all 61 QBs, there is only a fairly weak correlation between aptitude and passing yards (r=.19) and TDs thrown (r=.20). But we made a plot of the test scores (Figure 1, x-axis) and the passing yards (y-axis) and saw that the story was much more complicated than that. As it turns out, there does appear to be a strong association between test score and performance (yds thrown)--you just don't see it until you look at QBs who threw for 1000 or more yards (which is where we put the horizontal line).

A performance measure can have multiple meanings. Some QBs don't throw for many yards because they barely get on the field, and this can happen for many reasons; they might not be good enough, but they also could be drafted to a team with a good starter in place, or get injured, etc. Below the 1000 yards passing mark, the data are all spread out across the score spectrum--there is no correlation there. Above the line, however, the correlation is a whopping r=.51 (r=.49 for TDs thrown), right up there with some of the strongest coefficients reported anywhere in organizational psychology.
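To make the mechanics of that comparison concrete, here is a minimal Python sketch of computing a Pearson correlation first on a whole sample and then only on the players above a yardage cut-off. It is an illustration under stated assumptions, not the analysis behind the numbers above: the `TOY` list is made-up data invented for this example (it is not the QB data set described in this post), and the names `pearson_r` and `r_above` are hypothetical helpers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_above(players, min_yards):
    """Correlation of test score with yards, using only players at or above min_yards."""
    kept = [(score, yards) for score, yards in players if yards >= min_yards]
    scores, yards = zip(*kept)
    return pearson_r(scores, yards)


# MADE-UP (test score, passing yards) pairs for illustration only.
# The first row mimics players who barely got on the field, whatever their score;
# the second row mimics players who actually played.
TOY = [
    (15, 200), (22, 0), (30, 450), (35, 100), (38, 300), (27, 650),
    (18, 2500), (24, 4000), (28, 5500), (33, 7000), (36, 9000),
]

print(f"all players:        r = {r_above(TOY, 0):.2f}")
print(f"1,000+ yards only:  r = {r_above(TOY, 1000):.2f}")
```

In the toy data the low-yardage players have scores scattered across the whole range, so pooling them with the players who saw real action dilutes the correlation, which is the pattern described above. With a real data set the same two calls would be run on the actual (score, yards) pairs.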

Another way to look at the strength of these correlations is that for this sample, the QBs who scored below the median Wonderlic score (for QBs) of 27 averaged 5,202 passing yards and 31.2 TDs over their first four years, whereas those scoring above the median averaged 6,570 yards and 40.8 TDs over the same period. Seems like the cognitive measure might be worth something after all!

Of course, you might ask where the cut-off should be. How did we pick 1,000 yards? We tried it again with QBs who had started more than 5 games, and the same pattern replicates, but there is a bit of a caveat. Craig Krenzel, who studied molecular genetics at Ohio State and had a less-than-stellar stint with the Bears, scored very high on his aptitude test. He threw for about 800 yards in the pros, and started 5 games. If you change the thresholds and include him, then the overall predictive validity of the aptitude test goes down a little--that's the problem with small sample sizes--the anecdotes can actually affect the statistics in a non-trivial way.

All in all we think the data linking aptitude test scores with NFL performance is much more interesting than is currently recognized. In fact, for QBs drafted between 2000 and 2004 the data suggest there is a definite link between aptitude test scores and on-field performance. And we went through this exercise because it illustrates a lot of lessons we try to share with our customers. Think carefully about the measure of performance; make sure you recognize that there can be many reasons for good or bad performance that are unrelated to the test; plot your data so that you can visualize what's going on; and beware of making inferences in small samples.
