Criteria’s Employee Testing Blog

The Wonderlic as a Predictor of Performance in the NFL

This Saturday is the NFL draft, which means that NFL scouts have spent the past months going over 40-yard dash times and college game tapes, and fans have debated which prospect would be the best fit for their team. It also means it's time for media and fans to recycle the usual punchlines about the folly of using an aptitude test like the Wonderlic on NFL prospects. Football, more than any other American team sport, is about physicality, and the idea that performance on an aptitude test could have much to do with success on the football field seems absurd. Skeptics point out that a low Wonderlic score didn't prevent Dan Marino from becoming one of the most prolific passers in history, or Vince Young from making the Pro Bowl in his rookie year. When Criteria works with customers to gather evidence for the validity of our employment tests at their organization, we sometimes hear similar anecdotes. I've often heard HR managers express concern that "one of our best performers did poorly on the test." (Criteria has an aptitude test, the CCAT, that is similar to the Wonderlic.) Such reactions are understandable, but the measure of a test's predictive validity can't be judged from one test score--the only meaningful way to measure a test's ability to predict productivity is to study the correlations between test scores and job performance across a broad sample of people. Based on this standard, the Wonderlic may be a better predictor of performance in the NFL than you might think.

Two business professors from the University of Louisville recently did such a study with NFL data. They correlated test scores with performance measures and concluded that there was no association between test scores and performance in the NFL. If there is no association between the two, why is the Wonderlic used on NFL prospects? The study was critical of the selection measures used by the NFL.

This is the kind of study we often conduct for our clients, BUT we also point out that you have to be careful when evaluating how well a selection measure predicts performance. Success criteria must be chosen appropriately, and the sample has to be appropriate. I have concerns with exactly these issues in the Louisville study.

The first problem is the choice of performance measure. The authors use average salary in a player's first three years as one of the "success metrics," but any football fan knows that a player's salary in his first years in the league is a function of draft order, not performance in the league, since he hasn't played any games when he signs a contract. The authors also use draft order as a "success measure." Both draft order and first-year salary are meaningful measures of a player's success only from the player's point of view--they reflect the collective wisdom about a player's future prospects. To owners and fans, on-field performance after entering the NFL is a much more meaningful measure of productivity.

The second problem with the study is that the authors include everyone in the performance evaluation, even if they never had a chance to perform. They found data on 68 quarterbacks drafted between 1999 and 2004, and included them all in the analysis comparing test scores to "success." The problem is that many of these QBs saw no or limited action in the NFL. So what does it mean to assess their performance when they didn't get to perform?

We tried a similar study, using NFL.com and other websites to find data on QBs drafted between 2000 and 2004. (We didn't use data from before 2000 because the data on players' scores is unreliable and incomplete.) The simplest way to measure the predictive validity of an employment test is to compare test scores to one or more metrics used to measure productivity in a given job. We chose QBs because that position requires the decision-making and problem-solving skills that aptitude tests are supposed to measure, and as productivity metrics we chose passing yards and number of TDs thrown in the first four years (four years is the average length of an NFL player's career). Passing yards and TDs thrown aren't perfect metrics (did you know Joey Harrington threw for more yards in his first four years than did Tom Brady, who didn't start until his second year?). You can check out the data we used here: there were 68 QBs drafted from 2000 to 2004, but we eliminated the 5 QBs for whom we couldn't find Wonderlic scores, as well as two others who ended up playing other positions (Ronald Curry) or other sports (Drew Henson).

The data is very interesting. If you look at the data for all 61 QBs, there is only a fairly weak correlation between aptitude and passing yards (r=.19) and TDs thrown (r=.20). But we made a plot of the test scores (Figure 1, x-axis) against the passing yards (y-axis) and saw that the story was much more complicated than that. As it turns out, there does appear to be a strong association between test score and performance (passing yards)--you just don't see it until you look at QBs who threw for 1,000 or more yards (which is where we put the horizontal line).

A performance measure can have multiple meanings. Some QBs don't throw for many yards because they barely get on the field, and this can happen for many reasons; they might not be good enough, but they also could be drafted to a team with a good starter in place, or get injured, etc. Below the 1000 yards passing mark, the data are all spread out across the score spectrum--there is no correlation there. Above the line, however, the correlation is a whopping r=.51 (r=.49 for TDs thrown), right up there with some of the strongest coefficients reported anywhere in organizational psychology.
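To make the comparison above concrete, here is a minimal Python sketch that computes Pearson's r on a whole sample and again on only the players at or above a yardage cutoff. The (score, yards) pairs are invented for illustration and are not the data used in this post; `pearson_r`, `split_by_yards`, and `r_for` are hypothetical helper names.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

def split_by_yards(players, cutoff=1000):
    """Split (score, yards) pairs into those at/above and those below the cutoff."""
    above = [p for p in players if p[1] >= cutoff]
    below = [p for p in players if p[1] < cutoff]
    return above, below

def r_for(group):
    """Correlation between test score and yards within one group."""
    scores, yards = zip(*group)
    return pearson_r(scores, yards)

# Hypothetical (test_score, passing_yards) pairs, made up for this example.
players = [
    (20, 3200), (25, 4800), (30, 7100), (35, 8900),  # saw real playing time
    (18, 100), (34, 200), (22, 500), (30, 0),        # barely played
]

above, below = split_by_yards(players, cutoff=1000)
print("all players:  r =", round(r_for(players), 2))
print("above cutoff: r =", round(r_for(above), 2))
```

In this toy sample the players who barely played are scattered across the score range, so they dilute the correlation in the full group, which is the same pattern described for the real data.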

Another way to look at the strength of these correlations is that for this sample, the QBs who scored below the median Wonderlic score (for QBs) of 27 averaged 5,202 passing yards and 31.2 TDs over their first four years, whereas those scoring above the median averaged 6,570 yards and 40.8 TDs over the same period. Seems like the cognitive measure might be worth something after all!
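A median split like the one above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, `median_split_means` is a hypothetical helper, and players who score exactly at the median are left out of both groups (the post does not say how such ties were handled).

```python
from statistics import mean, median

def median_split_means(records):
    """Given (score, outcome) pairs, return (median score,
    mean outcome below the median, mean outcome above the median).
    Records exactly at the median are excluded from both groups (an assumption)."""
    scores = [score for score, _ in records]
    med = median(scores)
    below = [outcome for score, outcome in records if score < med]
    above = [outcome for score, outcome in records if score > med]
    if not below or not above:
        raise ValueError("need at least one record on each side of the median")
    return med, mean(below), mean(above)

# Hypothetical (test_score, passing_yards) pairs, made up for this example.
sample = [(20, 4000), (24, 5000), (27, 5500), (30, 6000), (34, 7000)]

med, low_avg, high_avg = median_split_means(sample)
print("median score:", med)
print("avg yards below median:", low_avg)
print("avg yards above median:", high_avg)
```

A median split throws away information compared with a correlation coefficient, but it gives numbers that are easy to explain to a non-statistical audience.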

Of course, you might ask where should the cut-off be? How did we pick 1,000 yards? We tried it again with QBs who had started more than 5 games, and the same pattern replicates, but there is a bit of a caveat. Craig Krenzel, who studied molecular genetics at Ohio State and had a less-than-stellar stint with the Bears, scored very high on his aptitude test. He threw for about 800 yards in the pros, and started 5 games. If you change the thresholds and include him, then the overall predictive validity of the aptitude test goes down a little--that's the problem with small sample sizes--the anecdotes can actually affect the statistics in a non-trivial way.
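One way to check how much a cutoff choice matters is to recompute the correlation at several thresholds and see how far it moves. The sketch below does that on invented data that includes one high-scoring, low-yardage player just under 1,000 yards; the numbers, the `r_by_cutoff` helper, and the minimum group size of 3 are all assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

def r_by_cutoff(players, cutoffs, min_group=3):
    """Map each yardage cutoff to the score/yards correlation among players
    at or above it, or None when fewer than min_group players remain."""
    results = {}
    for cutoff in cutoffs:
        group = [(s, y) for s, y in players if y >= cutoff]
        if len(group) < min_group:
            results[cutoff] = None
            continue
        scores, yards = zip(*group)
        results[cutoff] = pearson_r(scores, yards)
    return results

# Hypothetical (test_score, passing_yards) pairs, made up for this example.
players = [
    (20, 3200), (25, 4800), (30, 7100), (35, 8900),
    (38, 800),             # high score, limited playing time
    (18, 100), (34, 200),
]

results = r_by_cutoff(players, cutoffs=[750, 1000, 5000])
for cutoff, r in results.items():
    print(cutoff, "yards:", "too few players" if r is None else round(r, 2))
```

With only a handful of players above the line, moving the cutoff enough to admit a single outlier changes the coefficient substantially, which is the small-sample caveat in action.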

All in all we think the data linking aptitude test scores with NFL performance is much more interesting than is currently recognized. In fact, for QBs drafted between 2000 and 2004 the data suggest there is a definite link between aptitude test scores and on-field performance. And we went through this exercise because it illustrates a lot of lessons we try to share with our customers. Think carefully about the measure of performance; make sure you recognize that there can be many reasons for good or bad performance that are unrelated to the test; plot your data so that you can visualize what's going on; and beware of making inferences in small samples.

Comments

Very interesting article, and it's extra-nice that you shared the data. Where do you get Wonderlic scores for players?
Does the same trend hold for non-QB positions?
Posted @ Thursday, April 24, 2008 1:22 PM by Yoav Shapira
Wow, I just sent this to the NY Mets front office. Maybe we can stop wasting $ on bad players.
Posted @ Thursday, April 24, 2008 1:32 PM by Dan Tyre
Yoav, the Wonderlic scores were pulled from several websites; they are pretty widely available. Interesting question about other positions--we haven't looked at those yet, but my suspicion is that there wouldn't be as much of a correlation for non-QB positions, as the QB position requires more decision-making skills than most other positions.
Posted @ Thursday, April 24, 2008 2:11 PM by Josh Millet
As a former college lineman I am always fascinated by the Wonderlic. I actually think that it does make sense for QBs, and even some other positions that require decision-making and analysis - like wide receivers changing routes based on coverage and linebackers too, and even a center or guard that needs to be able to adjust and call blocking schemes based on the defensive alignments.
Posted @ Thursday, April 24, 2008 2:25 PM by Mike Volpe
Interesting stuff! Keep it comin'!
Posted @ Thursday, April 24, 2008 6:11 PM by Bryan Baldwin
Fantastic article, Josh. As a science writer I am often trying to find ways to help people understand the fundaments of math and science and demonstrate how they can be found in everyday circumstances. I recently blogged about the intuitive, yet incorrect assessment of probability most of us make when confronted with a situation called The Monty Hall Problem. Our intuitive sense of a situation can often mislead us. Assessing instant cases (or small sample sizes/anecdotal evidence) is how we evolved to think about the world. While they are often adaptive, they aren't always correct. Proper use of tools like Criteria's testing software and proper understanding of how to interpret data will make for better predictions and higher productivity. Who doesn't want that? Now, for the most important question: Who should the Jets draft?
Posted @ Saturday, April 26, 2008 2:48 PM by damon gambuto
Doesn't the removal of the guys with less than 1000 yards really skew your results? You have a bunch of guys with scores in the low 30's who are under the cutoff. If Wonderlic really is a good predictor of success, shouldn't those guys be successful, too?
It seems like you've rather arbitrarily removed a big chunk of your sample, nearly half of which are guys who don't support your thesis.
Posted @ Friday, May 23, 2008 3:32 PM by Jon
I wonder if NFL teams like to have a high IQ Brian Griese-type QB buried on the taxi squad who, in case the top two quarterbacks are injured, can learn the playbook fast and not throw too many interceptions until somebody more talented gets healthy again.
Posted @ Friday, May 30, 2008 7:13 PM by Steve Sailer
Maybe the NFL uses the Wonderlic for purposes of negotiating contracts? "This kid got 6 right out of 50 and he's being represented by his uncle -- offer them a $10 million dollar contract with $9 million deferred until 2040."
Posted @ Friday, May 30, 2008 7:15 PM by Steve Sailer
With respect to Jon's comments about excluding players with less than 1000 yards passing skewing the results, I think it is important to point out that there are many physical talents and attributes that contribute to QB success in addition to the proposed link between success and psychometric intelligence. There is probably some minimum floor in size, speed, arm strength, hand-eye coordination, etc. to be a successful NFL QB. My hunch is that many of the high Wonderlic-scoring QBs that don't play lack these other attributes. It would be interesting if the authors of the above study would expand their examination by adding other variables from the combine (e.g., size, speed, passing tests, etc.) to their dataset and perform a multiple regression analysis so that we can see the relative importance of each of these variables. I suspect that psychometric intelligence will still be found to have a large impact on QB success when these other variables are held constant.
Posted @ Monday, June 02, 2008 11:54 PM by PhillyGuy