Criteria's Employee Testing Blog

Resumés are Unreliable

In the past week, we got another high-profile reminder of just how widespread the problem of “resumé-enhancement” has become. Yahoo’s latest CEO, Scott Thompson, is now under fire because his resumé incorrectly states that he holds a Bachelor’s degree in Computer Science, when in fact his degree is in Accounting. This disclosure is only the latest instance of a high-profile executive being damaged by inaccuracies or exaggerations in his or her resumé.

The remarkable part of this latest episode of resumé padding is that it went undiscovered for so long. Not only did the search committee at Yahoo not notice the mistake, but apparently neither did his previous employers (including PayPal).  While it’s hard to imagine how a company could fail to verify the basic facts of a prospective CEO’s biography, the underlying issue with this story is that it highlights just how problematic resumés are as information-gathering devices for employers.

This is because resumés are pieces of content generated by candidates to present themselves in the best possible light. When a candidate crosses the line from embellishment to prevarication, the misinformation usually goes uncorrected because it’s difficult for a company to verify every detail in a resumé. Because resumé review usually happens near the top of the hiring funnel, it’s impractical and time-consuming to follow up on every fact in every resumé that a company receives.

Given how unreliable resumés are (and how ineffective they are as predictors of job performance, as many studies have shown) it’s surprising how much attention they still receive. But thankfully this is an area where pre-employment testing can help; by gathering objective, verifiable data on candidates early in the hiring process, tests can help hiring managers filter through large applicant pools, and allow them to spend more time reviewing (and verifying) resumé information for the candidates who seem to be the best fit.


Integrity Test Added to HireSelect

Here’s a link to a press release we put out today announcing the newest addition to our test portfolio. The Workplace Productivity Profile (WPP) is a personality assessment designed to be used for lower and mid-level positions for which an employer needs trustworthy, reliable, and conscientious employees. The WPP has actually been live in HireSelect for more than a month, as we soft launched it in March. I am going to do a blog post with more details on integrity tests later, but if you are interested in learning more about it check out this info on our integrity test.


Learning analytics in college: Predicting the grades you’ll get…that is… earn

Everyone wants to compare themselves to Netflix, whose data-driven, personally tailored movie suggestions improve customer satisfaction and retention.  Among the latest domains to see this trend: “learning analytics” in higher education. The basic idea is to use institutional data to help students successfully navigate towards their college degrees.  Doesn’t sound controversial yet – data-driven decision making is usually just plain common sense.

But the details can get a little tricky. Consider the following effort out of Austin Peay State University in Clarksville, TN. What caught our attention was that in addition to sensible suggestions for ways to meet course requirements en route to graduation, the system also predicts the grades students are expected to receive in their upcoming classes. The author of the article is impressed with the accuracy of prediction, saying that end-of-semester GPA was predicted within 0.02 on the 4-point scale, and that individual class grades were predicted within 0.6. Furthermore, the probability of getting a C or better was predicted with 90% accuracy.

Seems like one of those numbers is very likely wrong, and the others are not that impressive given a little thought.  There is no way that end of semester GPA is predicted to within 0.02 unless previous GPA is used as part of the prediction, in which case the weighted sum might not move much.  To predict individual class grades, keep in mind that offering a certainty of +/- .6  means offering a range as wide as A to B– .  A prediction interval that wide, centered on the course average from previous semesters, will cover most of the students in the class.  Add in a tweak for the student’s own prior GPA and voila, good prediction. There’s not much magic required to achieve that level of accuracy and I’m sure any university could replicate it in a heartbeat.
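To put rough numbers on that argument, here is a minimal Python sketch. It rests on assumptions of ours rather than anything reported in the article: that the GPA figure refers to cumulative GPA, that the student is a hypothetical one with a 3.00 GPA over 90 credits taking a 15-credit term, and that grades in a class are roughly normal around the course average with a spread of 0.5 grade points.

```python
import math

def cumulative_gpa(prior_gpa, prior_credits, term_gpa, term_credits):
    """Credit-weighted average of the prior GPA and the new term's GPA."""
    total_points = prior_gpa * prior_credits + term_gpa * term_credits
    return total_points / (prior_credits + term_credits)

def interval_coverage(half_width, sd):
    """Share of a normal distribution lying within +/- half_width of its mean."""
    return math.erf(half_width / (sd * math.sqrt(2)))

# Hypothetical student: 3.00 GPA over 90 credits, then a 15-credit term.
for term_gpa in (2.5, 3.0, 3.5):
    print(term_gpa, round(cumulative_gpa(3.0, 90, term_gpa, 15), 3))

# Assumed spread: 0.5 grade points around the course average.
print(round(interval_coverage(0.6, 0.5), 3))
```

Under these assumptions, a half-point swing in the term GPA moves the cumulative GPA by only about 0.07, and a +/- 0.6 band around the course average already covers roughly three quarters of the class before any student-specific information is used.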

But would that be wise? Injecting grade predictions into the student decision making process is likely to affect choices.  Oh, of course the student will decide which courses to take based on interests, goals, sense of personal identity, and other thoughtful and reflective criteria.  But like all rational agents they will also want to maximize their utility and it would be hard to ignore differences in expected reward (grade) “all else being equal”.  That kind of feedback could easily operate as a market force changing the distribution of supply and demand.  And instructors might begin to feel pressure to make their classes easier if they see students voting with their feet.

Isn’t it already good practice to notify a student during a course that their performance on assignments and tests has them at risk for failure? Sure, but such feedback is focused on those who are at risk for facing the consequences (financial and otherwise) of failure, and is based on evaluation of effort and achievement in the course.  Whatever the predicted outcome for a student before a course begins, a grade must still be earned.   Netflix predicts how someone will rate a future movie experience based on how they rated previous experiences.  But a grade (or salary or other reward) must be earned, and the prediction accuracy rests entirely on the assumption that future effort will resemble past effort.   A college student makes a sequence of choices – they choose which classes to take and then they choose whether to expend the necessary effort to succeed.

And here’s where it is unclear how much an individual’s course selection should be driven by their own prior success, and by comparison with the average successes of others. Should we tell people their actual likelihood of overcoming difficult challenges?  The smart money will always bet against anyone trying to lose weight or quit smoking. Should predicted failure be emphasized up front?  Maybe, as it’s certainly not irrelevant to have full knowledge. But what about the risk of discouraging effort, and of enabling complacency? For all the virtues of helping college students avoid getting themselves into classes that are too difficult, there is considerable risk for driving a market for classes that are too easy.  It sounds great to make data-driven decisions, but sometimes introducing data into a system can create unintended feedback dynamics.

Here are some other suggestions for so-called “learning analytics” in higher ed: we could apply “learning analytics” to actual learning, and guide instruction and assessment within classes in a tailored and evidence-based manner. Or we could build a course selection system with a much broader view. If it’s fair game to predict grades in upcoming courses, how about also giving hard data on the career success of students who have taken those courses? That would extend the decision-making horizon and add more context to the process. And how about also expanding a course selection system to the broader higher education market, including online providers? Imagine an advising system that suggested a dozen alternatives for meeting a linear algebra or intro stats requirement, each much cheaper than the cost of the resident instruction class at the student’s current school. That would be a disruptive technological innovation.


Chrome and Mac users are smarter than IE users

by Josh Millet and Eric Loken.

This week the website Calcudoku revisited the question of intelligence differences among users of different web browsers. You may recall the frenzy last summer when a phony report circulated that Internet Explorer users were less intelligent.  That hoax should have been easily spotted because the fabricated data suggested a massive gap on the IQ scale.  Anyone with any knowledge of psychological testing should have immediately understood that the data were impossible. The Calcudoku report is interesting because it uses real data on time to completion of the online puzzles, and also concludes that Internet Explorer users are not as numerically talented. Google CEO Larry Page was, not surprisingly, excited by the findings.

We have access to much more definitive data on this topic.  We provide online pre-employment assessment solutions with a suite of tests covering a broad range of skills and aptitudes. One of our tests, the CCAT, measures critical thinking skills and is often assigned to applicants for higher level positions (managers, analysts etc.).  We have six years of data (more than 1.3 million tests) and recently started tracking browser and operating system information.

Why should any of this be interesting? Many observers, both last summer and this week, lamented the emphasis on measures of trait intelligence. Last summer’s hoax was motivated by a software engineer exasperated about wasted hours spent making web code compatible with old browsers like IE 6. He felt that it was stupid to waste so many person-hours, and so he concocted a hoax to blame the users for being stupid. It was an attempt to use “low intelligence” as an insult. Sensitive as we are to the misuse of intelligence tests to justify prejudice based on gender or race, we have wondered about the social value of any of these discussions.

However, for many people (not all), computer platform and web browser represent a choice.  They represent behavioral choices, and it is of some interest to see if those choices are associated with a trait measure of intelligence. This is certainly relevant to how companies like Apple and Google target their markets. Furthermore, we reasoned that it is better to share good data rather than allow fake or incomplete data to spread. So without further ado, here we go…

The CCAT is a timed test with a maximum score of 50. Scores greater than 40 are rare, and the mean for our sample is in the low to mid twenties. (Keep in mind that on our website, this test is selected by employers trying to fill jobs with greater than average complexity and responsibility, so the pool is tilted to the upper end of the distribution already.) When we break out the results of 14,264 tests by browser, we see clear differences. Internet Explorer users lag Chrome, Safari and Firefox users by approximately one-quarter of a standard deviation. The difference is highly statistically significant (F(4,14259) = 66, p < .0001), and it is of modest practical significance (it would correspond to something like 3 or 4 points on an IQ scale).

Mean CCAT Score by Browser

Browser            Mean   Std. Deviation  N
Chrome             24.40  8.13            2,613
Firefox            24.01  7.85            3,022
Internet Explorer  22.16  7.27            7,522
Opera              20.11  7.70            19
Safari             24.23  7.10            1,088
Total              23.12  7.61            14,264
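For readers who want to check the F statistic against the table, a one-way ANOVA can be rebuilt from the group means, standard deviations and counts alone. The sketch below is our own illustration, not the original analysis script; it assumes the table reports sample standard deviations (the n - 1 kind), and the function name is just a label we made up.

```python
# (browser, mean, standard deviation, n) copied from the table above
groups = [
    ("Chrome", 24.40, 8.13, 2613),
    ("Firefox", 24.01, 7.85, 3022),
    ("Internet Explorer", 22.16, 7.27, 7522),
    ("Opera", 20.11, 7.70, 19),
    ("Safari", 24.23, 7.10, 1088),
]

def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group mean, SD and n."""
    n_total = sum(n for _, _, _, n in groups)
    grand_mean = sum(mean * n for _, mean, _, n in groups) / n_total
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for _, mean, _, n in groups)
    # Within-group variation: recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for _, _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_b, df_w = anova_from_summary(groups)
print(f"F({df_b},{df_w}) = {f_stat:.1f}")
```

Because the inputs are rounded to two decimals, the result should only be expected to land near the reported value, not to match it digit for digit.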

We’ll discuss it a bit more below, but first let’s look at operating systems. When we isolate the Mac OS X users (1,706) from the rest of the pack (almost exclusively Windows), we see a similar and slightly stronger difference. The mean score for Mac users was 25.26 and the mean for the rest was 22.83. This one-third standard deviation difference was of course highly statistically significant (t(14262) = 12.4, p < .0001).

Platform  Mean   Std. Deviation  N
Windows   22.83  7.62            12,558
Mac OS X  25.26  7.25            1,706
Total     23.12  7.61            14,264
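The same kind of check works for the platform comparison. This sketch assumes an equal-variance (pooled) two-sample t-test, which is what the reported degrees of freedom suggest, and it also returns the standardized difference (Cohen's d) behind the "one-third standard deviation" description. The helper name is ours.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic, its df, and Cohen's d."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff / standard_error, df, diff / pooled_sd

# Mac OS X row first, then the Windows row, from the table above.
t_stat, df, d = pooled_t(25.26, 7.25, 1706, 22.83, 7.62, 12558)
print(f"t({df}) = {t_stat:.1f}, d = {d:.2f}")
```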

So what does it all mean?  On the one hand, it’s not worth getting worked up about these results. After all, the groups are only separated by a small amount – 2 to 3 questions out of 50 for an average difference – and this means that the within group variance is much greater than the between group differences. The overlap in the distributions is high, and it is only with marginally higher probability that we would expect any randomly chosen Chrome user to outperform an IE user. No employer should interpret browser use as predictive of the ability of a single prospective employee. And no individual computer user should feel that their browser choice reflects something definitive about their abilities. On the other hand, such robust differences in group means have a basis, and there are implications for the tails of the distribution.  About 2.2% of applicants score 40 points or higher on this test. Even though IE users outnumber all the other browser types combined in the overall sample, they are a 2:1 underdog among those scoring 40 or above. Among Chrome users, more than 1 in 25 scored 40+, while among IE users it was 1 in 75.
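To give a feel for what "marginally higher probability" means, the sketch below estimates the chance that a randomly chosen Chrome user outscores a randomly chosen IE user. It is a back-of-the-envelope illustration, not part of our analysis: it treats both score distributions as normal with the means and standard deviations from the browser table, and it ignores ties, even though real CCAT scores are whole numbers between 0 and 50.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_superiority(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) for independent, normally distributed scores A and B."""
    return normal_cdf((mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2))

# Chrome versus Internet Explorer, using the browser table's values.
p = prob_superiority(24.40, 8.13, 22.16, 7.27)
print(f"P(Chrome user outscores IE user) = {p:.2f}")
```

Under those assumptions the answer comes out a little under 60%, which is better than a coin flip but nowhere near enough to say anything about an individual.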

The upshot is that there is definitely substance to the recent claims about intelligence differences among users of different browsers and operating systems. We find that Chrome, Firefox and Safari users significantly outperform IE users in a pre-employment assessment designed to measure higher order thinking skills. As Larry Page’s post suggests, Google (and Apple) might instinctively feel some pride in appealing to a higher ability demographic. This has come about either through clever marketing, or because the products have an intellectual appeal. But let’s not get carried away with any of this. By definition Google and Apple are hungry for market share, and that means selling to everyone. Care to bet how things will look 15 years from now?  It’s possible we’ll be saying “Google Chrome is the new IE,” and the tables in this post will look awfully out of date.


Facebook Profiles as Predictors of Job Performance? Maybe…but not yet.

Some newspapers and radio stations recently picked up a story that Facebook profiles can be revealing, and can yield information more predictive of job performance than typical self-report personality questionnaires or even an IQ test.

So first off, this is clearly an interesting idea. A rich Facebook profile contains information about a person’s actual behavior, and past behavior is the best predictor of future behavior.  Every parent knows that a quick Facebook search can reveal a lot about a potential new babysitter.  One wonders if that could scale to the corporate level for hiring, and whether it would be ethical to do so?

But let’s start with the surprising assertion, at least as represented in the LA Times story, that the Facebook profile ratings were better predictors of job performance than an IQ test. One of the most consistent findings from the last 50 years of organizational psychology research is that cognitive ability is the strongest predictor of job performance, sometimes followed closely by measures of conscientiousness (and recently there has been interest in perseverance or grit). So has the Facebook study upended all this established research? Not at all, and the reason lies in the enormous gap between the claims about the study’s outcomes and the details of what was actually done.

The researchers had two college population samples. In Study 1 they had job performance ratings for the part-time college jobs of about 10% of the original sample, but they did not have any IQ or cognitive ability measure. In Study 2 they gathered Wonderlic’s measure of cognitive ability, but this time they had no job performance data, only college GPA, which they say is correlated with job performance. It should also be noted that only some of the college students were careless enough to have publicly available Facebook profiles. All in all, this particular research has very little of value to add about predicting job performance in any real-world setting.

Ultimately the clues we reveal about ourselves as we socialize and work on the web will prove to be highly predictive of job performance.  After all, internet marketing based on search and social network behavior is enormously successful.  It is obvious that job performance and health and any number of other future high-stakes outcomes will be predicted with increasing accuracy from online behavior, and a great number of ethical questions will ensue.  But those fascinating developments will come from better data and better designed studies than the material recently referenced by the LA Times.


Help! One of my top performers bombed your test!

The most effective method we have for selling our pre-employment testing software, HireSelect, is our 30-day free trial. It allows prospective customers to try the tests, preview the software, and ask our sales team questions about how to best use HireSelect. We also encourage people to evaluate HireSelect by administering the tests to a group of their existing employees. Since companies have a good idea of how their existing employees are performing, testing incumbents can be an effective way to analyze the accuracy and predictive validity of our tests. Most testing companies won’t let potential customers preview their tests in such a comprehensive way, but for us it’s a great sales tool: we have plenty of evidence about the predictive accuracy of our tests, and we want to make sure people see the value in our assessments before they invest in using our service.

But one scenario that we face is when we get comments like, “I don’t know… one of my top performers failed your tests.” Our sales staff will hear comments like this from people who doubt the effectiveness of the tests because of a notable case where the results don’t correspond with what they know about a particular employee. When this happens, we often ask if they’d be willing to share performance data for the employees they tested. We often get back something like this (the data below is not real, but is pretty typical of the data sets we frequently review):

Employee #  CCAT Percentile  SalesAP Score       Monthly Sales
1           71               Highly Recommended  $69,243
2           34               Not Recommended     $67,445
3           84               Recommended         $55,767
4           71               Highly Recommended  $50,240
5           61               Recommended         $46,772
6           58               Not Recommended     $41,389
7           92               Recommended         $40,102
8           65               Recommended         $37,655
9           45               Highly Recommended  $34,241
10          74               Recommended         $31,498
11          53               Recommended         $31,400
12          65               Recommended         $30,084
13          45               Recommended         $29,751
14          50               Not Recommended     $27,782
15          41               Recommended         $26,997
16          45               Not Recommended     $24,408
17          29               Highly Recommended  $21,126
18          38               Not Recommended     $18,665
19          78               Recommended         $12,505
20          34               Not Recommended     $9,449

Correlation with monthly sales: 0.34 (CCAT), 0.25 (SalesAP)

In this case it seems clear that employee #2, who is one of the company’s top-performing salespeople, didn’t do very well on either the Criteria Cognitive Aptitude Test (CCAT) or the SalesAP, our sales personality test. In employee #2’s case, the test “didn’t work” in the sense that it dramatically under-predicted her potential. In any sample of any size, there can always be cases where the test results “didn’t work”; no test is a crystal ball. The way we should evaluate the predictive accuracy of selection tools is by looking at the whole data set, to see how well the tests predicted performance across the sample population. With this in mind, take another look at the table above.

If you are looking for instances where the test “didn’t work,” you might also notice that employee #19 got good scores on both tests, but evidently can’t sell a lick. But other than these two outliers, the correlation between test results and job performance (as measured in this case by monthly sales) is pretty strong. How can we be sure of this? (Besides noticing that the scores at the top of the chart, which is sorted by monthly sales, tend to be higher than those at the bottom.) Organizational psychologists measure the predictive validity of a test by calculating a correlation coefficient, a measure statisticians use to represent the strength of a relationship between two things: in this case, test scores and job performance. The correlations for the two tests in this case are .34 and .25, respectively. A correlation coefficient can range from -1 (a perfect negative relationship) through 0 (no linear relationship) to 1 (a perfect positive relationship). For a pre-employment test, a correlation of .21 to .35 is likely to be useful, and anything higher than .35 is very beneficial as a predictor. Correlation coefficients of .34 and .25 are respectable: although this particular sample is small, a 20-person sample is much more representative than a one-person sample. Calculating the correlation coefficient is a great way to combat “the curse of the anecdote”: letting one prominent data point obscure the trend that is the real story of this data set. The scatter plot below provides another way to visualize this data: it shows that as CCAT scores increase, so does performance, with the two notable outliers as exceptions to the rule. Remember, don’t look at anecdotal evidence if you have a whole data set to examine.

CCAT Scores and Performance
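For anyone who wants to run this check on their own incumbents, here is a minimal Python sketch of the Pearson correlation, applied to the illustrative (not real) CCAT percentiles and monthly sales from the table in this post. The helper name and the decision to rerun the calculation without employees #2 and #19 are our own choices for illustration.

```python
import math

# CCAT percentile and monthly sales for employees 1-20, from the example table.
ccat = [71, 34, 84, 71, 61, 58, 92, 65, 45, 74,
        53, 65, 45, 50, 41, 45, 29, 38, 78, 34]
sales = [69243, 67445, 55767, 50240, 46772, 41389, 40102, 37655, 34241, 31498,
         31400, 30084, 29751, 27782, 26997, 24408, 21126, 18665, 12505, 9449]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_all = pearson_r(ccat, sales)

# Same calculation with the two outliers (employees #2 and #19) set aside.
keep = [i for i in range(len(ccat)) if i + 1 not in (2, 19)]
r_trimmed = pearson_r([ccat[i] for i in keep], [sales[i] for i in keep])

print(f"all 20 employees: r = {r_all:.2f}")
print(f"without #2 and #19: r = {r_trimmed:.2f}")
```

The full-sample coefficient should come out close to the .34 quoted above, and it rises noticeably once the two outliers are set aside, which is another way of seeing how much weight a couple of memorable cases can carry in a 20-person sample.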


America’s Computer Literacy Problem

As we announced in this blog post earlier this year, our newest test is called the Computer Literacy and Internet Knowledge test (CLIK). We developed the CLIK because many of our customers requested a test of general computer literacy. The CLIK consists of two short simulations in which the test-taker is asked to perform basic tasks (opening a document, copying and pasting, sending an email, doing a Google search, etc.) on a simulated desktop, followed by some multiple choice questions. The CLIK has quickly become one of our most popular tests, which to me is a sign that employers are definitely seeing the need for a test that measures basic computer skills, rather than specific knowledge of a particular application, like Microsoft Excel or Word.

As with all of our tests, we have monitored the data collected from the CLIK, and we recently did a thorough analysis of item-by-item responses for 20,000 CLIK administrations. The findings were pretty surprising. First of all, 24% of all test-takers received an overall score of “Not Proficient.”  But the more alarming data came from the item-by-item analysis, which showed that some very basic elements of computer literacy were not performed correctly by large numbers of test-takers.  Specifically, 37% of people were unable to retrieve basic information through a Google search, 32% were unable to correctly format and send an email, and 21% were unable to copy and paste a text passage.

Now, we should caution that the sample of people who took the CLIK may not be representative of the general population.   Our customers tend to administer the CLIK for entry level positions for which basic computer proficiency is required, but perhaps cannot be assumed—it would be uncommon, for example, to administer the CLIK when screening for a professional position.  The CLIK tends to be used for positions like customer service reps, medical billers, clerical workers, etc.  Conversely, however, it’s also true that there are many positions for which computer literacy may not be necessary, and one would assume that the applicant pools for these positions might be made up of people with even lower rates of computer literacy.  So although it’s difficult to make any decisive conclusions on the basis of 20,000 test results, it certainly looks like America has a computer literacy problem.  The data we examined confirms what we were hearing from our customers who asked for this kind of test—too many job applicants lack basic computer proficiency.



Moneyball and Pre-employment Testing

I finally got around to seeing “Moneyball” this weekend, the movie adaptation of the Michael Lewis book of the same name. The movie documents the role played by Billy Beane, General Manager of the Oakland A’s, in transforming the way baseball teams drafted and evaluated players a decade ago. Beane and his staff pioneered the application of sophisticated statistical analysis to the process of player selection. In so doing he was able to help his chronically underfunded Oakland A’s compete with the big budget teams like the Yankees, whose payroll was four times that of the A’s. His methods have since been imitated by many other teams, including the Boston Red Sox, who used them to win two World Series championships.

The lessons of Moneyball have obvious implications that reach beyond baseball, and it has garnered some lively discussion in HR circles. Beane’s breakthrough was that he found objective, quantifiable ways to measure player potential that turned out to be more accurate predictors of on field success than the collective wisdom of baseball scouts and insiders. This is exactly the promise of pre-employment testing. Well designed tests provide employers a way to gather objective, reliable data that predicts performance more accurately than traditional, more subjective methods of employee selection such as interviews.  Most of our clients are small and medium-sized businesses, for whom hiring smarter is one of their best chances to compete with the bigger “Yankees” of their respective fields.


The HAI and Tomorrow’s Jobs Report

This holiday-shortened week has brought some weak economic news to the fore, and yesterday the stock market took a steep loss. Some analysts are pointing to tomorrow’s monthly payrolls numbers as an important event that could significantly impact the markets. As avid readers of this blog will remember, our Hiring Activity Index (HAI) is a metric based on the proportion of our customers who are actively conducting pre-employment testing in a given month. The HAI touched an all-time high for the month of May. It was also stable and high for March and April.

We’re not sure whether this strength reflects the hiring environment, or a new maturity in our business model as we build a loyal customer base.  Either way, we’re pleased with the indicators and expect that the jobs number tomorrow will likely be decent. The consensus estimate seems to be that we’ll have added between 90,000 and 200,000 non-farm payroll jobs: if the HAI is any guide, as it has been in the past, we’d be surprised if the number isn’t on the high end of that range.


More NFL Draft Selection Geekiness

Today is that time of year again, the NFL draft. Not quite the same with the labor situation overhang, but that doesn’t seem to have slowed the perennial debates about draft order and the professional prospects of various members of this year’s draft class.  Our blog posts on the NFL draft are always among our more widely read posts, and we’re very interested in the draft because it is such an iconic example of NFL teams devising methods to tackle the challenge we think about every day: devising employee selection systems that help organizations hire better and derive long-term competitive advantages.  I don’t have much new to add on the draft this year, but did want to highlight a really interesting article on a new entrant to the field.

http://www.slate.com/id/2292312

If you have thoughts on this approach let us know in the comments section.
