Today’s blog post is the second by Dr. Howard Wainer, who is the Distinguished Research Scientist at the National Board of Medical Examiners, as well as Professor of Statistics at the Wharton School of the University of Pennsylvania. Dr. Wainer is also a member of Criteria’s Scientific Advisory Board.
In an earlier post I commented on one aspect of a report, commissioned by the National Association for College Admission Counseling, that was critical of the current college admission exams, the SAT and the ACT. The commission was chaired by William R. Fitzsimmons, the dean of admissions and financial aid at Harvard.
One of the recommendations of the Commission was for colleges to consider making their admissions tests (SAT or ACT) optional. Using data from Bowdoin College, which has had such a policy for almost 40 years, I showed that those students who did not submit their SAT scores had, in fact, scored about a standard deviation lower than those students who did submit them. This isn’t surprising. More important, the students who did not submit SAT scores also performed about a standard deviation lower in their freshman grade point average at Bowdoin. This would have been predictable from their SAT scores had the College insisted on them. My conclusion is that colleges deny themselves useful information by making SATs optional. And the Commission, by making its recommendations in the absence of such data, was shooting in the dark.
In this post I’d like to discuss another of their principal recommendations:
Schools should consider eliminating the SAT/ACT altogether and substituting instead achievement tests. They cite the unfair effect of coaching as the motivation for this — they weren’t naive enough to suggest that if achievement tests were to become more high stakes coaching for them would not be offered. Rather, they argued that such coaching would be related to schooling and hence more beneficial to education than is coaching that focuses on test-taking skills.
Driving the Commission’s recommendations was the notion that the differential availability of commercial coaching made admissions testing unfair. They recognized that the 100-point gain (on the 1200-point SAT scale) test prep providers often tout as a typical outcome was hype and agreed with the estimates from more neutral sources that about 20 points was more likely. However, they deemed even 20 points too many. The Commission pointed out that there was no widespread coaching for achievement tests, but agreed that should the admissions option shift to achievement tests the coaching would likely follow. This would be no fairer to those applicants who could not afford extra coaching, but at least the coaching would be of material more germane to the subject matter and less related to test-taking strategies.
One can argue with the logic of this – that a test that is less subject oriented and related more to the estimation of a general aptitude might have greater generality. And that a test that is less related to specific subject matter might be fairer to those students whose schools have more limited resources for teaching a broad range of courses. I find these arguments persuasive, but I have no data at hand to support them. So instead I will take a different, albeit more technical, tack. I will argue that the psychometric reality associated with replacing general aptitude tests with achievement tests makes the kinds of comparisons that schools need among different candidates impossible.
When all students take the same tests we can compare their scores on the same basis. The SAT and ACT were constructed specifically to be suitable for a wide range of curricula. SAT-Math is based on mathematics no more advanced than 8th grade. Contrast this with what would be the case with achievement tests. There would need to be a range of tests and students would choose a subset of them that best displayed both the coursework they had had and the areas they felt they were best in. Some might take chemistry, others physics; some French, others music. The current system has students typically taking three achievement tests (SAT-II). How can such very different tests be scored so that the outcome on different tests can be compared? Do you know more French than I know physics? Was Mozart a better composer than Einstein was a physicist? How can admissions officers make sensible decisions from incomparable scores?
How are SAT-II exams scored currently? Or more specifically, how had they been scored for decades when I left the employ of ETS seven years ago? I don’t know if they have changed anything in the interim. They were all scored on the familiar 200-800 scales, but similar scores on two different tests are only vaguely comparable. How could they be? What is currently done is that tests in mathematics and science are roughly equated using the SAT-Math, the aptitude test that everyone takes, as an equating link. In the same way tests in the humanities and social sciences are equated using the SAT-Verbal. This is not a great solution, but is the best that can be done in a very difficult situation. Comparing a history score with a physics score is not worth doing for even moderately close comparisons.
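To make the idea of an equating link concrete, here is a minimal sketch in Python. It is not the procedure ETS actually uses; it assumes the simplest possible version, in which each subject test is linearly rescaled so that the group taking it has the same mean and standard deviation it has on the anchor test. The function name and all of the scores are hypothetical.

```python
from statistics import mean, pstdev

def scale_to_anchor(subject_raw, anchor_scores):
    """Linearly rescale raw subject-test scores so the group taking the test
    has the same mean and SD that it has on the anchor test.
    (Illustrative assumption only, not the actual ETS procedure.)"""
    raw_mean, raw_sd = mean(subject_raw), pstdev(subject_raw)
    anchor_mean, anchor_sd = mean(anchor_scores), pstdev(anchor_scores)
    return [anchor_mean + anchor_sd * (x - raw_mean) / raw_sd for x in subject_raw]

# Hypothetical data: identical raw scores on two different subject tests,
# taken by groups that differ on the anchor (SAT-Math).
physics_raw = [30, 40, 50]
physics_takers_sat_math = [600, 650, 700]

chemistry_raw = [30, 40, 50]
chemistry_takers_sat_math = [500, 550, 600]

physics_scaled = scale_to_anchor(physics_raw, physics_takers_sat_math)
chemistry_scaled = scale_to_anchor(chemistry_raw, chemistry_takers_sat_math)

print("physics:  ", [round(s) for s in physics_scaled])
print("chemistry:", [round(s) for s in chemistry_scaled])
```

In this toy version the same raw score lands higher on the physics scale only because the physics group did better on the anchor. The adjustment is no better than the anchor's relationship to the subject being tested, which is why such scores are only vaguely comparable.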
One obvious approach would be to norm-reference each test, so that someone who scores average for all those who take a particular test gets a 500 and someone a standard deviation higher gets a 600, etc. This would work if the people who take each test were, in some sense, of equal ability. But that is not only unlikely, it is empirically false. The average student taking the French achievement test might starve to death in a French restaurant, whereas the average person taking the Hebrew achievement test might do just fine if dropped in the middle of the night onto the streets of Tel Aviv. Happily the latter students also do much better on the SAT-Verbal test and so the equating helps. This is not true for the Spanish test, where a substantial portion of those taking it come from Spanish-speaking homes.
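The trouble with norm-referencing can be seen in a small sketch. It assumes, purely for illustration, that we could observe each student's proficiency on one common scale; the two groups and their numbers are invented, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def norm_reference(raw_scores, target_mean=500.0, target_sd=100.0):
    """Rescale scores so the group taking the test averages 500 with an SD of 100."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)
    return [target_mean + target_sd * (x - group_mean) / group_sd for x in raw_scores]

# Hypothetical proficiencies on a common scale (an assumption for illustration;
# in practice no such common scale is observed across different tests).
weaker_group = [40, 50, 60]
stronger_group = [60, 70, 80]

weaker_scaled = norm_reference(weaker_group)
stronger_scaled = norm_reference(stronger_group)

# The average member of each group receives the same scaled score,
# even though the groups differ in proficiency.
print(round(weaker_scaled[1]), round(stronger_scaled[1]))

# Two students with the same proficiency (60) receive very different scores
# depending on which group they are normed against.
print(round(weaker_scaled[2]), round(stronger_scaled[0]))
```

Norming within each group erases the difference between the groups: the average student in each gets the same score, while two students of identical proficiency end up far apart.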
Substituting achievement tests is not a practical option unless admissions officers are prepared to have subject matter quotas. I believe that solution would be too inflexible to be feasible.