"Holding our schools accountable" is repeated so often by government officials that it almost qualifies as the mantra of American education. The mechanism for accomplishing this task is the state assessment, administered annually beginning in grade 3. The results of these assessments are used to evaluate teachers, determine if a student moves forward or is placed in remediation, and in some cases, if a student can be promoted or graduate.
A reasonable person, therefore, would assume that these assessments are the very pinnacle of our ability to measure educational achievement. A reasonable person would be dead wrong in making that assumption.
The process begins with the employment of scorers. Here is how Data Recognition Corporation (DRC), a testing contractor hired by a number of states to administer and score these high-stakes assessments, handles this step. The DRC website advertises "Seasonal Employment Opportunities" as follows:
DRC hires individuals to work in temporary full and part-time positions scoring written responses to standardized tests that are administered to elementary through high school age children. The tests scored are for various states in the subjects of reading, writing, math, science and social studies. Training is provided on the scoring criteria/rubric for each test and a qualifying exam must be passed demonstrating an understanding of the scoring process.
The recruiting events are held in multiple locations beginning on February 10, and the scorers begin their work in March. Notice that the scorers do not have to be educators. So the futures of our teachers and our children will be determined by individuals who have no required background or experience in education and who have received, at most, two weeks of training in how to score the tests.
As the employment advertisement indicates, the tests are scored on rubrics. These are the guidelines that the scorers will use to determine what score each student will receive.
The Scoring Guidelines for the constructed-response items in the math assessments assign points based not only on whether the student correctly completed the math in the problem, but on how well that student wrote about the process he or she used. The rubrics are consistent across the states, so Pennsylvania's language will serve as the example. Data Recognition Corporation is Pennsylvania's testing contractor, so DRC scorers will be looking at this language:
A 4-Point response
· "demonstrates a thorough understanding of the mathematical concepts and procedures required by this task.
· Provides correct answer(s) with clear and complete mathematical procedures shown and a correct explanation as required by the task. Response may contain a minor "blemish" or omission in work or explanation that does not detract from demonstrating a thorough understanding."
So to earn the highest score, a response must provide correct answers, unless, in the judgment of a non-professional scorer with no more than 10 days of training in educational evaluation, the answer does not have to be correct.
The 3-Point response guideline is identical, except that instead of a "thorough understanding," the student must demonstrate a "general understanding," and the response need only be "mostly complete and correct."
The difference between a "thorough understanding" and a "general understanding" is left to the judgment of the non-professional scorers, based on a training session of 10 days or less. It would seem reasonable to think that "completely correct with a minor blemish" IS "mostly correct," but evidently that is not the case.
The 2-Point guideline calls for "partial understanding" and the 1-Point for "minimal understanding". In each case, the judgment of the non-professional scorers is the sole arbiter of how these words are interpreted.
To fully understand the lack of any objectivity in this process, compare it to Olympic figure skating. Nine different judges, each of whom is a recognized expert in the field of competitive figure skating, watch the same athlete perform at the same time, and give nine different, and sometimes widely disparate, scores. The Olympic Committee goes to great lengths to try to correct for this fact.
Yet in education, the careers of teachers and the academic futures of students now rest on the subjective judgments of one or two non-professional scorers with minimal training, who must evaluate responses against a vague scoring rubric.
And we call this accountability.