From the Chronicle of Higher Education, November 20, 1998, p. A56
POINT OF VIEW
When Professors Get A's and Machines Get F's
By DENNIS BARON
It's time to put a stop to talk about new computer programs that can grade
students' essays with less effort and more accuracy than any teacher could --
and do so more cheaply and quickly, to boot. Such programs are neither
necessary nor desirable. I'm not worried that computers will replace teachers.
Technology, used well, can enhance the human aspects of learning. But I am
concerned that essay-grading programs may give both students and teachers a
false picture of how readers evaluate what they read, and that in turn will
lead to false ideas about how we should write.
According to its promoters, the Intelligent Essay Assessor, this year's entry
in the grading-software sweepstakes, can scan essays in seconds and reliably
identify what students have learned. Like older grading programs, it can measure
the length of words and sentences and analyze punctuation. But it also does
something its developers call "latent semantic analysis." By
searching for keywords and their synonyms, it can tell whether a student is on
the right topic and how much information the essay contains compared to stored
samples of essays on the topic. It can tell students what they've left out, and
it can signal instructors that a student may have plagiarized. The program
looks not just for single words, but for word patterns and phrases, so students
can't fool it by parroting lists of concepts. And the program can learn what's
good by being fed model essays, written by experts and professionals, on the
topic in question.
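The "latent semantic analysis" the developers invoke is a published technique: model essays are turned into a term-document matrix, a truncated singular-value decomposition projects them into a low-dimensional "semantic" space, and a student essay is scored by its similarity to the models in that space. The sketch below assumes the textbook form of the method; the toy essays, the variable names, and the scoring step are invented for illustration and are not the Assessor's actual code.

```python
# A minimal, hypothetical sketch of textbook latent semantic analysis (LSA),
# not the Intelligent Essay Assessor's implementation.
import numpy as np

model_essays = [
    "the heart pumps blood through arteries and veins",
    "arteries carry blood away from the heart to the body",
    "veins return blood to the heart from the body",
]
student_essay = "blood moves from the heart through arteries"

# Build a vocabulary and a term-document count matrix from the model essays.
vocab = sorted({w for doc in model_essays for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:          # words outside the vocabulary are ignored
            v[index[w]] += 1
    return v

A = np.column_stack([vectorize(d) for d in model_essays])

# Truncated SVD maps words and documents into a low-dimensional space in
# which related terms (and essays that use them) cluster together.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
project = lambda v: (U[:, :k].T @ v) / s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

student = project(vectorize(student_essay))
scores = [cosine(student, project(vectorize(d))) for d in model_essays]
print(max(scores))  # similarity to the closest model essay
```

Because the comparison happens in the reduced space rather than on raw word counts, an essay can score well by using synonyms and related phrasing of the model essays, which is why, as noted above, students cannot fool such a program simply by parroting a list of keywords.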
Pitted against human graders, we are told, this software program gives grades
that mirror those of real teachers. The program won't do away with teachers,
but it will allow them to escape the drudgery of reading essays and to spend
more time with students. The program also promises to be more consistent than
human readers, who are notoriously inconsistent.
Teachers often insist that an objective standard of reading does indeed exist,
that they can reward good writing in the same way that they can recognize a
literary masterpiece. But in spite of such assurances, students know how erratic
their teachers can be. From the students' point of view, every teacher seems to
have a different set of expectations when it comes to written work. Students
spend half their lives trying to figure out what each new teacher wants from a
writing assignment, and the rest of their time trying to figure out how to
adapt their writing to those ever-shifting expectations. The Intelligent Essay
Assessor would assure students that they are being held to the same standard --
time after time, class after class. In theory, the program could rank the 100
best student essays much as a Random House panel recently ranked the 100 best
English-language novels of the 20th century.
I'm not convinced the program can perform consistently, and I don't think
consistency, if it could be achieved, would improve the assessment of written
assignments. Teachers believe in standards for grading, but they don't all
share the same standard. Every teacher can tell a good essay from a bad one, a
C paper from an A or an F. They can do so blindfolded, upside down, underwater,
with both hands tied behind their backs. What they can't do is agree
unanimously on which paper deserves the A, the C, or the F. Give a group of
teachers an essay to grade, and, among them, they will manage to assign it
every grade on the scale, citing plenty of reasons to back up their decisions.
The consistent grades that the computer promises to award may be comforting to
students, to their parents, and to deans, but consistency in reading is not a
human trait. That is why teachers don't share a common standard for evaluating
writing. Such a standard is not something we should either expect or strive for
in education, since it's not something students will encounter in their
out-of-school writing.
Students may complain that their roommate's freshman-composition teacher is an
easier grader than the one they're stuck with, or that their mother thinks the
essay they got a C-minus on is really worth an A. But both the history of
reading and our everyday experience confirm that even the most rigorous and
attentive of readers will disagree over texts. If we've learned anything about
reading and writing in the past 20 years, it is that each person brings to the
activities such individual perspectives that it's amazing readers can agree on
any aspect of the meaning or value of texts at all. Heraclitus, commenting on
the vagaries of life in Greece in the 6th and 5th centuries B.C.E., said you
can't put your foot into the same river twice, since it is always changing. Some
wag amended this to claim you can't even put your foot into the same river
once.
No two readers approach a text in exactly the same way; even the same person
reading a text repeatedly will come away with different feelings each time.
Like it or not, students must learn that their human readers will be
inconsistent, a hard and frustrating lesson.
Writers learn to take their best shot, but they all know it is impossible to
predict a reader's response with complete accuracy. Professional readers are no
more consistent or predictable than any others. As a writer, I know I can send
a manuscript to one editor and get it back by return mail -- by return e-mail
these days -- with a polite, "Not for us, thanks," while another
editor will pounce on the same piece and rush it into print. As a manuscript
reader, I know I've accepted essays for publication that others have rejected,
while ones I've turned down have found their way into print, where they have
been well received.
Writers and editors learn that this is part of the natural give and take of the
word game. It is harder, and often more frustrating, for teachers to face up to
the fact that they too disagree over the relative value of different texts,
whether they are student-generated essays or lists of great books produced by
the cultural elite. That realization may not change how teachers grade, but
acknowledging it would help them teach their students to expect real, rather
than ideal, readings of their work.
Teachers want to grade less, but I doubt many will want computers to do their
reading for them. We don't always relish grading student essays, but neither do
we want to give up the insights that reading students' work gives us into how
they are thinking. I don't think student writers will be happy with
computer-graded essays either. Students may be content filling in
machine-gradeable sheets for multiple-choice tests, but they don't want their
written work read and graded by machines. After years of being reminded by
teachers to consider their audience, students don't want that audience to be a
silicon chip. They want to know that there's a real person out there responding
to them, even if that reader isn't predictable or knowable.
I asked students in my "Literacy and Technology" class, who are more
committed than not to the new technologies, how they felt about the Intelligent
Essay Assessor, and they were not encouraging. One student pointed out that
computers won't be able to tell if their essays have improved over time, while
another wondered if the program could detect and reward students who go way
beyond what the question calls for, or who answer a better question that they
have posed themselves.
The World-Wide Web site for the Intelligent Essay Assessor allows writers to
submit their own answers to several set essay questions. To test the efficacy
of the program, I submitted the essay you are reading (concluding with the
paragraph above) for evaluation. None of the essay questions on the site
matched my topic very well, but that did not prove a great obstacle for the
virtual grader. I tried submitting my work as a response to a question in the
Graduate Management Admission Test that asked for an evaluation of a particular
kind of business practice. The program crunched my essay in a couple of
seconds, rating it a 3.36 on a six-point scale, and assured me its score was
valid, adding, "None of the confidence measures detected anything inappropriate
with this essay."
Knowing my essay was completely inappropriate, since it didn't address the required
topic at all, I resubmitted it, this time as a response to another set GMAT
topic -- an analysis of a marketing campaign. The program quickly slapped on a
grade of 4.82. This time, the program acknowledged that it had low confidence
in the assigned grade, presumably because my essay was once again completely
off the topic. Nonetheless, it did give me a fairly high mark (I actually think
it deserves more), and added this message: "In a real grading situation,
this essay would likely be forwarded to a human grader for verification."
Note that referral to a human grader would be likely, not certain.
Interestingly, my off-topic essay got a high score when submitted as a response
to one topic and a lower score as a response to a different topic. The program showed
an inconsistency approaching that of human readers after all. So much for a
gold standard of consistency in grading.
When I submitted the essay a third time, as the answer to a question for a
freshman class on the function of the heart and circulatory system, the
computer finally realized something was amiss and responded to my writing
sample with the wry comment: "A very ZEN essay. Your emptiness impresses
me." Apparently grading machines can get testy, too.
Perhaps it is wrong for me to stand in the way of technological progress. After
all, technology serves students as well as teachers; Web sites now offer
students the ability to download term papers free. If we're going to eliminate
the human reader, why not eliminate the writer as well, and let the computer
grading program interface directly with the computer-generated essay? In a way,
they deserve each other. Besides, eliminating the drudgery of writing would
allow students more free time that they could then spend with their newly
leisured instructors.
In any case, until the virtual writing and reading technology is perfected, my
advice to teachers, students, and people about to take the GMAT is, Don't quit
your day job.
_______________
Dennis Baron is a professor of English and linguistics and head of the English
department at the University of Illinois at
Urbana-Champaign.
Copyright 1998 by The Chronicle of Higher Education