By now you might have heard about the former NYU Chemistry professor, Maitland Jones, who was fired after 82 of his students filed a petition with the university claiming that his organic chemistry course was too difficult. The story is a Rorschach blot of sorts. Some see it as a symptom of eroding standards in higher education. Others worry about encouraging students to think of themselves as entitled customers. Still others see the high rate of failure in Jones’s class as a referendum on his teaching, which could have been more empathetic and supportive.
There are a dozen other debates that one might extract from Jones’s story, but it got me thinking about how we define quality in higher education. Jones earned all of his degrees — a B.S., M.S., and Ph.D. — from Yale University. “Mait,” as he is known to friends and colleagues, taught at Princeton from 1964 to 2007, where he earned an endowed chair and emeritus status before continuing to teach at NYU on an annual basis. Many of his former students remember his rigorous courses with gratitude. By all of these measures, you’d be hard-pressed to find a higher-quality professor than Jones.
But another common measure of quality these days is academic assessment. You could be forgiven for thinking that Jones was doing academic assessment by failing so many students. Here is the bar, he was saying, and a lot of folks weren’t meeting it. According to the New York Times, the average score on one of his midterm exams was thirty percent (sixty percent is passing). But academic assessment typically uses student performance to evaluate the teacher’s effectiveness in helping students achieve what a course or program lists as its intended outcomes. By that standard, Jones was nearly as bad as you could get.
I think of academic assessment the way Willa Cather thought of modernity. “The world broke in two in 1922 or thereabouts,” she wrote. And just as World War I marked a bright line between the past and a brave new world, one might say that higher education has entered a new Age of Assessment.
Like Cather, you know where you stand. You either belong to the former or the latter half of that divide.
The stealthy rise of assessment
There is no clear origin story for academic assessment. I can’t find any evidence of who first proposed reviewing courses and programs in the way that is commonplace now. But odds are good that a federal bureaucrat was the first to propose that teachers ought to be held accountable for what their students were learning and that measuring teacher effectiveness ought to be baked into the accreditation process. My hunch — not an evidence-driven claim — is that this philosophy first spawned in K-12, which is (so far) more heavily regulated by state and federal governments, and then migrated like a freshwater mussel into postsecondary education. I base that educated guess on the legacy of No Child Left Behind, which required schools to assess student progress toward state benchmarks every year, but I invite better explanations in the comments.
I first heard the term assessment maybe seven years ago, and it had something to do with an upcoming review of my institution by the Higher Learning Commission. Colleges earn accreditation by hiring independent agencies like the HLC to review the quality of their academic programs. Somewhat bizarrely, there is no objective standard for accreditation. Each accreditation agency has its own set of criteria for academic quality, some more rigorous than others. Princeton and NYU — Maitland Jones’s two employers — are evaluated by the Middle States Commission on Higher Education. Yale University, where Jones earned all of his degrees, hires the New England Commission of Higher Education. Maybe the folks at the HLC, MSCHE, and NECHE could explain how their methods differ, but the only thing most people care about is whether their school gets the stamp of approval.
The accreditation agencies themselves are recognized (or not) by the US Department of Education, the ultimate arbiter of how academic quality is defined. Just as an agency like the HLC can strip an institution of its accreditation, so the Department of Education can stop recognizing an accrediting agency.
It’s bureaucrats and consultants all the way up.
The top-down regulatory structure bolsters my hunch that academic assessment began as a kind of gun held to the head of accreditors, who then passed that ultimatum on to the institutions they reviewed, who then began to lean on faculty. At first, the answer I’d get if I asked why we were doing this extra level of assessment beyond our grading was that it was required for accreditation. Now the answer is more likely to be evidence-based teaching, which draws from research into how people learn.
What is “evidence” in evidence-based teaching?
Just as defining quality in higher education depends a great deal on the source, it’s not obvious what qualifies as evidence when you’re evaluating learning. Artifacts of student work — papers or exams — are common sources. But for anything to function as evidence it has to be leveraged within a persuasive argument, and the parameters for persuasion are largely defined by a prevailing worldview.
For instance, NYU claims that academic assessment is “the process of using evidence to understand and improve student learning in academic programs.” Maitland Jones thought he was doing that. To his mind, students weren’t using the resources available to them. They were skipping class, ignoring the supplemental videos that he reportedly paid $5,000 out of his own pocket to produce, and misreading exam questions. By the time he was fired in 2022, he’d been using essentially the same standards for 58 years. And he is no stuck-in-the-mud octogenarian using outdated material; W.W. Norton is still distributing the fifth edition of his textbook. Jones thought the best way to improve learning was to expect more from students the way he always had. But NYU interpreted poor student performance as evidence of Jones’s ineffectiveness. Low student evaluations, failing grades, course withdrawals, and ultimately the student petition bolstered the university’s case that Jones was derelict in his teaching.
There is no doubt that a professor’s style can get in the way of the material. I had a Calculus professor once who was so shy that he closed his eyes whenever he turned from the blackboard to face the class. He often smiled as he spoke, his face lifted as if he were a musician going deep into the music, and sometimes when he was feeling especially enthusiastic he’d toss and catch his chalk in one hand while his eyes were shut. The effect was hypnotic in an unhelpful way, and I often had to snap myself back on task after waiting for the chalk to shatter on the floor (somehow it never did).
But mostly my generation prized eccentricity in our professors. We didn’t want someone who followed generic best practices; we wanted passionate teachers whose unique approaches made the material feel personal. My Political Science professor was so limber that he often rested one knee on the shelf of his lectern, which made him look like a Great Blue Heron standing on one leg in a marsh. When a classmate asked him once where he put his other leg, he replied, “In Narnia.” And we never stopped talking about the Psychology professor who came to a Halloween party dressed in a white sheet that said “I am the moment of conception.” I don’t know how to prove that these flourishes improved my learning except to say that they made me eager to go to class and made me feel affectionate toward my teachers, which translated to more engagement and, I’d argue, a better overall effort on my part.
Jones seems to have been one of these celebrated professors, too, beloved as much for his verve as for his mastery of chemistry. And so his firing, to me, represents the paradigm shift away from teachers as colorful individuals whose quirks are part of the package toward teachers who suppress their individuality to follow more or less the same protocols.
My visceral rebellion against academic assessment has largely sprung from its explicit attempt to depersonalize education in the interest of goals like alignment and standardization. At times I have wondered if I was simply turning into a curmudgeon. Maybe I was growing incurious or resented the implication that my more intuitive methods were *not* evidence-based and therefore less authoritative? But the more I have reflected on the Romantic underpinnings of my teaching philosophy, the more I have understood my opposition to assessment (as it is presently administered) to be a principled stance.
A rallying cry for my parents’ generation during the Vietnam Era was “Question the Question.” A military draft was a far more extreme example of government overreach, but if it is true that universities cannot forgo academic assessment without being penalized by accreditors, who likewise fear the wrath of the Department of Education, then a similar principle of conscientious objection might apply. One need not simply accept the question “What is your evidence of teaching effectiveness?” It is reasonable to ask how such evidence might be defined and whether it can be meaningfully measured at all. If you, too, find yourself resisting the dominant paradigm, here are four questions you might ask an assessment committee, consultant, or dean.
Four questions for the Assessment Committee
1. How do we know that a student essay or exam represents their best effort?
Without knowledge of the contextual factors that might have influenced student performance, drawing conclusions about what a given artifact proves about course design, programmatic success, or teacher effectiveness can do real harm. There is a reason why medicine abandoned the biomedical model, which often dehumanized patients by focusing solely on their symptoms, in favor of the biopsychosocial model, which regards psychological and social histories as relevant to diagnosis.
I was once shocked to discover, while grading a timed essay exam, that it was the best writing I’d seen from many students all semester, even though they’d had at least a week to complete every other assignment. The simplest explanation? It was likely the only time that many of them had shut themselves in a room and focused on nothing but writing for two straight hours. On another occasion, a young man stayed after our Advanced Poetry class to say he was having difficulty drafting his poems. When I learned that he was trying to compose poems on his laptop, I recommended writing the first drafts longhand, citing a Sharon Olds interview where she described the importance of feeling the words coming out of your fingertips, as if that initial expression were a physiological act. As he turned to leave, hand on the doorknob, I asked on a whim where he was trying to write. It turned out that he was often trying to write poetry on a laptop in his dorm room while his roommate was playing Halo in the background. One thing that will never appear on an academic assessment? Where a student completed the artifact under review. The secrets to better performance for my student were a pen, a legal notepad, and a pew at the back of an empty college chapel. But I’d never have even known to suggest those things if he hadn’t first approached me for help, and even then I almost missed the most salient cause of his trouble.
There is no way to know where a given class ranks in a student’s list of priorities. If a student is taking four or five classes at once, playing a sport, and/or working twenty hours a week, every formal assignment likely gets some form of short shrift. A student might be hungover, high, grieving the death of a parent, suffering from depression, or mourning a breakup with “the one.” In Jones’s case, relaxed standards and burnout from the COVID crisis seem to have contributed more to student performance than his methods, which had remained consistent for nearly six decades.
2. What does this assessment ignore?
The assessment tools I’ve seen target median benchmarks. These targets could be anything from critical thinking to crafting thesis statements to setting up a critical dialogue between research sources. All worthy outcomes, and largely teachable, but there are a lot of things that can’t be measured in this way.
In creative writing courses, there is a material difference between a student who has found her voice and a student who has simply mastered a rubric. An assessment won’t tell you that — both students will be judged equally proficient and their artifacts will be regarded as equivalent forms of evidence — but by my standards there is a world of difference in teaching effectiveness between the two. One student will never forget the class as long as she lives. She’ll still be drawing from it decades later, even if it has no direct relevance to her work life. The other student will have forgotten the entire experience by the time her first supervisor explains the metrics for performance review. And yet I cannot quantify “finding your voice.” This is, to paraphrase Cather again, a presence that a writer creates upon the page that is more felt than seen there.
Atul Gawande, one of the champions of assessment in medicine, acknowledges that human beings are “somewhere between a hurricane and an ice cube: in some respects, permanently mysterious, but in others — with enough science and careful probing — entirely scrutable.” Students are human beings. Some aspects of their learning might be as predictable as an ice cube melting in a fire. But there are meaningful and vital dimensions to the teacher-student exchange that are impenetrably mysterious. Devaluing those things because they can’t be measured hurts both teachers and students.
Incidentally, even economists are moving away from formulaic models. The Freakonomics podcast discussed this recently with James Choi, Professor of Finance at the Yale School of Management, and Morgan Housel, author of The Psychology of Money. The upshot of their conversation? There is no objective best practice for personal finance, even when it comes to home mortgages. Standard advice would be to stick with a fixed mortgage and invest any extra income. But for some the psychological benefit of owning a home outright outweighs the extra earnings. Same for renting versus buying a home. As Choi writes, even if an optimal financial strategy is knowable, it might not be realistic for individuals with limited willpower. Experienced teachers make these kinds of calculations all the time, gauging what is optimal against what is possible with a class or a given student. When it comes to something as intractably complex as what makes teaching effective, I’ll always trust the intuition of a veteran professor to hit closer to the mark than a disembodied assessment.
3. Who does this assessment serve?
Assessment largely renders individual teachers replaceable, which serves the interests of administrations and Boards of Trustees who are seeking to weaken longtime bulwarks like tenure. Anyone can help students meet an assessment benchmark. Often you don’t even need an advanced degree to do it.
Statisticians will tell you that data often conceals as much as it reveals. Literary theorists say the same about words. It’s okay to question the question. What is the hidden curriculum of academic assessment?
Kurt Vonnegut captures it in the opening to his fabulous short story, “Harrison Bergeron,” published in 1961. The opening is worth quoting in full.
“The year was 2081, and everybody was finally equal. They weren't only equal before God and the law. They were equal every which way. Nobody was smarter than anybody else. Nobody was better looking than anybody else. Nobody was stronger or quicker than anybody else. All this equality was due to the 211th, 212th, and 213th Amendments to the Constitution, and to the unceasing vigilance of agents of the United States Handicapper General.”
Vonnegut imagines a society in which excellence of any kind is suppressed. Those with above average intelligence are required to wear radios in their ears that buzz periodically to interrupt their thoughts, so they are just as scatterbrained as everyone else. Attractive people must disguise their beauty. Gifted dancers and athletes must wear bags of birdshot that keep them from distinguishing themselves.
If you find this a strained metaphor for the culture of assessment in higher education, I might ask a simple question: does the time you spend duplicating work you have already completed while recording grades for your classes serve, or compete with, the time you need to distinguish yourself as a scholar or to design a teaching experiment that no one has ever attempted before?
4. How can creativity be reconciled with assessment?
Innovation requires thinking outside the box. Yet assessment is governed by boxes: rubrics, checklists, lines within which students are encouraged to color. You can’t just label a box as “creativity” and assess it meaningfully. Innovation is, by definition, unpredictable. Innovation breaks the paradigm. Innovation scoffs at convention. Innovation breaks the rules intentionally.
It is no accident that the arts and humanities are sidelined by a culture that believes student learning can be reliably assessed. Lyricism requires straining against the limits of conventional expression. Visual art continually experiments with how to render abstract feelings or concepts visible. These creative modes are sometimes commodified under the heading “marketing,” usually for puns or humorous ad concepts like the Geico Gecko, but that’s not the reason universities offer courses in drawing or poetry. Sometimes the purpose of meaningful art is to make us feel deeply uncomfortable with the world we inhabit, to be suspicious of authority, and to refuse the conclusions that the Handicapper General draws.
As someone who has not spent an entire career in academia, I have observed that assessment in academia suffers from the same flaws as assessment in the corporate world. Everyone agrees that assessment is a reasonable thing to do, yet most assessment procedures are deeply flawed. It is possible to earn a great assessment while being poor at one’s job, and vice versa. The main motivation for having an assessment process is to be able to claim that the organization has an assessment process. And assessment data is used inappropriately to push bureaucratic priorities or to punish enemies.
Welcome to the corporatization of academia.
My favorite description of assessment came from someone presenting at a conference on higher education. It went something like this: "Faculty feel about assessment the way your cat feels about getting a bath." Why? Because both feel that you are imposing on them something they already do all the time! Faculty are constantly assessing students' performance. You judge their expressions when you ask a question, or when a student answers it. Do they seem interested or lost? You judge the kinds of ideas they contribute to a discussion. Do their comments indicate understanding, insight, or something else? Every teacher leaves the classroom with an assessment: "Yes, that went ok. They seem to be getting it," or "That was a disaster. I'd better try another way of working on that material next time." And then you judge what they write or what their exams look like. All of those things are "assessment tools." But they don't add up to nice columns of numbers, and the modern university is supposed to produce knowledge that can be measured like annual rainfall. Unfortunately, some of the best results of teaching only appear years later. Too late for anybody's rubric!