Thursday, May 10, 2018

Qualitative Formative Assessment


This week I read a thirty-year-old article by D. Royce Sadler on formative assessment, with a focus on qualitative and multiple-criteria approaches. These are needed in situations where simple standard rubrics are found severely wanting and “warp teaching and assessment”. Creativity and complex learning outcomes, with less well-defined criteria, are among those situations. Qualitative judgments are not limited to the fine arts and humanities; they are also needed in “science and mathematics, where students are required to devise experiments, formulate hypotheses or explanations, carry out open-ended field or laboratory investigations, or engage in creative problem solving.” (Citation and abstract are shown below, accessible from JSTOR.)


Sadler defines a qualitative judgment as one in which the assessor is “both the source and the instrument for the appraisal”. The assessment therefore cannot be reduced to a formula or a rubric that could easily be used to check off criteria. Unfortunately, the push for measurable or quantitative assessment either ignores these complexities or settles for an unsatisfactory proxy. We the faculty, subject-matter experts in tertiary education, aim to guide our students into depth and complexity in our fields, and we should push back strongly against the tide that focuses on things that can be counted easily.

According to Sadler, qualitative judgments apply in situations with five characteristics: (1) when multi-dimensional criteria are required and their inter-relationships are important, (2) when criteria are fuzzy and continuous, (3) when the criteria pool is large and comprises both manifest and latent criteria, (4) when there isn’t a single objective, independent, measurable standard for comparison, and (5) when, as Sadler puts it, “the final decision is never made by counting things, making physical measurements, or compounding numbers and looking at the sheer magnitude of the result.” In fact, our simple rubrics are often an attempt to do exactly what (5) rules out, and Sadler warns specifically against this.

Sadler then discusses two things in detail: what may constitute effective feedback from instructor to student (or expert to novice), and how students might practically benefit from that feedback, essentially learning how to learn as they hone their self-monitoring approaches. I recommend reading the article in full, because Sadler does an excellent job setting up the conceptual framework for his ideas and provides good examples to illustrate the theory. Rather than summarize those sections here, I want to briefly mention three ancillary points that jumped out at me from the article.

First, it can be useful to think of the community of subject-matter experts as members of a guild. Members of the guild instinctively recognize quality broadly in our fields, even if in different shades. If asked to articulate the criteria that define quality, our lists might be partial and differ from one another’s (I mentioned above that criteria can be manifest or latent), and we might not easily agree on a simple explicit rubric or checklist to assess that quality. In the guild system, the “novice is, by definition, unable to invoke the implicit criteria for making refined judgments about quality. Knowledge of the criteria is ‘caught’ through experience, not defined… [through] a prolonged engagement… shared with and under the tutelage of a person who is already something of a connoisseur.” This is why many of us scientists at liberal arts colleges consider the undergraduate research experience crucial to a high-quality education for our majors, and why many of us embrace a guild-tutelage model in our research labs.

Second, I was struck by Sadler’s claim that “some teachers feel threatened by the idea that students should engage openly and cooperatively in making evaluative judgments”. He provides several plausible reasons for this: loss of control, a perceived undermining of the teacher’s expert authority, and others. In the sciences, especially at the undergraduate course level, we still gravitate towards the “one right answer”. In Fall 2016, I experimented with take-home exams in my first-semester general chemistry course. It had its pros and cons. Reading Sadler’s article made me think of adding a student self-evaluation piece to the take-home exam: after taking the exam individually (closed-book and under time constraints), students would evaluate their own performance by going through the exam again, this time allowed to work collaboratively and to look up resources. I think I’ve come up with a workable scheme to try next semester with a new group of students.

Third, Sadler argues that continuous cumulative assessment for grades may actually subvert or hinder the benefits of formative assessment. Let me explain. In most U.S. college introductory science classes, the bulk of the grade comes from midterm exams (2-4 throughout the term) and a final exam; there might also be some homework or quiz grades throughout the semester. The student’s final grade comes from summing up all these mini-grades. I grew up in a different country with a different educational system, where the final (nationwide) exam was the be-all and end-all. Nothing counted but the final performance. Both systems have their pros and cons, but perhaps a hybrid system could maximize the pros and minimize the cons. I have some ideas for how this might work, but I think I will first float them by some students I’ve had in the past and see what they think.

If formative assessment interests you, the appropriate bible is Dylan Wiliam’s book. In my opinion, it’s one of the best books out there on this subject. I highly, highly recommend it. The figure below is taken from Greg Ashman’s blog (also recommended) and his book Ouroboros; the sampler chapter on Rubrics is available.
