Tuesday, November 20, 2018

The Purpose of Grades


In the previous post, I discussed some broad themes with regard to summative and formative assessment. Today, let’s dig down into the weeds of why we have grades in the context of summative assessment, at least according to Daisy Christodoulou in Making Good Progress – the book I’ve been reading. I will begin with a question-and-quote approach similar to the one in the previous post.
(I recommend reading the previous post for definitions I’ll be using.)

Why are summative assessments needed?

To support large and broad inferences about how pupils will perform beyond the school and in comparison to their peers nationally.

Why are there ‘grades’?*

The purpose of grades and other similar labelling systems is to provide an easy way of communicating this shared meaning… An employer or a teacher at a further education [university or college] is able to infer from it how well the pupil has done in that subject.

Are there other assessment alternatives to letter grades?

There are other methods of communicating a shared meaning which don’t look like a grade but in practice fulfil the same function… [for example, reporting] performance in terms of labels such as ‘emerging, expected and exceeding’ [a particular standard].

If being able to communicate the shared meaning is key, what restrictions does this impose on summative assessments?

1. They need to be taken in standard conditions with strict restrictions on the type of external help that is, and that is not, available.
2. They have to include questions that allow us to distinguish between different pupils.
3. They have to sample from [a larger] body of content [because] it will not be possible to cover all of it in a test that is a couple of hours long.

While I’ve provided soundbites from Christodoulou, she explains each of these points more fully and provides clear examples; I recommend reading her book if you find any of this interesting. With the main purpose of summative assessment laid out, we can see why formative assessment might look different. In particular, the teacher does not need to worry about the three restrictions above when designing formative activities and assessments, especially if the purpose of formative assessment is figuring out what to do next given where the students are at a certain point. Christodoulou argues that the ‘responsive’ aspect of formative assessment is important: feedback runs both ways, from students to teacher and from teacher to students.

Let’s take a closer look at exams, since they are one of the most common methods of assessment in science courses. (I certainly use them regularly.) Exams can clearly provide summative information, but can they also provide useful formative assessment? One way they might do so is via question-level analysis, provided the questions are sufficiently detailed and can be broken down into discrete parts. If a student answered a question incorrectly, why did the student err? Christodoulou provides a nice example of an exam question on electrolysis with commentary from a science teacher analyzing the possible errors. While that example isn’t a multiple-choice question (MCQ), I’ve seen a number of cleverly paired MCQs where the right and wrong answers in the first question are matched with suitable “I chose my answer because…” explanations in a second question. Such paired MCQs are difficult to design if you’re a novice teacher, but an experienced instructor who knows the common pitfalls can zero in on them. I sincerely hope there is a good test bank of such questions – or I should really start collecting all the examples I find now so I can use them!
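(As an aside, here’s a minimal sketch of how I might tally the question-level data for one such paired MCQ. This isn’t from Christodoulou’s book; the question IDs, answer key, and record format are all hypothetical.)

from collections import Counter

# Hypothetical records: each student's choice on an answer MCQ ("Q3")
# and on its paired "I chose my answer because..." explanation MCQ ("Q3e").
responses = [
    {"student": "s01", "Q3": "B", "Q3e": "ii"},
    {"student": "s02", "Q3": "C", "Q3e": "iii"},
    {"student": "s03", "Q3": "B", "Q3e": "i"},
    {"student": "s04", "Q3": "A", "Q3e": "ii"},
]

answer_key = {"Q3": "C", "Q3e": "iii"}  # assumed correct answer-reason pairing

# Question-level analysis: count how often each (answer, reason) pair occurs,
# so a wrong answer can be traced back to the reasoning behind it.
pairs = Counter((r["Q3"], r["Q3e"]) for r in responses)

for (answer, reason), count in pairs.most_common():
    status = "correct" if (answer, reason) == (answer_key["Q3"], answer_key["Q3e"]) else "possible misconception"
    print(f"answer {answer} + reason {reason}: {count} student(s) [{status}]")

Even a crude tally like this makes it obvious which wrong-answer/wrong-reason pairs show up most often, which is exactly the information I’d want before the next class meeting.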

One of the challenges in summative assessments where the grades function as a discriminator of student performance is that, at the college level, the exam questions are more complex and require synthesizing different conceptual knowledge and skills. These sorts of questions work well to distinguish the ‘A’ student performance from the ‘C’ student performance. However, as the complexity of a question increases, the reason a student may err becomes increasingly difficult to pinpoint. We certainly want to get our students up to the point of being able to tackle these more ‘authentic’ problems, but throwing such problems at them early on is likely to be more confusing than enlightening.

It makes you wonder whether grading the earlier efforts is helpful from a learning point of view. I use low-stakes quizzes throughout the semester, but mainly as a motivator for students to keep up with the material and do the reading and homework; they have very little impact on the final grade. In my general chemistry class this semester, I’m experimenting (again) with low-stakes take-home exams but a higher-stakes final exam. I worry that this approach favors the stronger students and disadvantages the weaker students, but I don’t have enough data to conclude whether this is truly the case. Christodoulou provides a suggestion I haven’t yet tried out – that formative ‘grades’ be assigned based on improvement measures. But there are caveats to this approach (which she also tackles). I’ll have to think a little more about how to avoid the pitfalls should I try this. There are also other approaches to qualitative formative assessment.
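(For my own notes, here is one crude way an improvement measure could be turned into a formative label. This is my own sketch, not the scheme Christodoulou describes, and the thresholds are arbitrary placeholders.)

# Hypothetical improvement-based formative label: compare a student's latest
# quiz score to the average of their earlier quizzes.
def improvement_label(scores):
    """Return a coarse formative label from a chronological list of quiz scores (0-100)."""
    if len(scores) < 2:
        return "not enough data"
    earlier = scores[:-1]
    baseline = sum(earlier) / len(earlier)   # average of earlier quizzes
    delta = scores[-1] - baseline
    if delta >= 5:                           # thresholds are arbitrary placeholders
        return "exceeding expected progress"
    if delta >= -5:
        return "expected progress"
    return "below expected progress"

print(improvement_label([62, 68, 75]))       # -> exceeding expected progress

An obvious caveat even with this toy version: a single noisy quiz can swing the label, which is part of why I want to think more before trying it.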

Chapter 5 (“Exam Based Assessment”) of Christodoulou’s book closes by asking whether exams and grades provide valid summative information. We’ve discussed the challenges of reliability in such assessments in the previous post. I’ve also previously considered the aversive control of grades. There’s also the elephant in the room of ‘teaching to the test’, or students employing short-term strategies at the expense of long-term learning. Gosh, teaching is complicated!

*Did you know that college grades influenced the meat-packing industry?
