Monday, December 11, 2023

Multiple Choice

There are very few multiple-choice questions on my chemistry exams. This is because, at the college level, I think students should be able to generate answers. It’s easier to recognize the better answer or explanation among several choices than to generate one from scratch. And since being able to generate answers is a learning goal of mine, most of my exam questions require students to write out their answers and explanations.


I’d like to think that the grades my students earn on those exams provide a fair assessment of their knowledge of chemistry. By fair, I mean that I hope my cumulative final exams, which are summative assessments, are both reliable and valid. But because I don’t use multiple-choice questions on the exam, there might be additional inconsistency from the grader (me!) even though I have a tight grading rubric. Furthermore, my exams have fewer questions than a multiple-choice equivalent would, because generating answers takes more time than recognizing them. Thus, it is questionable whether the “coverage” of my final exams allows me to make a reasonable inference that a student’s final grade reflects their knowledge of the semester’s cumulative material. Asking fewer questions provides less coverage, and less coverage makes the exam a poorer proxy for what students actually know.


While I haven’t taken the plunge, I occasionally consider what it would take for me to switch my summative assessment to a multiple-choice format. There are several advantages and disadvantages. A big advantage is that I might well be able to construct a final exam with better validity and reliability. Grading would also be much faster. And analyzing the results question by question would let me refine the questions on subsequent exams in a systematic way, improving their quality as discriminators. One disadvantage is that I couldn’t assess students’ ability to generate answers. Another is that designing high-quality multiple-choice questions and response sets takes a lot of time and work.


I decided to read a decade-old paper by Marcy Towns, “Guide to Developing High-Quality, Reliable, and Valid Multiple-Choice Assessments” (J. Chem. Educ. 2014, 91, 1426–1431), to help me think about all this. Towns provides good guidelines on writing questions and putting together the response set (the answer choices). I was familiar with the principles, but I found it helpful to see actual examples and to be walked through why the guidelines work well. I was also reminded how hard it is to put together good questions and answers.


Eye-opening things I didn’t know: (1) The optimum number of responses in a set is probably three. Towns backs this up with data and examples. I would have guessed four, but I find Towns’ argument convincing. (2) Item-order and response-order effects can be significant. Towns cites data showing that placing four challenging stoichiometry or equilibrium problems in a row lowers the odds that a student gets the last one right. I should think about this for my non-multiple-choice exams too. There are also priming effects, where the order of questions can help a student “set up” the appropriate cognitive processes to answer a question. Depending on whether the assessment is summative or formative, that can be good or bad. (3) Sometimes students “choose an earlier answer without reading the entire response set”. Huh, that had not occurred to me.


Towns also suggests simple ways to perform item analysis on the results to gauge the quality of the questions and response sets: were they good discriminators of knowledge? These analyses should be easy to set up, and I could even run them on non-multiple-choice exams, although with a bit more work, maybe with the help of Gradescope. Towns provides rule-of-thumb values for interpreting item difficulty and item discrimination. I could easily write these into a program that does the analysis for me.
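As a rough sketch of what that program might look like, here is a minimal item analysis in Python. The formulas are the standard ones (difficulty as the fraction of students answering an item correctly; discrimination as the difference in that fraction between the top- and bottom-scoring groups); the function names, the NumPy dependency, and the 27% grouping fraction are my own choices, not taken from Towns’ paper.

```python
# Minimal item-analysis sketch, assuming a 0/1 score matrix where
# rows are students and columns are exam items. The 27% grouping
# fraction is a common convention in the testing literature.
import numpy as np

def item_difficulty(scores):
    """Fraction of students answering each item correctly (0 = hard, 1 = easy)."""
    return scores.mean(axis=0)

def item_discrimination(scores, fraction=0.27):
    """Difference in per-item difficulty between top- and bottom-scoring groups."""
    totals = scores.sum(axis=1)          # each student's total score
    order = np.argsort(totals)           # students sorted low to high
    n = max(1, int(len(totals) * fraction))
    low, high = scores[order[:n]], scores[order[-n:]]
    return high.mean(axis=0) - low.mean(axis=0)

# Example: 6 students x 4 items of made-up data.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])
print("difficulty:    ", item_difficulty(scores))
print("discrimination:", item_discrimination(scores))
```

An item whose discrimination comes out near zero, or negative, did not separate stronger students from weaker ones and would be a candidate for rewriting on the next exam.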


Am I willing to switch to multiple-choice questions for the final exam? I don’t know, although I have considered a hybrid that mixes formats. Because I teach at a liberal arts college with small class sizes, grading is not a chore for me. If I want to prepare students for the summative final, my formative assessments should use a similar format so students get accustomed to it. But I feel that requiring students to generate answers helps them learn the material at a deeper level. Thus, providing both formative and summative assessment in my current exam/test/quiz format does what I want it to do. I’m probably sacrificing some validity and reliability within my class, and more so if my department were trying to compare grades across sections. We haven’t done so in any formal way, but I can see reasons why it might be a good thing to do. And if we do, a multiple-choice assessment is likely the path forward.

