Thursday, September 23, 2021

Noise Reduction

Daniel Kahneman’s latest book, written with co-authors Olivier Sibony and Cass Sunstein, is Noise: A Flaw in Human Judgment. What is this noise and where does it come from? Given the authors’ background and expertise, the book is about humans making judgment calls, machines or automated systems making judgment calls, and how this plays into psychology, public policy, and economics, among other things.

 


Noise is distinguished from Bias. The authors introduce these two types of error with rifle shots at a bulls-eye. Here’s a picture from CommonCog

 


My introductory chemistry students learn about this on the second day of class. However, we use the words Precision and Accuracy. In this case, noise is equivalent to imprecision, and bias is equivalent to inaccuracy. In chemistry, the judgment is a measurement – errors may come from the human observer, the measuring device, external ‘environmental’ factors, or a combination of all these.
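To make the bias/noise (accuracy/precision) split concrete, here is a minimal sketch in Python. The volume readings are made up for illustration and the function name is my own; it treats bias as the mean error and noise as the standard deviation of the errors, so that mean squared error equals bias squared plus noise squared.

```python
import statistics

def bias_and_noise(measurements, true_value):
    """Return (bias, noise) for repeated measurements of a known quantity."""
    errors = [m - true_value for m in measurements]
    bias = statistics.fmean(errors)     # systematic offset: inaccuracy
    noise = statistics.pstdev(errors)   # scatter around the mean error: imprecision
    return bias, noise

# Hypothetical readings of a 10.00 mL sample (invented numbers, not real data).
true_volume = 10.00
precise_but_biased = [10.21, 10.19, 10.20, 10.22, 10.18]
accurate_but_noisy = [9.70, 10.35, 9.95, 10.25, 9.75]

for label, data in [("precise but biased", precise_but_biased),
                    ("accurate but noisy", accurate_but_noisy)]:
    bias, noise = bias_and_noise(data, true_volume)
    mse = statistics.fmean([(m - true_volume) ** 2 for m in data])
    print(f"{label}: bias={bias:+.3f} mL, noise={noise:.3f} mL, "
          f"MSE={mse:.4f}, bias^2+noise^2={bias**2 + noise**2:.4f}")
```

The first set clusters tightly but sits off target; the second is centered on the true value but scattered. Both contribute to overall error.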

 

The authors argue that in the context of human judgment calls, we often pay too much attention to bias and not enough to noise. Their clarion call is to take noise more seriously. They break down the concept of noise into several types: (1) level noise is “the variability of the average judgments made by different individuals”, (2) pattern noise is “the difference in the personal, idiosyncratic responses of judges to the same case”, (3) occasion noise is “the effect of an irrelevant feature of the context on judgments”.

 

Using myself as an example to illustrate these: one of the jobs where I play a role as “judge” is as an instructor examining the quality of student work. I teach first-year college general chemistry, as do many of my colleagues. Let’s say I’m known as being a tough grader in general compared to my colleagues, and indeed students earn lower grades in my section compared to my colleagues’ sections – that’s level noise. My specialty is chemical bonding so I’m particularly sensitive to minor details in that area, and may be particularly harsh on student ‘errors’, but perhaps I’m lax on stoichiometric calculations and I don’t mark students down for what I consider minor errors in calculations or significant figures – that’s pattern noise. One day, I get into a fender bender on my way to work and I’m feeling huffy and annoyed, causing me to grade more harshly than I would on another day when I had a smooth drive into work with no traffic – that’s occasion noise.
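For readers who like numbers, here is a rough sketch of how level noise and pattern noise could be separated if several graders scored the same set of answers. The scores and grader names are invented, and the function is my own illustration rather than the authors’ procedure. Level noise is taken as the spread of each grader’s average; pattern noise is what remains after removing each grader’s overall toughness and each answer’s overall quality. Occasion noise would need the same grader scoring the same answer on different occasions, which this table does not include.

```python
import statistics

# Hypothetical scores out of 10: each row is a grader, each column the same answer.
scores = {
    "grader_A": [6, 5, 7, 4],   # tough overall
    "grader_B": [8, 7, 8, 7],   # lenient overall
    "grader_C": [7, 8, 6, 7],   # in between, but with a different pattern
}

def decompose_noise(scores):
    """Return (level_noise, pattern_noise, system_noise) as standard deviations."""
    graders = list(scores)
    n_cases = len(scores[graders[0]])
    all_scores = [s for g in graders for s in scores[g]]
    grand_mean = statistics.fmean(all_scores)
    grader_mean = {g: statistics.fmean(scores[g]) for g in graders}
    case_mean = [statistics.fmean([scores[g][j] for g in graders])
                 for j in range(n_cases)]

    # Level noise: how much graders differ in their average score.
    level_var = statistics.fmean(
        [(grader_mean[g] - grand_mean) ** 2 for g in graders])

    # Pattern noise: grader-by-answer residual after removing both averages.
    residuals = [scores[g][j] - grader_mean[g] - case_mean[j] + grand_mean
                 for g in graders for j in range(n_cases)]
    pattern_var = statistics.fmean([r ** 2 for r in residuals])

    # System noise: average disagreement among graders on the same answer.
    system_var = statistics.fmean(
        [statistics.pvariance([scores[g][j] for g in graders])
         for j in range(n_cases)])

    return level_var ** 0.5, pattern_var ** 0.5, system_var ** 0.5

level, pattern, system = decompose_noise(scores)
print(f"level noise   = {level:.3f}")
print(f"pattern noise = {pattern:.3f}")
print(f"system noise  = {system:.3f}")
print(f"level^2 + pattern^2 = {level**2 + pattern**2:.3f}, system^2 = {system**2:.3f}")
```

With a complete table like this one, the squared level and pattern components add up to the squared system noise, which is why the last line prints two matching numbers.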

 

A university administrator (perhaps a department chair) who hears student complaints might want to reduce the noise in professors’ judgments. Let’s get the faculty members who teach a common course together to iron out differences so we can have more uniform grade distributions – maybe we should even have common assignment questions or final exams! If you’re in academia, you know this is like herding cats. Nevertheless, it can be done (expect resistance!), for good or ill (or likely aspects of both).

 

There are many ways to potentially reduce noise in social judgment contexts and the authors provide a number of interesting examples. Some of them involve the idiosyncrasy and variability among court-case judges, as you might expect. More general examples applicable to a wider swath of society may include hiring managers who interview candidates, or the admissions team at a university deciding which student applicants should receive offers. Much of the advice provided in Noise is something you’ve likely heard before, e.g., standardizing interview questions and making them less open-ended or less dependent on personal idiosyncrasy. When grading something more subjective such as essays, it’s easier to read them all and rank them before assigning grades to any one of them. Account for base rates before making your estimate. The authors summarize their advice in the following bullet points.

 

·      The goal of judgment is accuracy, not individual expression.

·      Think statistically, and take the outside view of the case.

·      Structure judgments into several independent tasks.

·      Resist premature intuitions.

·      Obtain independent judgments from multiple judges, then consider aggregating those judgments.

·      Favor relative judgments and relative scales.
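The point about aggregating independent judgments can be illustrated with a small simulation. This is only a sketch under strong assumptions of my own: every judge is unbiased, equally noisy, and truly independent of the others, and the numbers (a “true” score of 70 with a noise of 8 points) are invented. Under those assumptions, averaging n judgments should shrink the noise by roughly the square root of n.

```python
import random
import statistics

def simulate_judgment_spread(n_judges, n_trials=5000, true_value=70.0,
                             noise_sd=8.0, seed=0):
    """Spread (std dev) of the averaged judgment across many simulated cases."""
    rng = random.Random(seed)
    averaged = []
    for _ in range(n_trials):
        # Each judge sees the same case and adds independent random noise.
        judgments = [rng.gauss(true_value, noise_sd) for _ in range(n_judges)]
        averaged.append(statistics.fmean(judgments))
    return statistics.pstdev(averaged)

single = simulate_judgment_spread(1)
panel_of_four = simulate_judgment_spread(4)
print(f"one judge:       noise = {single:.2f}")
print(f"average of four: noise = {panel_of_four:.2f}")
print(f"ratio = {panel_of_four / single:.2f} (expect about 0.5 for four judges)")
```

If the judges influence one another, say by discussing the case before scoring, the independence assumption fails and the benefit of averaging shrinks.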

 

The authors also acknowledge that there’s a trade-off between reducing noise and the effort required. Finding this balance is, well… a judgment call. And if you’re going to outsource the effort to an algorithm? Well, there are trade-offs to that too.

 

As a quantum chemist, I see nature at its base as being fuzzy. Even as we draw boundaries to distinguish one thing from another, things become fuzzy if we start to look too closely. To get geeky for a moment, I’m having an epiphany that the reason why the fundamental bits of matter (protons, electrons, neutrons, and therefore atoms) are fermions that obey the Pauli Principle is to give some discreteness to what otherwise would be an indistinguishable goo. It’s a sort of reduction in fuzziness.

 

As an instructor, I find the issue of noise challenging. Not only are we human beings with all our attendant individual quirks, but the process of learning is also somewhat mysterious – I’d say it is complex, not merely complicated. What does it mean to ‘know’ something? Are there levels or types of knowing, and if so, how would you distinguish them? Could we even agree on a scale? I don’t know. I don’t always agree with my colleagues on which topics are crucial in general chemistry – yes, we agree on a lot, but there are differences, especially when you get to the finer points. Part of why we disagree comes from our different backgrounds, our relative expertise, and our relative experience in teaching.

 

There are several things I do to reduce occasion noise when grading. Not looking at the student’s name. Grading exams question-by-question rather than one student after another. Shuffling the exams (after I grade each question). Making sure I’m not in a bad mood, and I’m ready to grade fairly. When I set exams, I let a day or two pass after writing them, and then I take the exam at a particular time-of-day so that my body is roughly in the same physical state of alertness, and I time myself so I can gauge the difficulty of the exam. It’s harder to reduce pattern noise – I’d say that I grade according to what I emphasize in my class. (I expect a different instructor would emphasize different things and therefore set exam questions and grade differently.) I’m not sure how we would reduce level noise without having common exams and grading schemes. Some amount of work is involved to try and fairly assess common assignments.
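The question-by-question routine can be written down as a tiny scheduling sketch. The function name and the anonymous exam IDs are hypothetical, and this is just one way to express the idea: grade one question across the whole stack, reshuffle, then move to the next question, so that no exam is always graded first or last and names never enter the picture.

```python
import random

def grading_schedule(exam_ids, n_questions, seed=None):
    """Return a list of (question, exam_id) pairs in grading order."""
    rng = random.Random(seed)
    order = list(exam_ids)          # anonymous IDs, not student names
    schedule = []
    for question in range(1, n_questions + 1):
        # Grade this question on every exam before moving on.
        schedule.extend((question, exam_id) for exam_id in order)
        # Shuffle the stack after finishing the question.
        rng.shuffle(order)
    return schedule

for question, exam_id in grading_schedule(["E01", "E02", "E03", "E04"], 3, seed=1):
    print(f"Question {question}: grade exam {exam_id}")
```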

 

Noise is here to stay. We should beware the siren call of algorithms to reduce noise, and have the wisdom to know when noise reduction is warranted and when it is not. Noise is part of what it means to be human, and we should be careful not to dehumanize in our efforts to reduce noise and error.
