Wednesday, May 23, 2018

Fine-Tuned Grades


A common speculation in physics, armchair or otherwise, is that our universe is fine-tuned. The proposal of a fine-tuned universe hinges on the argument that a tiny change in the fundamental ‘constants’ of physics would lead to a very different universe, one devoid of life or complex molecular structure.

The fine-tuning argument often invokes the anthropic principle, but while the two concepts overlap, they are not identical. The anthropic principle has two forms, neither of which I find useful from a scientist’s point of view. The weak anthropic principle (Brandon Carter), in my opinion, is obvious: “Only in a universe capable of eventually supporting life will there be beings capable of observing and reflecting on the matter.” The strong anthropic principle (John Barrow and Frank Tipler) avers that the universe compels the formation of intelligent life to muse upon it. I don’t think this is testable scientifically, and it remains in the philosophical realm.

Let me try to answer something simpler. Do I fine-tune grades in my classes? Do I design my exams with this fine-tuning in mind, consciously or unconsciously? That’s what I’ve been pondering this week as I’ve been grading final exams. I have some external data and some internal musings in my head, so let’s see where it leads.

Teaching at a liberal arts college with small class sizes means there could be significant variations in student grades depending on the ‘sample’ of students I get in any particular class. I decided to analyze General Chemistry I and II, because I teach these classes most years. The “Regular” sections are typically 40 students and the “Honors” sections are typically 20 students. Some years I teach a smaller 20-student section of GChem1 in the fall semester, where incoming first-year students have often indicated an “Interest” in chemistry or biochemistry, i.e., these students are science or chemistry-inclined, but not in the honors program.


The x-axis in the graph represents relative time within each category, but the years are not necessarily consecutive. I might teach a Regular section one year, an Honors section the next year, and another Regular section the following year; rather than let the lines jump between categories, I’ve grouped the data by category. I’ve omitted a couple of data points – one where my spreadsheet was empty (I must have deleted the data by mistake some years ago), and another where I structured the grade distribution in the class very differently because of a final project. I’ve also plotted both the mean and the median scores.
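For the curious, a grouped plot like this takes only a few lines to assemble. The sketch below uses matplotlib with placeholder numbers (not my actual class averages) purely to show the idea of grouping by category and using a relative-time x-axis.

```python
import matplotlib.pyplot as plt

# Placeholder data for illustration only (NOT the actual class averages).
# Each category holds (mean, median) pairs in relative-time order; the
# years behind them need not be consecutive.
data = {
    "Regular GChem1": [(78, 80), (76, 79), (77, 81)],
    "Interest GChem1": [(81, 83), (82, 84)],
    "Honors GChem1": [(86, 87), (85, 88)],
}

fig, ax = plt.subplots()
for category, scores in data.items():
    x = range(1, len(scores) + 1)  # relative time within the category
    means = [m for m, _ in scores]
    medians = [md for _, md in scores]
    ax.plot(x, means, marker="o", label=f"{category} (mean)")
    ax.plot(x, medians, marker="s", linestyle="--", label=f"{category} (median)")

ax.set_xlabel("Relative time within category")
ax.set_ylabel("Class average (%)")
ax.legend(fontsize="small")
plt.show()
```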

On the first day of class, I tell my students that I grade on an absolute scale, i.e., there is no curve in the class. (I also explain why, so the class learns something about sample size and the normal distribution.) This means everyone could get an A, a possible but unlikely scenario. Everyone could fail, a possible but very, very unlikely scenario. I band my grades in 15% increments, with plus/minus grades assigned in the top and bottom 3% of each band. On average, 85% of the class grade is based on exams, and the final exam is typically a third of the overall course grade.
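As a rough illustration of the banding (the exact cutoffs below, such as the A band starting at 85, are placeholders rather than my real ones), an absolute, no-curve scale with 15-point bands and plus/minus in the top and bottom 3% of each band might look like this in code:

```python
# Sketch of an absolute (no-curve) scale: 15-point letter bands,
# with +/- grades in the top and bottom 3% of each band.
# The cutoffs (A band starting at 85, etc.) are placeholders.

def letter_grade(score, a_floor=85.0, band_width=15.0, plus_minus=3.0):
    """Map a percentage score (0-100) to a letter grade on an absolute scale."""
    letters = ["A", "B", "C", "D"]
    floor = a_floor
    for letter in letters:
        if score >= floor:
            if letter != "A" and score >= floor + band_width - plus_minus:
                return letter + "+"   # top 3% of the band (no A+ here)
            if score < floor + plus_minus:
                return letter + "-"   # bottom 3% of the band
            return letter
        floor -= band_width
    return "F"                        # below the lowest band

if __name__ == "__main__":
    for s in (93, 84.5, 73.2, 70.5, 39):
        print(s, letter_grade(s))
```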

The Regular GChem1 mean is close to the C+/B- borderline, with the median in the B- range. When I teach the smaller class of science-inclined students, both mean and median are mostly in the low B range. Regular GChem2 has mean and median scores close to or in the C+ range. The material in GChem2 is more challenging than in GChem1. Some students who do well in GChem1 coast on prior knowledge if they’ve had a good high school chemistry class, but then run into difficulty in GChem2. I don’t have as many data points for the Honors sections, but they are typically in the mid-to-high B range, with GChem1 slightly higher than GChem2. The exams in the “Interest” sections do not differ in difficulty from the “Regular” ones. The exams in the “Honors” sections are only slightly longer, with marginally higher expectations, but overall not too different.

There are some ups and downs in the average grades, but my broad-brush take is that they are by and large consistent. Early in my teaching career, I might rescale the grades on a particular exam, but these were minor adjustments and happened rarely. No rescaling has been needed in recent years. In my first year, I asked my department chair what the averages were and was told that C+/B- was typical in General Chemistry. That likely provided a reference point for my exams in the early years, and then it was just fine-tuning to the present day. Nowadays, I can pretty much sit down and write an exam from scratch in a couple of hours, and it will likely yield an average grade close to the average in each category. I still keep up the practice of taking my own exams (usually a day or two after writing them so my memory buffer has cleared) and occasionally fine-tune them further depending on how long the exam took me and how much I had to write. (I write up the Answer Key in full when I take the exam; this key is provided to students when I hand back their graded exams.)

What do my data tell me? Somehow, over the years, I have subconsciously imbibed an internal standard for exam-writing that seems to work well on average, providing a consistent distribution of student grades, at least for the larger Regular sections.

Is the universe designed by an intelligent being? There are philosophical or theological arguments for and against, but I don’t think it can be proved scientifically either way because we honestly don’t know how or what to measure. (Yes, I’ve read many of the claims and counterclaims closely.) Are my exams designed by an intelligent being? Me? I’d like to think so. But there’s an intuition to the way I write exams, now that I’ve had many years of experience, rather than the consciously and carefully calibrated scientist’s ideal. In the early years, I fussed a lot over details and took a lot longer. Now it seems fine-tuned in a way, but not analytically. Could my intuition be translated into a computer A.I. that generates and grades exams? If so, would it be intelligent? Would it fine-tune itself through machine learning? I don’t know, but I hope I’m not replaced anytime soon.
