“I grade on an absolute scale. There is no curve. This
means that potentially everyone in the class could get an A. This is unlikely
to happen. It also means that potentially everyone could fail. This is even
less likely.”
That’s what I tell my students on the first day of class,
in every single one of my classes – because the law of small numbers applies when
one teaches at a liberal arts college with small-ish class sizes: with so few
students per class, grade distributions rarely match theoretical expectations. My students
know what this absolute scale is score-wise because it’s explicitly stated in
the syllabus. Mostly, they are heartened by there not being a curve; they
likely focus on the part where they might score A’s and not where they might
fail. The reality at semester’s end, in my intro-level chemistry courses, is something
resembling a normal distribution (with fatter tails). The mean and median grades
depend on the size of the class and on average student interest and ability. (In physical chemistry, the distribution is often bimodal.)
Larger institutions with huge introductory lecture
courses might grade on a curve. In other countries, professors may not have the
final say on grades because the university administration may impose
grade norming. This sounds like anathema to faculty at U.S. institutions, but
the issue is more complex than it appears at first glance. I, for one, have no plans to
change my grading policies. (I tell students I don’t assign their grades;
rather, students earn their grades.) That being said, it’s not because I’m an absolutist.
Rather, I’m in a system that gives faculty autonomy over grades, and I have
particular ideas as to what constitutes A work, B work, C work, and so on. Also,
the law of small numbers. (For more on the purpose of final grades, see here.)
All the above is to preface this week’s hullabaloo
in higher education circles over a working paper published by the National Bureau
of Economic Research (NBER). The somewhat cryptic title is “Equilibrium Grade
Inflation with Implications for Female Interest in STEM Majors”. I wouldn’t have
noticed it if not for the Inside Higher Ed (IHE) article “Grading for STEM Equity”, with the provocative lede:
Study suggests that
professors should standardize their grading curves, saying it’s an efficient
way to boost women’s enrollment in STEM.
That definitely catches eyeballs, particularly since the
article opens with:
Harsher grading
policies in [STEM] courses disproportionately affect women – because women
value good grades significantly more than men do according to [NBER paper].
What to do? The study’s authors suggest restrictions on grading policies that
equalize average grades across classes, such as curving all courses around a B
grade. Beyond helping close STEM’s gender gap, they wrote, such a policy change
would boost overall enrollment in STEM classes.
The IHE article is short, yet does a good job
summarizing the main research points. Cue the extensive comments section: some
comments are thoughtful, others clearly written by people who did not read the NBER paper (and
who skimmed the IHE article too quickly); none of which is surprising.
The paper itself is quite interesting. The suggestion
of grade norm(aliz)ing around a B average comes from building an economic model
based on extensive data from the University of Kentucky, and then applying counterfactuals
to examine how students might sort themselves differently into majors and their
associated classes. One can quibble with the model parameters (for example, I
thought the professor utility function they applied was much too simplistic),
but overall I felt that they had reasonable justification for their model (from
my non-expert point of view). I recommend reading the paper if you’re
interested in the details. (I read the actual NBER December 2019 article, but if you’re
trying to avoid the paywall, searching the article title will reveal earlier
working copies that are somewhat close to the final version.)
The data is interesting. Unsurprising was that STEM
classes were associated with lower grades. (The authors grouped Economics,
Finance, Accounting, and Data Sciences with the standard STEM areas.) Average grades
were 2.94 and 3.27 for STEM and non-STEM respectively. For women, these
averages were 3.00 and 3.37, i.e., women score better than men in both STEM and
non-STEM. One confounding factor is that STEM classes were on average twice as
large as non-STEM classes (80 versus 40 students), likely due to those large intro STEM
classes. Also unsurprising was that self-reported outside-of-class study time
was 40% higher in STEM classes. More shocking were the actual average numbers of
3.37 and 2.45 hrs/week for STEM and non-STEM respectively. That’s very low! While
self-reporting is always suspect, that’s 20,000 students, and the over- and under-estimations
might cancel out. We also know from other longitudinal studies that study hours
have decreased steadily over the years, and the U of K numbers are not out of whack
for the present decade.
Looking at the details more closely, larger classes do
indeed show an inverse correlation with grades. Classes with more women have
higher average grades. Classes with more women have higher study hours. And
then the kicker: Classes with higher grades show less self-reported study time.
The authors note that “grade inflation may have negative consequences for
learning.”
The meat of the NBER paper is the model they build
whereby “grading policies influence enrollment decisions directly because
students value grades but also indirectly through incentivizing (costly) study
effort.” Each course is assigned a payoff based on a student’s preference for
the course, how much time he/she is willing to study, and an expected grade based
on such effort. Students sort themselves into courses and receive potential
grades that depend on academic preparation, study effort, and professor “grading
policy”, among other things. There’s a bunch of math, and the model is parameterized.
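To make that setup concrete, here’s a minimal sketch of the kind of trade-off being modeled. The functional forms, parameter names, and numbers are my own illustrative assumptions, not the paper’s actual specification.

```python
# A toy version of the trade-off described above: students pick study
# effort to maximize payoff = course preference + value of the expected
# grade - cost of studying. All forms and numbers are my own guesses,
# NOT the paper's specification.

def expected_grade(hours, base_grade, return_to_effort):
    """Expected GPA: a baseline set by the professor's grading policy
    plus diminishing returns to study effort, capped at 4.0."""
    return min(4.0, base_grade + return_to_effort * hours ** 0.5)

def payoff(hours, preference, grade_weight, base_grade,
           return_to_effort, effort_cost):
    """Student payoff from taking the course with a given study effort."""
    grade = expected_grade(hours, base_grade, return_to_effort)
    return preference + grade_weight * grade - effort_cost * hours

def best_effort(**course):
    """Payoff-maximizing weekly study hours, found on a coarse grid."""
    grid = [h / 10 for h in range(0, 101)]  # 0.0 to 10.0 hrs/week
    return max(grid, key=lambda h: payoff(h, **course))

# A grade-valuing student studies more in a harshly graded STEM-like
# course (low baseline grade, high return to effort):
stem_like = dict(preference=0.0, grade_weight=2.0, base_grade=2.4,
                 return_to_effort=0.4, effort_cost=0.15)
print(best_effort(**stem_like))  # roughly 7 hrs/week in this toy setup
```

Even in this toy version, a lower baseline grade plus a higher return to effort pushes a grade-valuing student toward more study hours, which is the qualitative story of grading policies incentivizing (costly) effort.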
Some interesting things that come out of the model:
Women study a third more than men. Doubling study effort leads to larger grade
increases in STEM versus non-STEM; the extremes are Engineering (0.37 grade increase)
and Management & Marketing (0.13 grade increase). There’s a
likely-to-be-controversial table showing the ability weights of women being
lower in STEM areas, with Chemistry & Physics at the bottom of the pack.
Expected GPAs for both women and men are also lowest in Chemistry & Physics
(and lower in STEM overall); interestingly, however, stronger students tend to
sort towards STEM. This is not because men are necessarily better; women still
earn higher grades in STEM, but they also earn higher grades in non-STEM and
tend to flock there. Women study more regardless.
The modeling of professor preferences is interesting.
STEM professors prefer lower average grades and higher workloads than
non-STEM professors do. Hmm… I wonder if that’s true of me compared to my non-science colleagues.
The model also suggests that “both STEM and non-STEM professors prefer to give
out higher grades with lower workloads in upper-division classes.” Hmm… that’s
definitely not true for me workload-wise because my standard upper-division
class is Physical Chemistry – considered the hardest and least liked by our
majors. I might prefer to give higher
grades, but I don’t actually end up doing so. The average grade in my P-Chem
classes is slightly lower than in my G-Chem classes, but not by much. The model
assumes professors prefer smaller classes (true, I think), but the weighting
factor in the model leads to lower grades in STEM classes even though there is
higher demand from students. That’s eerie. I don’t think I subconsciously give
lower grades in larger classes – students earn their grades! – but I don’t disagree
with the trend. I see it in my own classes. I’d like to think it’s because I’m
more effective at helping a larger proportion of individual students (who need
the extra help) in a smaller class. Time taken up by students in office hours
doesn’t change substantially with class size (but it does change a little).
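If I had to guess at the flavor of the professor side of the model, a toy bliss-point utility over average grade, workload, and class size might look something like the sketch below. Again, the functional form and numbers are my own guesses, not the paper’s actual professor utility function.

```python
# A toy professor utility of the flavor described above: a bliss point
# over (average grade, student workload), plus a penalty on class size.
# My own illustration, not the paper's specification.

def prof_utility(avg_grade, workload, class_size,
                 ideal_grade, ideal_workload, size_penalty):
    """Quadratic loss around the professor's preferred average grade and
    workload, minus a linear cost for a bigger class."""
    return (-(avg_grade - ideal_grade) ** 2
            - (workload - ideal_workload) ** 2
            - size_penalty * class_size)

# STEM-ish bliss point: lower average grade, more expected study hours.
stem_prof = dict(ideal_grade=2.8, ideal_workload=4.0, size_penalty=0.01)
nonstem_prof = dict(ideal_grade=3.3, ideal_workload=2.5, size_penalty=0.01)
print(prof_utility(3.0, 3.4, 80, **stem_prof))     # big STEM class
print(prof_utility(3.0, 3.4, 40, **nonstem_prof))  # smaller non-STEM class
```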
After building and parameterizing their model, the
researchers can start testing counterfactuals and examining how these affect
the so-called STEM gap – the fact that women disproportionately choose non-STEM areas.
The three largest factors that narrow the gap are equalizing non-grade
preferences, equalizing grade preferences, and grade norming around a B
average. There isn’t much an institution can do about the first two areas. While
much outreach has been done to encourage more women into STEM, non-grade
preferences remain – not necessarily good or bad, just different. I’m not sure
what, if anything, can be done to equalize grade preferences between men and
women. I’m certainly not going to ask women to lower their expectations and
study less. That leaves grade norming to a B average. The model suggests that this
would actually make a difference to the STEM gap, and it’s a change an
institution could actually implement. Mind you, this is grade norming across all areas, i.e., STEM classes would have
their grade norms moved up, while non-STEM classes would have their grade norms
moved down. I’m not sure you’d get sufficient faculty buy-in to do this in the
U.S., while institutions in other countries might already do this. Interestingly,
one of the counterfactuals that has little effect is having more women faculty
in STEM. I’m not going to comment any further on that one.
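For concreteness, here’s roughly what norming every class to a B average would mean mechanically, at least as I understand the proposal. This is my own illustration of the arithmetic, not the paper’s procedure.

```python
# Norming to a B (3.0) average: shift each class's grades by a constant
# so the class mean lands on the target. My own sketch of the mechanics.

def norm_to_b(grades, target=3.0):
    """Shift GPA-scale grades so the class mean equals the target,
    clamping to the 0.0-4.0 scale."""
    shift = target - sum(grades) / len(grades)
    return [round(min(4.0, max(0.0, g + shift)), 2) for g in grades]

# A harshly graded STEM class moves up; a leniently graded
# non-STEM class moves down:
stem_class = [1.7, 2.3, 2.7, 3.0, 3.3, 3.7]     # mean ~2.78
nonstem_class = [2.7, 3.0, 3.3, 3.7, 3.7, 4.0]  # mean 3.40
print(norm_to_b(stem_class))     # everyone shifted up by ~0.22
print(norm_to_b(nonstem_class))  # everyone shifted down by 0.40
```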
I don’t like the idea of norming to a B average. I don’t
like grade norming at all. If a student shows they understand roughly 75-80% of
the material, then I think they deserve a B. (My B range is
pegged at 70-84%.) If the average student shows less (as determined by exams,
homework, quizzes, etc.), then the average student shouldn’t be earning a B.
Then again, I’m the one writing the exams and setting the level of difficulty.
If I made my exams “easier”, the average would go up. The question is: What is “average”?
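For contrast, my absolute scale is just a fixed mapping from percentages to letter grades, with no reference to the class average. In the sketch below, only the B range comes from what I said above; the other cutoffs are hypothetical placeholders.

```python
# An absolute scale as a fixed percent-to-letter mapping. Only the B
# range (70-84%) is stated above; the 85% A cutoff follows from it, and
# the C/D cutoffs are placeholders for illustration.

def letter_grade(percent):
    """Map a course percentage to a letter grade on a fixed scale,
    independent of how anyone else in the class scored."""
    cutoffs = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]  # lower bounds
    for lower_bound, letter in cutoffs:
        if percent >= lower_bound:
            return letter
    return "F"

print(letter_grade(78))  # "B", no matter how the rest of the class did
```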
In a Chronicle of Higher Education article from twenty years ago, refreshing for its candor, a Dartmouth professor
wrote that “we imagine our students to be at a mythical Average U., and give
the grades that they would get there.”
Maybe that’s what I’m subconsciously doing. I think C
is average, and that the average student in my average class is slightly above
that average (i.e., C+/B- borderline). When I have a stronger, smaller, more
motivated class, the average goes up. Not because of bias, I don’t think. I’ve
tested this unsystematically by occasionally recycling final exam questions.
And now that this post is four pages long and I’m
starting to wade into the phenomenon of grade inflation, I think I should hold
my flood of thoughts for the moment. You can wait eagerly (or not) for my next
post!