Monday, October 23, 2023

Tests and Polls

I had not considered that polling has remarkably much in common with school exams and tests. This epiphany comes from reading Measuring Up by Daniel Koretz, an education professor at Harvard. His area of expertise is the impact of high-stakes testing in American schools at the K-12 level. I’ve only read the first three chapters and there is already lots of meat for me to chew on. Koretz begins the prologue by noting the ubiquity of testing, and that “achievement testing seems reassuringly straightforward and commonsensical: we give students tasks to perform, see how they do on them, and thereby judge how successful they or their schools are.”

 

Then he proceeds to dismantle the notion. From what I gather the point of the book is to give the reader a sense of the complex enterprise of achievement testing, and discuss why “test scores are widely misunderstood and misused”. And that misuse has far-reaching ramifications. Koretz knows that he’s wading into a minefield. He thinks that test scores can be valuable but only if one understands their strengths and limitations. The problem with us human beings is that we want quick and easy answers. We quickly gravitate to numbers and things we can count – surprising, perhaps, given the general phobia that many have towards math.

 

The fundamental issue, according to Koretz, is that “test scores do not provide a direct and complete measure of educational achievement. Rather, they are incomplete measures, proxies for the more comprehensive measures that we would ideally use but that are generally unavailable to us.” Why are they incomplete? First, “tests can measure only a subset of the goals of education”. Second, “tests are generally very small samples of behavior that we use to make estimates of students’ mastery of very large domains of knowledge and skill.” No surprises here.

 

So how is testing like polling? When conducting a poll we’re using the responses from a thousand or so people as a proxy to get a sense of how millions of people might answer those questions. Getting a representative sample is therefore crucial to polling. In the same way, when constructing a test, our hope is that by testing a limited yet representative sample of the subject matter we are getting a measure (of sorts) of the larger range of knowledge and/or skills we want our students to have learned. Just as it isn’t feasible to poll everybody, it isn’t feasible to ask questions to test every tidbit of knowledge covered in class.
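
To put a rough number on why a thousand or so respondents can stand in for millions, here is a minimal sketch of the textbook margin-of-error formula. It assumes a simple random sample and a 95% confidence level; the function name and defaults are mine, not anything from Koretz's book.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    proportion=0.5 is the worst case; z=1.96 corresponds to ~95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 1000, 10000):
    print(f"n = {n:>6}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

For a thousand respondents this works out to about three percentage points either way. The formula says nothing about whether the sample is representative, which is the harder problem for both polls and exams.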

 

There’s an additional piece I hadn’t considered. In a poll for, say, voting outcomes, what you care about is what it tells you about how millions might have responded. You’re not actually concerned about the impact of the thousand people you polled – their specific votes won’t change the outcome of a large election. Similarly, in a test, we shouldn’t be worried about how students answered a particular question; what we care about is “the larger set of knowledge and skills it represents”. That being said, if all my G-Chem students bombed Lewis structures on an exam, I’d be concerned that they haven’t learned one of the most important things in the course.

 

Koretz also argues for the importance of standardization in exams. What it means is uniformity: “examinees face the same tasks, administered in the same manner and scored in the same way… to avoid irrelevant factors that might distort comparisons among individuals.” How do you go about doing this? By thinking carefully about what you’re asking on the test and whether your questions allow you to differentiate the knowledge and skills between different examinees in the subject matter. Koretz emphasizes that you are not creating such differences by your test, rather you are “revealing differences that already existed”. Another point that struck home for me is when Koretz argues that validity “is the single most important criterion for evaluating achievement testing”. But it’s not the validity of the tests we’re concerned about, but rather the validity of what we are inferring from the test scores.

 

What about the incompleteness of the test with respect to the goals of education? Koretz cites the work of E. F. Lindquist, a giant in test construction – versions of which are used all over the country. Lindquist argues that “only some of these goals of education are amenable to standardized testing” and therefore one should never use only test scores to draw larger conclusions about students’ abilities. Furthermore, what instructors focus on in the classroom, being a proxy for the broader goal of education, can be far removed from that goal. The utility of what students are learning today may in some cases be revealed only years down the road and in an oblique fashion such that one doesn’t realize that the schooling provided the foundation knowledge or skill.

 

All this makes me think about how I construct my exams. I do try to ensure that the questions I ask are representative of the larger domain area. Can I validly draw the inference that a student who does well on my exams has a good grasp of the domain area? To some extent, but I’m not sure I have the quantitative data to prove it. There is no ‘clean’ experiment I’ve carried out to separate out any confounding variables. Do my exams differentiate the students? Yes, to some extent. Failing is rare in my classes, but there can be quite a number of C’s (and a few D’s) depending on the class and the year. There can also be quite a number of A’s and B’s, especially if I’m teaching the Honors section – the students are indeed academically stronger. Because my class sizes are small (not exceeding 40), there can be fluctuations in the distribution from one year to the next. But by and large, my grades have stayed steady over the years.

 

Reading Measuring Up reminded me of the incompleteness of exams. In G-Chem and P-Chem, my exams are the lion’s share of the grade. I think exams are reasonably good proxies in these classes, but not necessarily for other classes I’ve taught (labs, special topics, research methods). It made me stop to think about how I weight different aspects of my class and how I think about what a student’s grade in the class represents. I recognize that my interpretation of these matters may differ significantly from that of my colleagues, especially those outside the natural sciences. And that’s okay. I’m looking forward to reading the rest of Koretz’s book!

Sunday, October 22, 2023

Footsteps of Orpheus

 

Why are living humans so intrigued by what happens after we die? Why do we think that the world of the dead is like a shadowy world of the living? Why are people enthralled by stories of ghosts? For some, the great unknown becomes a growing itch that needs to be scratched. The unknown of what comes next becomes so terrifying that some folks with money and resources to spare will go to great lengths to cheat death. Thus begins a quest for the magic pill, the elixir of life, the holy grail, the philosopher’s stone, and let’s not forget the name Voldemort chose for himself and his obsession with exploring deeper magic.

 


Jonathan Stroud taps into this longing more explicitly in the fourth book of his Lockwood & Co series, The Creeping Shadow. It’s an apt title. There is a ghostly apparition that takes on the title, and Lockwood’s intrepid team are hired to suss out this particular manifestation of The Problem. But the title could also refer to that fear of mortality coming upon the aged, looking for a way to stave off aging and death. The Problem – the sudden increased appearances of hauntings and ghosts in England – is the backdrop for Lockwood & Co, and we get more of a glimpse in this fourth book of its potential origins.

 

To get rid of a ghost, one needs to find its Source – often the bones of the deceased but not always – and wrap it with silver. These sources or relics are then destroyed by placing them in a fiery furnace. The challenge is that only children and youth can see the apparitions, and thus they have to be the investigators to locate and disable the ghosts. Their weapons and protection are chemical: salt, iron, silver, and magnesium for explosive effect. The team’s historian cum scientist, George Cubbins, discusses the situation they are facing when there is an outbreak of ghosts: “… ordinary Sources represent weak points, where Visitors can slip through from… wherever it is they ought to be. Imagine them as holes worn in an old fabric. Like when the seat of your jeans wears through… The fabric gets thin, then stringy, then widens to an actual hole… [then] something else comes through.”

 

Implicit in the argument is that there is a place for the dead and a place for the living. When the wall between the two begins to break down, that’s when Problems begin. Apparitions pass into the world of the living – and can be deadly with ghost-touch. But the journey could proceed in the opposite direction. And in Book 4, we get a glimpse of what it might be like on the other side. As far back as Homer, we have imbibed the idea that the world of the dead is like the world of the living, only fainter, shadier, colder, and less energetic. It’s where Hades rules. The living are not welcome. But intrepid adventurers trying to reverse death make the journey. Thus, we have Orpheus in the Underworld. Or a group of kids visiting the Upside Down in Stranger Things.

 

The young think they have forever. The older and more aged – they’re the ones worrying about how to cheat death. We see this more starkly in The Creeping Shadow. The adults are up to no good, trying to follow in the footsteps of Orpheus – maybe to find a solution to the malady of entropy triumphant. Interestingly, the book features two groups of adults: The aptly named Orpheus society is a hard-to-access ivory tower akin to academics studying The Problem. But in competition is an Institute churning out weapons, defenses, sensors, and other doodads – an industrial counterpart that’s exploring The Problem while trying to profit from it. The researchers are referred to as scientists. I suppose that’s what science does. We scientists are constantly poking our nose into the unseen. We want to know the unseen mysterious forces that govern the world and perhaps tame them into technology. The science of aging is big money today as it was during the days of patronage to alchemists. We don’t have the art and music of Orpheus to gain access to Hades’ kingdom, but we have our tools of science nevertheless to follow in his footsteps.

 

I cannot help but feel that we have a lack of imagination when it comes to the afterlife. Why should it be a paler version of life? The version of the scientist is no better. When you hit thermodynamic equilibrium, there is no longer any free energy to do any work. That would be a very boring place to be. So if Stroud and others can drum up excitement for the adventures between the dead and undead, who am I to complain? I certainly have enjoyed his books so far.


P.S. It looks like the adventures of Lockwood & Co are shaping up to a dramatic conclusion in the fifth and final book. I just need to be patient and wait for my library copy to arrive!

Friday, October 20, 2023

Chemistry Prompt Engineering

 

After my initial foray into ChatGPT in March and April, I haven’t used it much. The context in which I used ChatGPT was to explore how it can be leveraged in chemical education. Clearly it has many limitations, but it could be useful to students in generating study guides, test questions, initial ideas for a research topic, and bits of code. Critiquing ChatGPT’s answers to chemistry questions could potentially help sharpen a student’s conceptual understanding as they have to think carefully about the answers, differentiating the right from the wrong.

 

I had given little thought to how GPT might aid my research or the chemical enterprise more generally. I have seen A.I. methods used for materials discovery, retrosynthetic analysis, and cheminformatics; but these were usually optimized towards the particular problem to be solved rather than utilizing a more general LLM (large language model). So it was clickbait for me when I saw the following title and abstract for a recently published paper (shown below).

 



Reading through the paper, the “tests” are somewhat limited in scope, but I appreciated how the authors organized their investigation around different aspects of the multifaceted chemical research enterprise. Not surprisingly GPT-4 can do simple tasks reasonably well, though it gets some things wrong. The examples are interesting and somewhat illuminating. In one case, after inputting temperature and vapor pressure data, GPT-4 is asked to find the boiling point, and its results are also compared to a Bayesian optimization. (GPT does worse.) In another case, GPT-4 is used to generate Python code to control a robot arm.
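
For context on the boiling-point task, here is a minimal sketch of how one might do it by hand with a conventional Clausius-Clapeyron fit (ln P is roughly linear in 1/T). This is not what GPT-4 or the Bayesian optimization in the paper did, and the data points are approximate handbook values for water that I've plugged in for illustration, not the paper's data. The function names are hypothetical.

```python
import math

# Approximate handbook vapor pressures of water: (temperature in K, pressure in kPa).
# Illustrative values only; these are NOT the data used in the paper.
data = [(333.15, 19.93), (343.15, 31.18), (353.15, 47.37), (363.15, 70.12)]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def boiling_point(points, pressure_kpa=101.325):
    """Extrapolate ln P = intercept + slope / T to the target pressure."""
    inverse_t = [1.0 / t for t, _ in points]
    log_p = [math.log(p) for _, p in points]
    slope, intercept = fit_line(inverse_t, log_p)
    return slope / (math.log(pressure_kpa) - intercept)

print(f"Estimated normal boiling point: {boiling_point(data):.1f} K")
```

With these inputs the estimate should land close to the accepted 373.15 K. The small remaining error comes from treating the enthalpy of vaporization as constant over the temperature range.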

 

The authors include several caveats. Molecular recognition is still a problem for GPT-4, but one could get GPT-4 to talk to another modeling system (it already does so with Wolfram to do math) that can handle the coding and decoding of molecules cheminformatics-style. GPT-4’s database doesn’t always include the most recent chemical literature. Again, one could build a local model that includes this – which has the advantage of keeping data local and proprietary in chemical industry. GPT-4 still gets things “wrong” but that’s because GPT isn’t optimized to get things right but to sound plausible. Will GPT get better? Probably. But for cutting-edge work in the chemical enterprise, I expect local specifically-trained models with chemical goals in mind to do better. The authors also consider how one might define “language objects” pertaining to chemistry and this might be an approach worth further testing.

 

Here's a picture of what the authors think GPT-4 can and cannot do. I add the caveat that the example prompts they used were limited so even in the areas in which they have green check marks, I would say that it works better in some subareas than others. None of these have been “solved” by GPT.

 


 

Wednesday, October 18, 2023

At the Hump

 

It’s Wednesday. Hump Day. I’m having my least intense week since the beginning of the semester. I was even able to leave work earlier today and run some errands after work such as grocery shopping. It’s also Hump Week of the Semester. I’m in Week 8, and half the instructional days have gone by. We don’t have a Fall break/holiday this year – something to do with how the calendar works – so my students are feeling tired at this point. Me too. But being able to take it easier this week (I didn’t give any exams) has been nice. I did have a P-Chem problem set due but surprisingly my office hours were not as heavily attended this week.

 

Why am I having such an intense semester? Because I’m teaching a class that’s brand new to me but also workload intensive: First-semester biochemistry! I’m very much enjoying teaching the class but the prep has been relentless. My goal was to maintain being three weeks ahead in terms of detailed class prep. Thus far I’ve succeeded, but I’ve been working slightly more hours every week compared to my average, and almost all my time has gone to class prep. Since I keep a timelog, I know that in September slightly over 80% of my time went to class prep. That’s probably the highest it has been except my first year as a professor. It might be dipping down slightly for October (closer to 75%) but we’ll see where the numbers stand at the end of the month. I’ve done close to zero research although I have managed some reading and there are still committee meetings and admin work.

 

I’m now in the middle of enzyme kinetics in Biochemistry. Students seemed a little shell-shocked by the full derivation of Michaelis-Menten. We’ve also talked about how the equations and plots change when a competitive inhibitor is present. Non-competitive and mixed inhibitors are up next. That will bring the protein unit to a close. Nucleic acids will be up next. I think the first exam in Biochem went well. The class average was in the mid-B range. I’d like to think it means students are on average understanding the material.
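
As a quick illustration of how a competitive inhibitor changes the picture, here is a minimal sketch of the standard Michaelis-Menten rate law, with the apparent Km scaled by (1 + [I]/Ki). The parameter values are made up for illustration and aren't from my class notes.

```python
def rate(substrate, vmax, km, inhibitor=0.0, ki=None):
    """Michaelis-Menten rate; a competitive inhibitor raises the apparent Km only."""
    alpha = 1.0 if ki is None else 1.0 + inhibitor / ki
    return vmax * substrate / (alpha * km + substrate)

# Hypothetical parameters in arbitrary units.
VMAX, KM, KI, INHIBITOR = 100.0, 2.0, 1.0, 1.0

print(f"{'[S]':>8} {'v (no I)':>10} {'v (with I)':>11}")
for s in (0.5, 2.0, 4.0, 20.0, 200.0):
    v_free = rate(s, VMAX, KM)
    v_inhibited = rate(s, VMAX, KM, INHIBITOR, KI)
    print(f"{s:>8.1f} {v_free:>10.1f} {v_inhibited:>11.1f}")
```

The two columns converge at high substrate concentration, which is the signature of competitive inhibition: Vmax is unchanged while the apparent Km (here doubled) shifts the curve to the right.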

 

We’ve just finished the hydrogen atom in my Quantum Chemistry class. That’s about right since we’re smack in the middle of the semester. The second exam is coming up next week. The results for the first exam were bimodal – not a surprise for a P-Chem course, and since it’s a small class, deviations from a normal distribution should be expected. One new thing I’m introducing into P-Chem this year is some computational work in class. The one exercise we’ve done went smoothly. Three more to go. Overall, I think the class is going well. I’m enjoying it – but then I’m a quantum chemist by training.

 

I have the Honors general chemistry class this semester. We’re smack in the middle of Lewis structures. We went through 20+ “easier” structures today, culminating in sulfur dioxide – which illustrates the four main guidelines for drawing good Lewis structures. Next class will be fully small group work and students will work their way through “harder” structures. I always enjoy teaching general chemistry and I teach it every year, and chemical bonding is one of my specialties so I particularly like teaching this unit! I’ve prefaced the unit with the general bonding curve; I like this approach having used it for quite a few years.

 

Overall, I’m having a good Hump Week. Even though so much of my time is going to teaching, I’m enjoying it. I feel I’ve built a good rapport with my students in all three classes. We’ll see how the rest of the semester goes but it’s nice to have made it to the hump!

Sunday, October 15, 2023

Positional Information

 

I’m reading The Master Builder by developmental biologist and professor Alfonso Martinez Arias. The first bit reminded me of In Search of Cell History, but then Arias moves on to discussing embryology, a topic I am not well-versed in. All I can say is that I’m in awe. It’s a wonder that organisms develop the way they do. So many things can go wrong, and sometimes things do go awry. But evolution has honed a successful protocol to turn a single cell into a monster – the human being is made up of trillions of cells and over two hundred different cell types, sitting in the appropriate location to carry out their specific functions.

 


How does this happen? Arias tells the story of famed biologist Lewis Wolpert (who passed away two years ago) puzzling over why, even though we humans have different-sized hands, our fingers are always proportioned to the size of our own hands. Wolpert thought that “cells either receive or enact instructions about what they do based on their position within a group of cells. He called this positional information.” He imagined a chemical (calling it a morphogen) that “would leak from one side of [a] sheet, diffuse across it, generating a gradient. Then he posited that different concentrations of the morphogen would be read by the cells… the meaning of the message to a cell depends on how far away the cell is from the source”.
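
To make the idea concrete, here is a toy sketch of cells reading a morphogen gradient. I'm assuming an exponentially decaying concentration profile (what you'd get at steady state from diffusion plus uniform degradation) and two arbitrary thresholds. The fate labels, decay length, and thresholds are all hypothetical choices of mine, not anything from Arias or Wolpert.

```python
import math

def morphogen(position, source_conc=1.0, decay_length=0.3):
    """Assumed steady-state profile: concentration falls off exponentially from the source."""
    return source_conc * math.exp(-position / decay_length)

def cell_fate(concentration, high=0.5, low=0.15):
    """Each cell compares the local concentration to two thresholds (hypothetical values)."""
    if concentration >= high:
        return "blue"
    if concentration >= low:
        return "white"
    return "red"

# Eleven cells spaced evenly across a sheet of unit length, source at position 0.
positions = [i / 10 for i in range(11)]
fates = [cell_fate(morphogen(x)) for x in positions]

for x, fate in zip(positions, fates):
    print(f"x = {x:.1f}  conc = {morphogen(x):.3f}  fate = {fate}")
```

Every cell runs the same rule, yet the sheet ends up divided into three bands purely because of where each cell sits relative to the source.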

 

This made me think about my research projects on elucidating proto-metabolic cycles at the origin of life. While I’m taking into account the effects of concentration, I have not considered a spatial concentration gradient that might lead to differentiation of proto-cellular functions. It strikes me that I need to really think about analog signaling. As a quantum chemist who tends to focus on one or two or three molecules interacting with each other, my isolated digital point of view is simply too narrow. I’ve been starting to build in flux into my models, but in a steady state situation. I hadn’t been thinking about how different concentrations may trigger a proto-metabolic cycle to behave very differently.

 

And what did the biologists find when they hunted for the morphogen? A protein, which they named Sonic Hedgehog (after the video game). It’s quite the amazing protein. Arias writes that “different tissues interpret the same [Sonic] signal differently and that the function of the signal is not so much to instruct but to organize, to define the domains in which cells exercise their options. Whether Sonic comes from a mouse, fish, or bead, if it’s placed in a chicken limb bud, it will inspire cells to build extra digits, and if it’s placed in the mouth, it will inspire them to build teeth that the organism’s ancestors haven’t had for millions of years.” The upshot is that “cells in different locations are different because of who their neighbors are and the conversational partners they’re encountering”.

 

Another new thing I learned was somitogenesis – the process whereby the body extends in time. There exist particular “pairs of cysts of mesodermal cells called somites, which serve as a kind of yardstick for the growing body.” There’s a fixed clock for this “precisely timed pattern of activity”. In thinking about my research, not only do I have to think about how to include spatial concentration gradients in the model, I need to think about the timing. There’s the diffusion limit, but possibly all sorts of room for play depending on what else is in the proto-cellular environment. Hmmm… lots of food for thought since I don’t know how to do this yet for the systems I’m studying.

 

Two chapters later in Arias’ book, I’m reading about how immortal cancer cells break the Hayflick limit. That’s not news to me. I’d also known about the higher levels of the enzyme telomerase in such cells. What struck me is how to think about it. The perspective Arias provides is that in normally-behaving cells, genes are subject to the rule of the cell. Living is the cell’s business. But in HeLa and other such cells, “the genome, in hijacking the cell, puts itself first.” But this isn’t the immortality you want because it’s not eternal youth. The cells are aging and going uncontrollably haywire.

 

I’ve been focusing on autocatalytic cycles in my research. I think they’re crucial to how life started and they kill two birds with one stone by explaining both growth and selectivity simultaneously. But it’s a Goldilocks situation. Autocatalysis quickly vacuums up food and the cycle expands into hypercycles. Parasitic reactions begin to temper this, but too much parasitism on the cycles and the system collapses. I’m starting to think that the autocatalytic cycle run amok is analogous to what these cancers are doing. Can autocatalytic cycles be tamed? Maybe by introducing positional information and a concentration gradient. I don’t know how to do this either but I now have a glimmer of perhaps how to proceed.
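
Here is a toy sketch of that Goldilocks behavior, and I stress that it is not one of my research models. It assumes a well-mixed reactor with constant food influx and dilution, a single autocatalytic species that grows by consuming food, and a first-order parasitic drain on that species. All rate constants are hypothetical, and the integration is plain forward Euler.

```python
def simulate(parasitism, k=1.0, influx=1.0, dilution=1.0, dt=0.01, steps=10_000):
    """Integrate food and autocatalyst concentrations; returns their final values.

    d(food)/dt  = influx - dilution * food - k * food * cycle
    d(cycle)/dt = k * food * cycle - parasitism * cycle
    """
    food, cycle = 1.0, 0.1
    for _ in range(steps):
        growth = k * food * cycle
        d_food = influx - dilution * food - growth
        d_cycle = growth - parasitism * cycle
        food += dt * d_food
        cycle += dt * d_cycle
    return food, cycle

for p in (0.25, 0.5, 0.9, 2.0):
    food, cycle = simulate(p)
    print(f"parasitism = {p:4.2f}  food = {food:.3f}  autocatalyst = {cycle:.3g}")
```

In this toy setup the autocatalyst persists only while the parasitic drain stays below k × influx / dilution; above that threshold it dies out and the food simply accumulates. There is no spatial gradient here at all, which is exactly the piece I don't yet know how to add.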

 

And this is why I’m reminded of the value of reading outside my field!

 

P.S. In the same chapter, I’m learning about organoids that can be created from stem cells. Miniguts, minibrains, these seem like the prelude to a Tleilaxu business.