Tuesday, August 20, 2024

Last Ditch?

I have regularly contemplated ditching the textbook in the classes I most regularly teach – the year-long sequence of general chemistry and physical chemistry. In my first year as a new faculty member, I used the books the department had been using. We have multiple sections of general chemistry, and the instructors decide as a group which textbook to use. There is only one section of physical chemistry, however, giving full freedom for the sole instructor to switch books.


My general rule is that I should use a textbook at least twice before deciding whether to ditch it or keep using it. The first year I use a new textbook I’m just getting used to it, so I should not be too quick to write it off unless it seems like an utter disaster (and I would not pick an utter disaster in the first place). In P-Chem 2 (Statistical Thermodynamics), I went through three textbooks, giving them two years each. While each successive textbook was better than its predecessor, none was a good fit with the way I wanted to teach statistical thermodynamics. (All three were “standard” P-Chem textbooks used at many universities.) Hence, I did my first ditch and switched to creating worksheets and problem sets. Many years have passed since then, and I’ve been happy with my decision to ditch.


That third textbook worked well for my approach to teaching P-Chem 1 (Quantum Chemistry). The publisher even had a standalone version of the textbook for the semester-long course, rather than the full year of P-Chem, and I used it for years. But two years ago, I decided to rearrange some content so I could incorporate modern valence bond theory and also introduce hands-on electronic structure calculations during class sessions. Since my worksheets had worked well for P-Chem 2, it was a good time to ditch the textbook for P-Chem 1. I converted my lecture notes into similarly formatted worksheets. I like the new arrangement, but it still needs tweaking. There’s always room for improvement!


For G-Chem, there are more constraints on making textbook changes because it is a group decision. And when you are running more than ten sections of the course, that also means a diversity of opinions from the many instructors. I think we’ve used four different textbooks in the 20-25 years I’ve been at my current institution. But in the last several years I’ve become increasingly disillusioned with the present textbook and, for that matter, with any online homework management system (which I feel has unfortunately become the driving force in textbook adoption). So this coming fall I’m breaking with tradition and will ditch the textbook. Since I will be teaching the small honors sections of both G-Chem 1 and G-Chem 2 this coming academic year, there should be little disruption, with far fewer students switching sections. At least that’s my hope.


My plan is to use an open access textbook as a resource. It’s a decent book and covers the content reasonably well, though not as well as the very best commercial texts out there. I will supplement it with Course Notes for each class. It will take time for me to generate these extra materials, but I think it’s worth the effort. I introduced Study Guides a couple of years ago in G-Chem, and based on student feedback I think they have worked well. They are still being tweaked as I re-work the material in this new iteration of G-Chem. I will retain most of the topics in the usual sequence, with a few minor changes. We’ll see how it goes. If it works well, this might be the last ditch where textbooks are concerned! (Unless I get to teach biochemistry again, in which case I’ll have to decide what to do then.)

Sunday, August 11, 2024

Also Liked

I’m in Part 2 of David Sumpter’s Outnumbered. There’s a discussion of superforecasters and how they continuously update their expectations by weighing and incorporating new information. Sumpter also compares the performance of prediction markets against FiveThirtyEight’s algorithmic crunching of polling data. There’s no clear winner. Crowd-sourcing sometimes helps, but sometimes it leads you astray.


The chapter that most grabbed my attention analyzes the “Also Liked” effect, named after Amazon’s algorithm that shows you “customers who bought this item also bought…” or “you might also like…” Users clearly find this helpful for narrowing down a plethora of choices that would otherwise be overwhelming, and the fact that many other apps do the same thing underscores its effectiveness. The algorithm recommends. You’re welcome to ignore its recommendations, but let’s face it, most of us don’t want to wade through gobs of irrelevant stuff.


Sumpter runs a few simple simulations to see how these algorithms, over time, filter what you see and don’t see. The results are interesting. All it takes is a few early “likes” making the right connections to catapult something into the few vaunted choices that get displayed. That’s true even for a small data set. If you consider thousands or millions of products, the early advantages compound. Slightly-better-sellers become best-sellers in a self-fulfilling prophecy, algorithmically controlled. When you upvote a news story or website on social media, that upvote quickly cascades into more upvoting.
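I don’t have Sumpter’s code, but a minimal sketch of the rich-get-richer dynamic he describes might look like this (the item labels, head start, and parameters are my own, not his):

```python
import random

# Minimal sketch of a rich-get-richer ("Also Liked") feedback loop.
# Assumption: each new shopper picks an item with probability proportional
# to its current like count plus one (so unliked items still have a chance).

random.seed(42)
items = ["A", "B", "C", "D", "E"]
likes = {item: 0 for item in items}

# Give item "B" a tiny head start of three early likes.
likes["B"] = 3

for shopper in range(10_000):
    weights = [likes[item] + 1 for item in items]
    choice = random.choices(items, weights=weights)[0]
    likes[choice] += 1

print(likes)
# A few early likes are enough for one item to dominate the final counts.
```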


A related phenomenon relevant to academia is citation statistics. Sumpter provides a data plot showing the proportion of articles versus the number of times they are cited. The data follow a power law (you see a straight line in a log-log plot). Sumpter writes: “Power laws are a sign of vast inequality.” When you examine the data over time, it starts to resemble the “Also Liked” effect. The rich become richer in fame. The poor get relegated to obscurity. We now have the h-index (for authors) and the impact factor (for journals), and a vicious cycle ensues. More broadly, with respect to ubiquitous social media, Sumpter writes: “Inequality is one of the biggest challenges facing society, and it is exacerbated by our lives online.”
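A quick aside on why a power law shows up as a straight line on a log-log plot (the exponent α here is generic, not anything fitted to Sumpter’s data): if the proportion of papers cited k times scales as

```latex
p(k) \propto k^{-\alpha}
\quad\Longrightarrow\quad
\log p(k) = -\alpha \log k + \text{const},
```

then plotting log p(k) against log k gives a straight line with slope −α.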


From there, Sumpter moves on to examine echo chambers and filter bubbles. There’s some interesting old data on political blogs from 2004, in the lead-up to the U.S. presidential election that year. Democratic and Republican blogs linked almost exclusively to their own writers, with little cross-over, but all of them still linked to the same limited set of mass media outlets. By 2016, this had changed: the mass media had split up and joined their respective echo chambers. And with the implementation of “Also Liked”, what you were presented with when you logged on to social media became filtered more and more towards what you liked or what your friends liked. That’s the filter bubble. It also explains why conspiracy theories proliferate and gain steam. But a strange thing happens if a conspiracy theory becomes too popular too quickly: when enough naysayers and doubters “like” each other’s responses, their views can start to eclipse those of the original conspirators.


Sumpter presents analyses by Michela Del Vicario and her team looking at the echo chambers of scientists versus conspiracy theorists on Facebook. Both are echo chambers and filter bubbles; most of the general population (at least in Italy) tunes them out. The features of the posts are interesting. Analyzing the words, “the general rule is that the higher the activity of the user, the more negative words they use… the effect was stronger for scientists… and the more active they were on Facebook, the more negative they became. Becoming a dedicated member of an echo chamber is not a route to happiness.” Worse, in my opinion: “Not only are conspiracy theorists less grumpy than scientists, their shared posts are also more popular than those made about science news. This is particularly worrying since many of the conspiracy theories are about science.”


The increasing isolation of individuals within modern society, and the speed at which the internet spreads ideas, have made it very hard to stop and think. We’re too busy reacting. I used to “like” posts from Facebook friends, back when I used it. But if I ever go back to it, I might stop doing so. Instead, maybe I should spend more time making conscious choices about personal connections and not feed the filtering algorithms. “Also Liked” seems like a boon, but it can also be an insidious curse.

Saturday, August 10, 2024

Data Alchemy

I’m reading David Sumpter’s Outnumbered, on “the algorithms that control our lives”. Sumpter is an applied mathematician known for Soccermatics, which I haven’t read, but I have read at least one of his research papers and found his writing lucid. I wanted to read more, and Outnumbered was at my local library; convenience and easy access were enough of a filter for me. Sumpter discusses filter bubbles later in the book, but today’s post is on the earlier chapters.

All the chapters are short and very readable; there are a few tables and graphs that aid the explanations. Sumpter begins by explaining how principal component analysis (PCA) works, but he does so with engaging examples such as Facebook friend connections and who likes what, and whether you can extract enough data to build a composite profile. He tackles the Cambridge Analytica scandal head-on, and I found his argument that it was mostly hyperbole compelling. The targeted ads being placed by algorithms aren’t as effective as the tech companies (hungry for your money) say they are.
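I don’t have Sumpter’s Facebook data, but a toy version of the idea, with a made-up “likes” matrix of my own, shows how PCA compresses who-liked-what into a couple of composite dimensions:

```python
import numpy as np

# Toy likes matrix: rows = users, columns = pages they "liked" (1) or not (0).
# This is made-up data for illustration, not Sumpter's Facebook example.
likes = np.array([
    [1, 1, 1, 0, 0, 0],   # user 0: likes pages 0-2
    [1, 1, 0, 0, 0, 0],   # user 1
    [1, 0, 1, 0, 0, 0],   # user 2
    [0, 0, 0, 1, 1, 1],   # user 3: likes pages 3-5
    [0, 0, 0, 1, 1, 0],   # user 4
    [0, 0, 0, 0, 1, 1],   # user 5
], dtype=float)

# PCA via singular value decomposition of the mean-centered matrix.
centered = likes - likes.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Project each user onto the first principal component.
scores = centered @ Vt[0]
print(np.round(scores, 2))
# Users 0-2 and users 3-5 land at opposite ends of the first component,
# i.e. one composite axis already separates the two "taste" groups.
```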


The chapter I learned the most from in Part 1 of his book is titled “Impossibly Biased”. Sumpter goes through the COMPAS algorithm used to assess whether someone who committed a crime is at low or high risk of re-offending. There were claims and counter-claims of the algorithm being biased, but bias is in the eye of the beholder. With some numbers and simplified examples, Sumpter explains why, if you construct any two-by-two grid, it is “impossible to have both calibration between groups and equal rates of false positives and false negatives between groups” unless the two groups you study truly behave identically for the question you are asking. There is a mathematical proof for this. Sumpter concludes: “There isn’t an equation for fairness. Fairness is something human. It is something we feel.”
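The numbers below are mine, not Sumpter’s or COMPAS’s, but they illustrate the arithmetic behind that impossibility: hold the calibration of the “high risk” label (its precision) and the true positive rate fixed for two groups with different re-offending base rates, and the false positive rates are forced apart.

```python
# Hypothetical numbers to illustrate the calibration-vs-error-rate tension;
# these are not COMPAS figures or Sumpter's own example.

def false_positive_rate(n, base_rate, ppv, tpr):
    """Given group size, re-offending base rate, precision of the
    'high risk' label (PPV), and true positive rate, return the FPR."""
    positives = n * base_rate            # people who actually re-offend
    negatives = n - positives            # people who don't
    true_pos = tpr * positives           # correctly flagged re-offenders
    flagged = true_pos / ppv             # total flagged, since PPV = TP / (TP + FP)
    false_pos = flagged - true_pos       # flagged people who don't re-offend
    return false_pos / negatives

# Same calibration (PPV = 0.6) and same TPR (0.75) for both groups,
# but different base rates of re-offending.
print(false_positive_rate(1000, base_rate=0.5, ppv=0.6, tpr=0.75))  # 0.5
print(false_positive_rate(1000, base_rate=0.2, ppv=0.6, tpr=0.75))  # 0.125
# Equal calibration plus equal true positive (and false negative) rates
# forces unequal false positive rates whenever the base rates differ.
```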


The chapter I found the most interesting in Part 1 is titled “The Data Alchemists”. Some of his Soccermatics work comes into the story, but Sumpter also provides examples from Spotify and those seemingly creepy ads from Facebook or Google that seem to know you. There’s also a discussion of a study comparing how COMPAS does against volunteers on Mechanical Turk: humans, not trained as judges, do just as well as the algorithm on average. As for what the algorithms might recommend to you, Sumpter argues that they work well at a group level, but not necessarily at an individual level. Yes, it does seem spooky when you receive a targeted ad that seems to “read your mind”, but the algorithms aren’t that fine-grained. They aren’t decrypting your WhatsApp messages or recording your phone conversations. Sumpter says: “The more plausible explanation is that data alchemists are finding statistical relationships in our behavior that help target us: kids who watch Minecraft and Overwatch videos eat sandwiches in the evening.” These are correlations that may or may not have any clear causal connections.


Technology will continue to advance. These algorithms might get better. But they might not. One of the challenges of machine learning with large data sets containing millions of variables is that we no longer understand exactly how these algorithms work, which also means we don’t quite understand when and how they fail. Yes, we can put in band-aid fixes to reduce the symptoms of “bias” or “hallucination”, but there’s no solution to the underlying problem. Humans are not computers. Brains are not software neural nets. Manipulating data is what algorithms do. Interpreting those manipulations to make things “work better” (whatever that means) is still both science and art. As a chemist, I find alchemy an apt word for it.

Sunday, August 4, 2024

Sustainability (with Data)

I’m reading Not the End of the World by Hannah Ritchie, a data scientist and communicator. It leverages data analysis to strike an optimistic tone about “how we can be the first generation to build a sustainable planet”, the book’s subtitle. The influence of Hans Rosling can be seen in her work – while there are things that are getting worse for the planet and humankind, many things are also getting better. Each chapter comes with things we should be working on, and things we shouldn’t overly stress about. Ritchie doesn’t sugarcoat the data, but she interprets it in context.

She begins with the provocative idea that “the world has never been sustainable”, using the United Nations definition of sustainability as “meeting the needs of the present without compromising the ability of future generations to meet their own needs”. Humans have been changing the world and its flora and fauna for thousands of years, be it through expanding agriculture or hunting larger beasts to extinction. There have been many improvements in human health, longevity, and standards of living; on the other hand, we’ve been gobbling up resources, and it’s questionable what we’re leaving for future generations. Ritchie tackles seven problems: air pollution, climate change and rising temperatures, deforestation, the food system, loss of biodiversity, ocean plastics, and overfishing.


I’m lucky to live in an area where the air quality is relatively good. When I was in graduate school in the ‘90s, folks who had lived in that same area in the ‘70s would tell me horror stories about the smog. The Clean Air Act in the U.S. was implemented in 1970 and signed by Nixon (who also created the EPA; apparently he didn’t care much about environmental issues, but he did care what voters and the public thought). Ritchie, based in the U.K., provides plots showing the peak of air pollution from the ‘50s to the ‘70s and how emissions of various gases and black carbon have since plummeted, approaching 18th-century levels. Even China has passed its air pollution peak, and that wasn’t just because it hosted the Olympics. Ritchie discusses the need to provide access to clean cooking fuels, remove sulfur from fossil fuels, and end winter crop-burning (something I didn’t know about).


Climate change is tricky – you’d assume that everyone should just switch to “renewable” energy as soon as possible, but that’s not so easy because greenhouse gas emissions come from multiple sources, and there are trade-offs in each of these energy-hungry sectors. Three-quarters of our transport emissions come from driving on roads; I was surprised that shipping and aviation contribute only about ten percent each. Electric vehicles help, but there are trade-offs; better still would be redesigning our cities and living spaces to reduce the use of cars. What caught my attention was food. Producing beef is an order of magnitude worse than producing chicken; plant-based proteins are better still. We also need to reduce overconsumption and waste. I also learned that eating “local” or “organic” isn’t necessarily better. It depends. Ritchie provides both the data and the analysis.


Growing up in the tropics, I appreciated Ritchie’s nuanced discussion of deforestation and the protection of biodiversity. Yes, humans have cut down a third of the world’s forests, mainly to make room for agriculture. But you can start to see recovery, at least in some “richer” countries. I remember the controversies surrounding palm oil some years ago. Ritchie tackles the health arguments and myths, but she also discusses the productivity of palm oil per hectare compared to almost any other common oil crop; palm oil is more sparing in terms of land use. The issue of beef comes up again because it’s particularly greedy for land, since we’re growing crops mainly to feed the cows. (Lamb actually takes a little more land, but has lower emissions than beef.) It was heartening, however, to see Ritchie’s data suggesting that the world may have passed peak agricultural land use and might also be approaching peak fertilizer use. Humans have been able to increase crop yields to feed the planet through a variety of strategies. Norman Borlaug gets several mentions.


On the topic of biodiversity, I learned from Ritchie’s charts that in terms of global biomass measured by carbon, plants make up 82%, bacteria 13%, fungi 2%, and animals 0.4%. Within the animal category, humans are 2.5% and livestock 4%, with the largest shares going to fish at 29% and arthropods at 42%. Are we humans causing the Earth’s sixth mass extinction? It depends on how you look at it. Overall, biodiversity is decreasing, but there’s a lot of variation: some species are showing increases or recoveries; others continue down the path to extinction. And ecology is complex. You think you can change one thing to affect one outcome, and then find out there are many unanticipated knock-on effects. Ritchie also provides graphs showing wild seafood catch leveling off in tonnage while aquaculture grows, along with a bar chart of the carbon footprint of different kinds of fish. Apparently sardines have a very low carbon footprint (lobster and flounder are very high), tuna is a little better than chicken, and salmon does even better in this regard.


What I liked about Ritchie’s book is that it made me stop and think about the complex web of sustainability and ecosystems. I might even make some gradual lifestyle changes in what I eat, although as it is I rarely eat beef or lamb. For someone living in the U.S., my energy consumption is relatively low and I buy very little outside of food. I won’t give up driving just yet (although my mileage per year is probably on the lower side), but I could start composting and do a better job of sorting my trash and not wasting stuff. I thought Ritchie did an effective job of marshaling arguments with data, and that’s something I could do better in my own teaching to help students learn to do the same. Ritchie’s book is a very accessible read and I recommend it.