Saturday, August 10, 2024

Data Alchemy

I’m reading David Sumpter’s Outnumbered on “the algorithms that control our lives”. Sumpter is an applied mathematician known for Soccermatics, which I haven’t read, but I have read at least one of his research papers and found his writing lucid. I wanted to read more, and Outnumbered was at my local library. Convenience was enough of a filter: I chose easy access and being lazy. Sumpter will discuss filter bubbles later in his book, but today’s post is on the earlier chapters.

 


All the chapters are short and very readable, with a few tables and graphs that aid the explanations. Sumpter begins by explaining how principal component analysis (PCA) works, and he does so with interesting examples such as Facebook friend connections and who likes what, asking whether you can extract enough from that data to build a composite profile. He tackles the Cambridge Analytica scandal head-on, and I found his argument that it was mostly hyperbole compelling. The targeted ads being placed by algorithms aren’t as effective as the tech companies (hungry for your money) say they are.
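To make the PCA idea concrete, here is a minimal sketch in Python. The users, pages, and likes are all invented for illustration (they are not from the book or from any real Facebook data), and the PCA is done by hand with NumPy's SVD rather than whatever method Sumpter used.

```python
import numpy as np

# Hypothetical like matrix: rows are users, columns are pages (1 = liked).
# These values are made up purely for illustration.
pages = ["football", "tennis", "cooking", "baking"]
likes = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

# PCA via SVD: center each column, then decompose.
centered = likes - likes.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained = S**2 / np.sum(S**2)

# Each user's position along the first principal component.
scores = centered @ Vt[0]

print("explained variance ratios:", np.round(explained, 3))
print("first component loadings:", dict(zip(pages, np.round(Vt[0], 3))))
print("user scores on first component:", np.round(scores, 3))
```

With this toy data, a single component captures most of the variance and splits the users into a "sports" cluster and a "food" cluster. That is the composite-profile idea in miniature: many individual likes collapse into a few dimensions.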

 

The chapter I learned the most from in Part 1 of his book is titled “Impossibly Biased”. Sumpter goes through the COMPAS algorithm used to assess whether someone who committed a crime is at low or high risk of re-offending. There were claims and counter-claims of the algorithm being biased. But bias is in the eye of the beholder. With some numbers and simplified examples, Sumpter explains why if you construct any two-by-two grid, it is “impossible to have both calibration between groups and equal rates of false positives and false negatives between groups” unless the two groups you study truly behave identically for a given question you are asking. There is a mathematical proof for this. Sumpter concludes: “There isn’t an equation for fairness. Fairness is something human. It is something we feel.”
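The two-by-two grid argument is easy to check with a few lines of Python. This is only a sketch with numbers I made up (they are not Sumpter's figures or real COMPAS data): two groups of 1,000 people, scored by an imaginary algorithm that is perfectly calibrated, meaning 60% of those labelled high risk and 20% of those labelled low risk re-offend in both groups. The only difference is how many people in each group get the high-risk label.

```python
def error_rates(n_high, n_low, p_high=0.6, p_low=0.2):
    """Fill in a two-by-two grid for one group and return its error rates.

    n_high, n_low: people labelled high / low risk (hypothetical counts).
    p_high, p_low: fraction of each label that re-offends. These are the
    same for both groups, which is what "calibrated between groups" means.
    """
    true_pos = n_high * p_high      # labelled high risk, re-offended
    false_pos = n_high - true_pos   # labelled high risk, did not re-offend
    false_neg = n_low * p_low       # labelled low risk, re-offended
    true_neg = n_low - false_neg    # labelled low risk, did not re-offend
    return {
        "base_rate": (true_pos + false_neg) / (n_high + n_low),
        "false_positive_rate": false_pos / (false_pos + true_neg),
        "false_negative_rate": false_neg / (false_neg + true_pos),
    }


# Hypothetical groups: same calibration, different share labelled high risk.
group_a = error_rates(n_high=600, n_low=400)
group_b = error_rates(n_high=300, n_low=700)

for name, rates in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 3) for k, v in rates.items()})
```

Even though the algorithm treats a "high risk" label identically in both groups, the false positive rate comes out at roughly 43% for group A versus 18% for group B, and the false negative rate at roughly 18% versus 44%. The error rates only match if the two groups have the same underlying re-offending rate.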

 

The chapter I found the most interesting from Part 1 is titled “The Data Alchemists”. Some of his Soccermatics work comes into the story, but Sumpter also provides examples from Spotify and those seemingly creepy ads from Facebook or Google that seem to know you. There’s also discussion of a study comparing how COMPAS does versus volunteers on Mechanical Turk. Humans, not trained as judges, do just as well as the algorithm on average. As to what the algorithms might recommend to you, Sumpter argues that they work well on a group level, but not necessarily on an individual level. Yes, it does seem spooky when you receive a targeted ad that seems to “read your mind”, but the algorithms aren’t so fine-grained. They aren’t decrypting your WhatsApp messages or recording your phone conversations. Sumpter says: “The more plausible explanation is that data alchemists are finding statistical relationships in our behavior that help target us: kids who watch Minecraft and Overwatch videos eat sandwiches in the evening.” They’re correlations that may or may not have any clear causal connections.

 

Technology will continue to advance. These algorithms might get better. But they might not. One of the challenges of machine learning with large data sets with millions of variables is that we no longer understand exactly how these algorithms work. This also means that we don’t quite understand when and how they fail. Yes, we can put in band-aid fixes to reduce the symptoms of “bias” or “hallucination”, but there’s no solution to the underlying problem. Humans are not computers. Brains are not software neural nets. Manipulating data is what algorithms do. Interpreting those manipulations to make things “work better” (whatever that means) is still both science and art. As a chemist, I think alchemy is the appropriate word to describe it.
