I’ve told you about Tim Harford’s new book, The Data Detective. His blog continues to be interesting, and his post on “The Tyranny of Spreadsheets” sparked a few thoughts. I recommend reading his post in full. It’s interesting and witty. The tale begins with 16,000 “missing” Covid cases, and the culprit was Microsoft Excel – or at least its older file formats. You’ll learn how spreadsheet programs got started, how they make our lives easier, and why they’re not the best tool for, say, genetics researchers.
I use Excel in my research. Data gets parked there and then algebraic manipulations are used to turn some numbers into other numbers. I encourage my students to use Excel and to enter formulae for their calculations, so they don’t make mistakes working by hand. It also allows quick and handy analysis through a few simple manipulations. Fill Down and Fill Right are wonderful inventions!
But I don’t solely use Excel, particularly for larger data sets that require more advanced data manipulation. (To be clear, I use the word “manipulation” in a neutral sense and not to indicate I’m trying to fudge or twist the data.) It helps that I can write code. Most of my students can’t (since they’re mostly undergraduates majoring in chemistry or biochemistry), but a few can. I’ve also recently encouraged my students to take the Intro to Computing class offered at my university, now that it has switched from C to Python and revamped its curriculum to emphasize computational thinking.
Data can be fumbled. And automated functions in a data processing program can mislead you and severely compound errors if you’re not careful. My students sometimes learn this the hard way, and it’s a good lesson in the importance of thinking carefully about how you’re setting up those data manipulations. I’ve had my own Excel fumbles. It helps that I’ve built up an intuition over the years so a sixth sense tingles when a number looks suspicious and I double or triple check. Too bad it only works for the narrow methodology of my expertise. Being a mediocre coder also reminds me to be extra careful. I write in little tests to check both the integrity of the data and my code. This means that sometimes things take a little longer at the initial stages, but once I’m confident everything’s working fine, the analyses proceed quickly.
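To make the idea of “little tests” concrete, here is a minimal Python sketch of a data-integrity check. It is an illustration only, not the code used in my research: the function name `check_rows`, its parameters, and the plausible range are all hypothetical. It checks three things: that the expected number of rows arrived, that no value is missing, and that every value falls inside a range that makes physical sense.

```python
import math


def check_rows(values, expected_count, low, high):
    """Return a list of problems found in `values` (empty list means all clear).

    Hypothetical helper for illustration: `expected_count`, `low`, and `high`
    are whatever the person running the analysis knows to be sensible.
    """
    problems = []

    # Did every row make it through the import?
    if len(values) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(values)}")

    for i, value in enumerate(values):
        # Treat None and NaN as missing data.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            problems.append(f"row {i}: missing value")
        # Flag numbers outside the plausible range.
        elif not (low <= value <= high):
            problems.append(f"row {i}: {value} outside [{low}, {high}]")

    return problems


if __name__ == "__main__":
    for problem in check_rows([1.2, None, 50.0], expected_count=4, low=0.0, high=10.0):
        print(problem)
```

Running a check like this right after loading the data costs a few seconds, and it turns the “sixth sense” into something that also works when the numbers are outside one’s own expertise, provided someone can supply a sensible range.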
Reading Harford’s post was an excellent reminder not to let my guard down. Also, I need to make sure I use the newer Excel file formats, since I have several old templates still circulating in my folders. Excel’s strength, the automation of mathematical computation, is also its greatest weakness. It is too clever by half.
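The file-format point can be expressed as a simple guard. A worksheet in the older .xls format holds at most 65,536 rows, while the newer .xlsx format holds 1,048,576. The Python sketch below is an illustration under those two limits. The function name `fits_in_sheet` is hypothetical, and it only counts rows: it does not read or write any Excel file.

```python
# Per-worksheet row limits for the two Excel file formats.
ROW_LIMITS = {
    "xls": 65_536,      # older binary format
    "xlsx": 1_048_576,  # newer format
}


def fits_in_sheet(n_rows, file_format="xlsx"):
    """Return True if `n_rows` rows fit in one worksheet of `file_format`.

    Hypothetical helper for illustration; raises KeyError for unknown formats.
    """
    return n_rows <= ROW_LIMITS[file_format]


if __name__ == "__main__":
    n = 70_000
    for fmt in ("xls", "xlsx"):
        status = "fits" if fits_in_sheet(n, fmt) else "would NOT fit"
        print(f"{n} rows in .{fmt}: {status}")
```

A check like this before exporting to an old template makes an overflow loud rather than silent.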