Saturday, October 8, 2016

Deep-Fried Data


Sometimes reading one thing on the web leads you to another thing, and then you discover something humorous, well-written and very thought-provoking. Yesterday, for me, that was Deep-Fried Data. I encourage you to read the post in its entirety; it is the text version of a talk given to the Library of Congress. The post has a link to a video if you prefer the audio-visual experience. I personally prefer the text because I can chew over the arguments slowly at my own leisure.

It’s a relatively long article (from a 15-20 minute talk), but I’ll quote three short sections to whet your appetite.

“Today I'm here to talk to you about machine learning. I'd rather you hear about it from me than from your friends at school, or out on the street. Machine learning is like a deep-fat fryer. If you’ve never deep-fried something before, you think to yourself: ‘This is amazing! I bet this would work on anything!’ And it kind of does...”

“I find it helpful to think of algorithms as a dim-witted but extremely industrious graduate student, whom you don't fully trust. You want a concordance made? An index? You want them to go through ten million photos and find every picture of a horse? Perfect... You want them to draw conclusions on gender based on word use patterns? Or infer social relationships from census data? Now you need some adult supervision in the room.”

“People are pragmatic. In the absence of meaningful protection, their approach to privacy becomes ‘click OK and pray’. Every once in a while a spectacular hack shakes us up. But we have yet to see a coordinated, tragic abuse of personal information. That doesn't mean it won't happen. Remember that we live in a time when a spiritual successor to fascism is on the ascendant in a number of Western democracies. The stakes are high.”

The article is particularly thought-provoking as I have been warily watching the rise of data analytics approaches to solving the problems of higher education. When behemoth companies start plunking down gobs of money to sell products and services to universities, the faculty should really take notice. The calls for “data-driven” assessment that pervade our institutions should make us pause and ask questions. What is this data for? How is it chosen? What does it actually tell us? What is left out in the choice? How is the data reduced into a digestible sound bite, often some numerical value? Who owns the data? How would it drive decision-making and strategic planning?

My ears now perk up when I hear the phrase “data-driven”. It’s something like “best practices”, which usually implies one best practice, determined by whoever is bandying the phrase about. As a scientist, I’m strongly in favor of using data to support an argument, or to make a case. When I was department chair, I would go to the administration with a data-driven argument, accompanied by graphs and tables in a clear and pre-digested format, to get what I needed resource-wise. It’s an effective tactic given the way the winds have been blowing in the increasingly all-administrative university. Put several of these tactics together and you get a strategy. But is it a wise strategy?

With data science programs popping up all over (such as this one at the University of Illinois), fully online of course, and costing a chunk, the lure of big-data jobs sings its siren song. As long as the corporate world is infatuated with Big Data, there will be plenty of takers. I’ve never taken any of these courses but I sincerely hope that students learn how to interrogate themselves as they mine their data quarries, looking for the riches hidden within. It is human nature to see patterns and weave them into a narrative. But human choices are made. Every step of the way. Humans design the algorithms. Choices are made in the underlying theoretical models. And as the layers get deeper and more complex, we start to understand less and rely more on the output happily provided by the black box.

After all, everything tastes good when it is deep-fried.
