Saturday, April 9, 2016

Probability Predictions


I am now midway through Superforecasting: The Art and Science of Prediction, a new book by Philip Tetlock and Dan Gardner. So here’s part 2 of my sharing interesting things that jump out at me. See my most recent post for part 1.

How do you make a probability prediction and what does it mean? The authors provide a great example: “If a meteorologist says there is a 70% chance of rain and it doesn’t rain, is she wrong? Not necessarily. Implicitly, her forecast also says there is a 30% chance it will not rain. So if it doesn’t rain, her forecast may have been off, or she may have been exactly right. It’s not possible to judge with only that one forecast in hand. The only way to know for sure would be to rerun the day hundreds of times. If it rained in 70% of those reruns, and didn’t rain in 30%, she would be [spot] on.”

Well, that makes things difficult, since we aren’t stuck in Groundhog Day. While Punxsutawney Phil might have seen his shadow in all the reruns of the day experienced by Bill Murray, I suppose it is possible that the flapping of a butterfly could have changed whether spring arrived early or not. But since we can’t rerun the day, we don’t know whether Phil is a superforecaster. A USA Today story from February suggests that, at least in recent years, the groundhog is no better than random.

The problem is that it takes a while to accumulate once-a-year predictions into a decent sample size. Meteorologists making daily predictions, on the other hand, get a shot at it 365 days a year. Every day, one could predict the chance of rain the next day, and quickly build up a scorecard for prediction accuracy. But that’s different from being able to rerun the day, because the “initial” conditions might have changed thanks to a pesky wing-flapping butterfly. Can you rerun the day? You could in a simulation. Who thought it was a waste of time to play games?
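
Here is a minimal sketch (in Python, with made-up forecasts and outcomes, so purely illustrative) of what such a scorecard might look like: record each day’s forecast and what actually happened, then check calibration and a Brier score.

```python
# Hypothetical daily rain-forecast scorecard; the numbers are invented for illustration.
forecasts = [0.7, 0.3, 0.9, 0.1, 0.7, 0.5, 0.2, 0.7]   # predicted chance of rain each day
outcomes  = [1,   0,   1,   0,   0,   1,   0,   1  ]   # 1 = it rained, 0 = it did not

# Brier score: mean squared difference between forecast and outcome (lower is better).
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)
print(f"Brier score: {brier:.3f}")

# Calibration check: on the days forecast at 70%, how often did it actually rain?
days_at_70 = [o for f, o in zip(forecasts, outcomes) if abs(f - 0.7) < 1e-9]
print(f"Forecast 70% on {len(days_at_70)} days; it rained on {sum(days_at_70)} of them.")
```

With enough days on the books, a well-calibrated forecaster’s 70% days should come out rainy about 70% of the time.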

As a computational chemist, I don’t play computer games for recreation because I spend enough time in front of a computer. It’s not just my research; increasingly, teaching-related activities happen in front of a computer too. Over the years I have slowly converted my “lecture” notes and activity plans from handwritten to electronic. Service and administrative work is mainly e-mail, teleconferencing, and writing and vetting documents, and even academic advising involves my student advisee and me looking up their degree audit, course registration, and timetable, all on a computer or mobile device.

That’s probably part of why I turned back to boardgames as a hobby in the mid-to-late ’90s. Two games in particular attracted me back to the fold: Richard Garfield’s RoboRally and Klaus Teuber’s The Settlers of Catan. I used to keep statistics in those days. For example, having played over 200 games of Settlers, I could tell that there is a better-than-even chance of winning if you hold the longest road (which gives you 2 of the 10 points needed to win the game). Having the largest army (also 2 points) doesn’t do as well. But I no longer keep such statistics. Unless I was playing solitaire, it diminished my enjoyment of playing with other people after a while, at least for certain types of games that were not “simulations” or did not have a strong story arc.

However, for certain types of games, such as historical simulations (often wargames), the statistics are interesting. I recently blogged about how the new game I’m playtesting, Bios Genesis, allows one to “replay the tape” of life’s origin. Interestingly, the playtest feedback loop resulted in a honing of strategies given the constraints of the rules. I even lost sleep calculating the probabilities, as described in another post. In my last five games, which have been quite robust (now that there is a relatively stable ruleset), the Yellow player won four and almost won the fifth. The game is interestingly asymmetric, as each of the players represents an important feature needed for life to get going: Red – metabolism, Blue – genes, Yellow – compartmentalization, Green – negentropy. Does this mean that compartmentalization is the key feature for getting life started (at least within the constraints of the game rules)? Of course, if you’re trying to design a game and make it too “one-sided”, it won’t sell.
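
As a quick back-of-the-envelope check (my own assumption here, not anything from the playtests): if all four players were equally matched, so that each had a 1-in-4 chance of winning any given game, how surprising would four Yellow wins in five games be? A short sketch in Python:

```python
from math import comb

# Assumption for illustration only: four equally matched players, each with a
# 1-in-4 chance of winning any given game.
p, n = 0.25, 5

# Probability that one named player (say, Yellow) wins at least 4 of 5 such games.
p_at_least_4 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4, n + 1))
print(f"P(at least 4 wins in 5 games by chance) = {p_at_least_4:.3f}")   # about 0.016
```

About a 1.6% chance under that equal-skill assumption, which is why a 4-out-of-5 streak makes one suspect a real asymmetry rather than luck (with the usual small-sample caveats).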

That being said, games are a great way to test how important a particular feature might be. I enjoy games that have a mix of strategy and luck. There needs to be some randomness to keep things interesting, and this has the further advantage of allowing one to test and quantify predictions. Suppose I think a rule change might increase the chances of having a runaway leader. (This means that if someone takes an early lead, it becomes exponentially harder for all the other players to catch up.) I could make a probability-based prediction and then run the test by playing games to see how the prediction bears out. If the game has random moving parts (a shuffled card deck, dice rolls, etc.), then there is a range of outcomes, and therefore the predictions are probabilistic.
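
Here is what such a test might look like in miniature. This is a deliberately abstract toy model of my own (not Bios Genesis or any real game): players score random points each round, and under the proposed rule change the current leader also collects a small bonus; I then ask how often the early leader goes on to win.

```python
import random

def play_one_game(leader_bonus, players=4, rounds=10):
    """Toy game: random scoring each round, with an optional bonus for the current leader.
    Returns True if the round-3 leader also finishes first (a 'runaway leader')."""
    scores = [0.0] * players
    early_leader = None
    for r in range(rounds):
        leader = max(range(players), key=lambda i: scores[i])
        for i in range(players):
            scores[i] += random.random() + (leader_bonus if i == leader else 0.0)
        if r == 2:
            early_leader = max(range(players), key=lambda i: scores[i])
    return early_leader == max(range(players), key=lambda i: scores[i])

def runaway_rate(leader_bonus, trials=20000):
    """Fraction of simulated games in which the early leader goes on to win."""
    return sum(play_one_game(leader_bonus) for _ in range(trials)) / trials

print("no leader bonus:  ", runaway_rate(0.0))
print("with leader bonus:", runaway_rate(0.3))   # prediction: noticeably higher
```

The prediction (“the bonus makes runaway leaders more common”) becomes a number that can be checked against the simulated plays, which is exactly the kind of quick, repeated feedback that once-a-year forecasts never provide.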

Tetlock and Gardner work their way through a number of features that superforecasters have in common, and how they differ from the rest of us who are not so good at making predictions. I discussed one in the last post – foxes do better than hedgehogs. Here’s another. The authors call it the “perpetual beta” – continuing to persevere and improve without there being a final version. “There is always more trying, more failing, more analyzing, more adjusting, and trying again.” So there’s the “grit” part of it. Turns out, superforecasters also tend to be numerate, i.e., they have good quantitative reasoning skills. You don’t need to have a degree in math. You don’t need to know Bayes’ theorem, but you do need to use it qualitatively: one might make a baseline prediction, and then, with new data, adjust that baseline, taking into account both the new data and the strength of the prior probability.
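
To make that last bit concrete, here is a small sketch of a Bayesian update (the numbers are invented): a baseline probability gets nudged by new evidence in proportion to how strongly that evidence favors one hypothesis over the other, rather than being thrown out and replaced.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) given a prior and the two likelihoods."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Invented example: a baseline 30% chance of some event, plus a new report of a kind
# that shows up 80% of the time when the event is coming but 40% of the time otherwise.
print(f"{bayes_update(0.30, 0.80, 0.40):.2f}")   # about 0.46 -- adjusted up, not to certainty
```

A strong prior keeps one new data point from swinging the forecast too far, which is the qualitative habit the authors describe even when nobody writes down the formula.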

So how could I get better at predicting the future? While there are no magic wands or crystal balls, there are some general principles laid out by the authors. But even those, they claim, might improve your ability by 10% (a prediction that they may have tested). Turns out that you have to practice, refine, and practice some more. Funny how this sounds similar to what I tell my students learning chemistry. Turns out that getting quick and repeated feedback is important. Funny how this sounds similar to what I should do as an instructor to help my students improve.

I will close this post with the continuation of the meteorologist story from the authors, quoted in the second paragraph above. “Of course we’re not omnipotent beings, so we can’t rerun the day – and we can’t judge. But people do judge… they look at which side of ‘maybe’ (50%) the probability was on. If the forecast said there was a 70% chance of rain and it rains, people think the forecast is right; if it doesn’t rain, they think it was wrong.” This fundamental error, according to the authors, can have far-ranging negative consequences, particularly in the world of high-level political discussions and decisions. (They provide some examples.) I guess no one wants to sound wrong, and therefore vague hedging is the norm. Sounds like fortune cookie forecasting. Amusing, perhaps. Helpful, no. Dangerous if followed, possibly.
