I am now midway through Superforecasting: The Art and Science of Prediction, a new book by Philip Tetlock and Dan
Gardner. So here’s part 2, in which I share more interesting things that jump out at me.
See my most recent post for part 1.
How do you make a probability prediction and what does it
mean? The authors provide a great example: “If a meteorologist says there is a
70% chance of rain and it doesn’t rain, is she wrong? Not necessarily.
Implicitly, her forecast also says there is a 30% chance it will not rain. So if it doesn’t rain, her
forecast may have been off, or she may have been exactly right. It’s not
possible to judge with only that one forecast in hand. The only way to know for
sure would be to rerun the day hundreds of times. If it rained in 70% of those
reruns, and didn’t rain in 30%, she would be [spot] on.”
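To make that concrete, here is a toy sketch (my own, not from the book) of the “rerun the day” idea, assuming we somehow knew the true chance of rain that day really was 70%:

```python
import random

# Toy sketch: "rerun the day" thousands of times, assuming the true chance of
# rain on this hypothetical day really is 70%, and count how often it rains.
TRUE_RAIN_PROB = 0.7    # assumed true probability (not knowable in real life)
N_RERUNS = 10_000       # Groundhog-Day-style reruns

rainy = sum(random.random() < TRUE_RAIN_PROB for _ in range(N_RERUNS))
print(f"Rained in {rainy / N_RERUNS:.1%} of {N_RERUNS:,} reruns; "
      f"a 70% forecast would be judged spot on.")
```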
Well, that makes things difficult since we aren’t stuck in
Groundhog Day. While Punxsutawney Phil might have seen his shadow in all the
reruns of the day experienced by Bill Murray, I suppose it is possible that the
flapping of a butterfly could have changed whether Spring arrived early or not.
But if you’re stuck in Groundhog Day, then you don’t know if Phil is a
superforecaster. A USA Today story from February suggests that at least in
recent years, the groundhog is no better than random.
The problem is that it takes a while to accumulate
once-a-year predictions to get a decent sample size. Meteorologists doing daily
predictions on the other hand get a shot at it 365 days a year. Every day, one
could predict the chance of rain the next day, and quickly build up a scorecard
for prediction accuracy. But that’s different from being able to rerun the day
because “initial” conditions might have changed due to a pesky wing-flapping
butterfly. Can you rerun the day? You could in a simulation. Who thought it was
a waste of time to play games?
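One standard way to grade such a daily scorecard is a proper scoring rule such as the Brier score, essentially the mean squared difference between the stated probabilities and what actually happened. Here’s a minimal sketch of mine, with made-up forecasts and outcomes:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes.
    Lower is better; always saying 50% scores 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up week of next-day rain forecasts and what actually happened (1 = rain)
forecasts = [0.7, 0.2, 0.9, 0.5, 0.1, 0.8, 0.3]
outcomes  = [1,   0,   1,   0,   0,   1,   1]
print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
```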
As a computational chemist, I don’t play computer games for
recreation because I spend enough time in front of a computer. It’s not just my
research. Increasingly, activities related to teaching also take place in front of a computer. Over the years I have slowly converted my “lecture” notes and activity plans from handwritten to electronic. Service and administrative work is mainly e-mail, teleconferencing, and writing and vetting documents. Even academic advising involves me and my student advisee looking up their degree audit, course registration, and timetable, all on a computer or mobile device.
That’s probably part of why I turned back to boardgames as a
hobby in the mid-to-late ‘90s. Two games in particular attracted me back to the
fold: Richard Garfield’s RoboRally
and Klaus Teuber’s The Settlers of Catan.
I used to keep statistics in those days. For example, having played over 200
games of Settlers, I could tell that
there’s better than an even chance of winning if you have the longest road (which gives you 2 of the 10 points needed to win the game). Having the largest army (also 2 points) doesn’t do as well. But I no longer keep such statistics. Unless I was playing solitaire, the record-keeping eventually diminished my enjoyment of playing with other people, at least for certain types of games that were not “simulations” or did not have a strong story arc.
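Incidentally, that kind of home-grown tally is easy to turn into an estimate with an error bar. A rough sketch with invented numbers (my old records are long gone):

```python
import math

# Invented tallies, for illustration only
games_with_longest_road = 80   # games in which I held the longest road
wins_with_longest_road = 45    # ...and went on to win

p = wins_with_longest_road / games_with_longest_road
# Crude normal-approximation 95% confidence interval for the win rate
half_width = 1.96 * math.sqrt(p * (1 - p) / games_with_longest_road)
print(f"Win rate with longest road: {p:.0%} ± {half_width:.0%}")
```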
However, for certain types of games, such as historical simulations (often wargames), the statistics are interesting. I recently blogged about how the new game I’m playtesting, Bios Genesis, allows one to “replay the tape” of life’s origin. Interestingly, the feedback loop resulted in a honing of strategies given the constraints of the rules. I even lost sleep calculating the probabilities, as described in another post. In my last five games, which have been quite robust (now that there is a relatively stable ruleset), the Yellow player won four of the games and almost won the fifth. The game is interestingly asymmetric as each of the
players represents an important feature needed for life to get going: Red –
metabolism, Blue – genes, Yellow – compartmentalization, Green – negentropy.
Does this mean that compartmentalization is the key feature for getting life
started (at least within the constraints of the game rules)? Of course, if you’re trying to design a game and make it too “one-sided”, it won’t sell.
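Still, it’s worth asking how easily chance alone could produce a 4-out-of-5 streak. A quick back-of-the-envelope sketch, assuming all four colors were equally likely to win each game:

```python
from math import comb

# If every color had an equal 1-in-4 chance of winning each game, how likely is it
# that a given color (say, Yellow) wins at least 4 of 5 games purely by luck?
p, n = 0.25, 5
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (4, 5))
print(f"P(a given color wins >= 4 of 5 by chance) = {prob:.3f}")  # about 0.016
```

That’s about 1.6% for Yellow specifically, or roughly 6% that some color would streak like that, so the result is suggestive; but five games is still a thin sample.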
That being said, games are a great way to test how important
a particular feature might be. I enjoy games that have a mix of strategy and
luck. There needs to be some randomness to keep things interesting, and this
has the further advantage of allowing one to test and quantify predictions.
Suppose I think a rule-change might increase the chances of having a runaway
leader. (This means that if someone takes an early lead, it becomes
exponentially harder for all other players to catch up.) I could make a
probability-based prediction and then run the tests by playing games to see how
the predictions bear out. If the game has random moving parts (a shuffled card deck, dice rolls, etc.), then there is a range of outcomes, and therefore the predictions are probabilistic.
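As a sketch of what such a test could look like, here is a toy “rich-get-richer” model (entirely made up, not any actual game), where a feedback parameter stands in for the rule change; cranking it up should make the early leader’s eventual win rate climb:

```python
import random

def play_toy_game(feedback=0.0, rounds=10, players=4):
    """Toy race game: each round one player scores a point, with probability
    proportional to 1 + feedback * current_score. Returns True if the player
    leading after three rounds goes on to win."""
    scores = [0] * players
    early_leader = None
    for r in range(rounds):
        weights = [1 + feedback * s for s in scores]
        winner = random.choices(range(players), weights=weights)[0]
        scores[winner] += 1
        if r == 2:  # note who is ahead after three rounds
            early_leader = max(range(players), key=lambda i: scores[i])
    return max(range(players), key=lambda i: scores[i]) == early_leader

def runaway_rate(feedback, trials=20_000):
    return sum(play_toy_game(feedback) for _ in range(trials)) / trials

for fb in (0.0, 0.5, 1.0):  # 0.0 = no rule change; larger = stronger snowball rule
    print(f"feedback={fb}: early leader wins {runaway_rate(fb):.1%} of games")
```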
Tetlock and Gardner work their way through a number of
features that superforecasters have in common, and how they differ from the rest of us who are not so good at making predictions. I discussed one
in the last post – foxes do better than hedgehogs. Here’s another. The authors
call it the “perpetual beta” – continuing to persevere and improve without
there being a final version. “There is always more trying, more failing, more
analyzing, more adjusting, and trying again.” So there’s the “grit” part of it.
Turns out, superforecasters also tend to be numerate, i.e., they have good
quantitative reasoning skills. You don’t need to have a degree in math. You
don’t need to know Bayes’ theorem. But you do need to use it qualitatively. One
might make a baseline prediction. But then, with new data, one adjusts the baseline, taking into account both the new evidence and the strength of the prior probability.
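For what it’s worth, the quantitative version of that habit is just Bayes’ theorem. Here’s a tiny numerical sketch, my own illustration with hypothetical numbers rather than the authors’:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

# Hypothetical: baseline 30% chance an event happens; a new report would show up
# 80% of the time if the event is coming, but only 20% of the time if it isn't.
posterior = bayes_update(prior=0.30, p_evidence_if_true=0.8, p_evidence_if_false=0.2)
print(f"Revised estimate: {posterior:.0%}")  # about 63%
```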
So how could I get better at predicting the future? While
there are no magic wands or crystal balls, there are some general principles
laid out by the authors. But even those, they claim, might improve your ability
by 10% (a prediction that they may have tested). Turns out that you have to
practice, refine, and practice some more. Funny how this sounds similar to what
I tell my students learning chemistry. Turns out that getting quick and
repeated feedback is important. Funny how this sounds similar to what I should
do as an instructor to help my students improve.
I will close this post with the continuation of the authors’ meteorologist story quoted back in the second paragraph. “Of
course we’re not omnipotent beings, so we can’t rerun the day – and we can’t
judge. But people do judge… they look
at which side of ‘maybe’ (50%) the probability was on. If the forecast said
there was a 70% chance of rain and it rains, people think the forecast is
right; if it doesn’t rain, they think it was wrong.” This fundamental error, according to the authors, can have far-ranging negative consequences,
particularly in the world of high-level political discussions and decisions.
(They provide some examples.) I guess no one wants to sound wrong, and
therefore vague hedging is the norm. Sounds like fortune cookie forecasting.
Amusing, perhaps. Helpful, no. Dangerous if followed, possibly.