Forecasting the financial markets is incredibly difficult, despite what the pundits on CNBC would have us believe. Indeed, Philip Tetlock has documented the overwhelming futility of such efforts in his research. But what if some rare individuals are truly prescient forecasters? What can we learn from how those people think? That is the subject of Tetlock’s newest book.
Almost five years ago, I wrote an article titled “No More Stupid Forecasts!” It was about the work of Tetlock, a professor of management and psychology at Wharton. For more than 20 years, Tetlock studied the predictions of experts, collecting 27,450 well-defined predictions about “clear questions whose answers can later be shown to be indisputably true or false.” The result was unsurprising: for the most part, the experts performed no better than a dart-throwing chimpanzee.
So why has Tetlock written a new book (co-authored with Dan Gardner) about “superforecasters,” and how people can train to be superforecasters? Has he discarded his earlier skepticism in the interest of writing a “Freakonomics”-type bestseller?
Are some forecasters better than others?
In his earlier book, Tetlock noted that some forecasters tended to do better than others. To distinguish good from bad forecasters, he characterized them, respectively, as “foxes” and “hedgehogs.”
The parable of the fox and the hedgehog comes from the Greek poet Archilochus, who said, “The fox knows many things, but the hedgehog knows one big thing.” The philosopher Isaiah Berlin later turned this line into a famous essay.
Tetlock found that hedgehogs, who were certain about the one big thing they believed they knew, were worse forecasters – even (and, in fact, especially) about that one big thing – than foxes, who knew many little things but were uncertain about what they knew. The forecasters who were wracked with uncertainty did better at forecasting than those who were not in the least wracked with uncertainty.
When I taught statistics years ago in the business school of the University of Illinois at Chicago, I told the students that whenever they saw a statistic quoted, they should ask themselves, “How can they know that?” In this case it is worth asking, “How can Tetlock know that?” How does he know that foxes are better forecasters than hedgehogs?
In his book Superforecasting, he explains how he knows. When forecasters make a forecast, they are asked to assign it a probability. Consider, for example, weather forecasts. Weather forecasts come with a probability: “50% chance of rain.” If a weather forecaster forecasts a 50% chance of rain repeatedly, and in fact it does rain 50% of the time, then, Tetlock and Gardner explain, that forecaster gets a perfect score on calibration.
However, this forecaster does not get a very good score on a second measure, which Tetlock and Gardner call resolution. That’s because the forecaster was “playing it safe,” taking a 50/50 non-committal middle-of-the-road stance every time. If the forecaster had predicted 100% chance of rain, and every time they predicted 100% chance it did rain, then they would get a perfect score on resolution too.
The combination of calibration and resolution scores is called the Brier score, developed by Glenn W. Brier in 1950. Predictions of 100% chances that are correct 100% of the time give you a better overall Brier score than mere predictions of 50% chances that turn out to be correct 50% of the time. However, a perusal of the definitions of the Brier score on the Internet turns up different versions. Unfortunately, Tetlock and Gardner’s book does not specify which version of the Brier score they are using.
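To make the arithmetic concrete, here is a minimal sketch in Python using one common binary version of the Brier score (the average squared difference between the forecast probability and the 0-or-1 outcome, where lower is better). Since the book does not say which version it uses, treat this only as an illustration, with made-up weather data:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Ten days on which it rained exactly half the time (1 = rain, 0 = no rain).
outcomes = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# The hedging forecaster says 50% every day: perfectly calibrated, but with no resolution.
hedged = [0.5] * 10

# The decisive forecaster says 100% on rainy days and 0% otherwise:
# perfectly calibrated AND perfectly resolved.
decisive = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

print(brier_score(hedged, outcomes))    # 0.25
print(brier_score(decisive, outcomes))  # 0.0, the best possible score under this version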
So, we have to take into consideration the possibility that the results are an artifact of the scoring technique. Nevertheless, let’s proceed on Tetlock’s implicit assumption that the better a forecaster’s Brier score, the better a forecaster she is. And let’s take it for granted that Tetlock was able to distinguish foxes from hedgehogs.
If one type of forecaster is better than another, then there must be a systematic way to be a better forecaster. Tetlock’s challenge in this project (which was sponsored by the U.S. intelligence research agency IARPA, the Intelligence Advanced Research Projects Activity) was to find out how the better forecasters did it, whether better forecasting could be learned and whether people could be trained to do it – indeed, to see if some people were, or could learn to become, “superforecasters.”
How do you measure forecasting accuracy?
The problem is that you have to have some way to determine if a forecaster’s predictions are correct. That’s what the Brier score is for. But you can calculate the Brier score only if you can determine whether the forecast came true or not. And you can do that only if the predictions are about “clear questions whose answers can later be shown to be indisputably true or false.”
Unfortunately, most forecasts aren’t about those kinds of clear questions. Tetlock and Gardner use journalist Tom Friedman as an example. Though he is one of the most sought-after commentators, most of his predictions aren’t about clear questions whose answers can later be shown to be indisputably true or false. In fact, Tetlock and Gardner categorize Friedman not as a superforecaster but as a “superquestioner.” That is, he raises important questions, and precise forecasts could be made about those questions. But he doesn’t make the kind of forecasts himself that could be scored.
A precise forecast must be about a specific event, occurring within a specific future period of time. Most forecasts – stock market predictions, for example – don’t meet that test. They either fail to specify an exact future time period or fail to specify the predicted event precisely, or both. They are also usually not made with a probability, as weather forecasts are. Predictions like that can’t be scored by the Brier method: they cannot be verified as having occurred, and their assigned probabilities can’t be compared with the actual frequency of being right.
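To make those requirements concrete, here is a minimal sketch (my illustration, not anything from the book or from Tetlock’s tournament) of the information a prediction must carry before a Brier score can even be attempted; the example forecast itself is hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ScoreableForecast:
    """The minimum a prediction needs before it can be Brier-scored."""
    event: str                      # a precisely specified event that will be indisputably true or false
    deadline: date                  # the exact date by which it must occur
    probability: float              # the forecaster's stated probability, between 0.0 and 1.0
    outcome: Optional[bool] = None  # filled in later: did the event happen by the deadline?

# A hypothetical, fully specified prediction that could be scored:
example = ScoreableForecast(
    event="The S&P 500 closes the year at least 10% above its first-trading-day close",
    deadline=date(2016, 12, 31),
    probability=0.35,
)

# A typical pundit call ("stocks will struggle") supplies none of these
# fields, which is exactly why it can never be scored.
```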
Tetlock and Gardner use the example of the predictions of Keynesians and Austerians during the recent financial crisis and economic recession and their aftermath. Keynesians predicted that if interest rates weren’t lowered and national budgets weren’t allowed to go into deficit to finance fiscal stimulus, then the recession would be deep and prolonged. Austerians predicted that if the recommendations of the Keynesians were followed, the result would be runaway inflation.
Keynesians such as Paul Krugman have argued that the Austerians have been proven wrong but won’t admit it. Austerians have argued that they weren’t wrong: the Keynesian measures taken in some countries simply haven’t brought runaway inflation yet, but still will; and in the countries that practiced austerity, the results would have been even worse without it.
It’s impossible to test either forecast, because neither predicted exactly what would happen within an exact time period. Neither prediction can be proved right or wrong. Besides that, neither one attached a probability to the prediction. A single prediction with a probability wouldn’t be enough to calculate a Brier score in any event; but even many predictions can’t be scored if they carry no probabilities – unless each is assumed to have been made with 100% certainty, which these likely were not.
Superforecasters do emerge
Tetlock tested a large number of forecasters – ordinary people, for the most part, not celebrity experts – who answered his advertisement and volunteered to participate. Each one of them made a large number of forecasts; one forecaster, Doug Lorch, who did eventually prove to be a superforecaster, made one thousand separate forecasts in his first year alone. (It then took some time to see which of his forecasts proved correct and to calculate a Brier score.)
Among these volunteer forecasters, a small number did measure as significantly and consistently better than the others and much better than a dart-throwing chimpanzee.
Two possible sources of skepticism arise. One is that when so many volunteer forecasters make their forecasts, some will get significantly better scores than others simply because of random outliers. We can trust Tetlock’s assurance that the statistics show this is not so, especially since he convincingly outlines the processes that superforecasters go through to make better forecasts.
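To see why that skepticism is natural, here is a small simulation (mine, not the book’s): give thousands of forecasters who guess at random a fixed set of yes/no questions, and the luckiest of them will still look noticeably better than the rest by chance alone.

```python
import random

def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)

random.seed(0)
n_forecasters, n_questions = 2800, 100

# Each question is a 50/50 coin flip; every "forecaster" just guesses
# 0% or 100% at random: pure chance, no skill whatsoever.
outcomes = [random.randint(0, 1) for _ in range(n_questions)]
scores = sorted(
    brier([float(random.randint(0, 1)) for _ in range(n_questions)], outcomes)
    for _ in range(n_forecasters)
)

print(f"luckiest guesser: {scores[0]:.2f}")                 # noticeably better than...
print(f"typical guesser:  {scores[len(scores) // 2]:.2f}")  # ...the median of about 0.50
```

Consistency over many questions and over time (the “significantly and consistently better” performance noted above) is what separates genuine skill from this kind of luck.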
A second possible source of skepticism is that the superforecasters were gaming the Brier scoring system. For example, if they had a choice of which questions to answer (though perhaps they did not), they might have chosen to predict whether the stock market would go up or down a day in advance for good calibration, or to predict the result of an election a week in advance for good resolution, depending on exactly how the Brier score is calculated. This is possible, but in context it seems more likely that they really were better forecasters, and either had a knack for it or had learned to do it. Again, the reason this seems more likely is that Tetlock and Gardner take apart the process of producing an accurate forecast (or rather, one accurate within a stated probability). They show how it is done, and it is credible.
The process, as they describe it, is summarized this way: “[S]uperforecasters often tackle questions in a roughly similar way – one that any of us can follow: Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others – and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability.”
To explain what “outside view” means, the authors use the example of whether a particular American family, the Renzettis, described in a particular way, owns a pet. With no further information, the forecaster might have no clue. But the forecaster begins by getting the outside view, looking up on Google what percentage of American households own pets. The answer turns out to be 62%. The forecaster can narrow it down from there by adding other facts about the family. This seems like the obvious way to proceed once you think about it, but many forecasters might not do even that, and of course a dart-throwing chimpanzee wouldn’t.
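The 62% base rate is from the book, but as an illustration only, one way to formalize “outside view first, inside view second” is to anchor on the base rate and then nudge it with case-specific facts through a Bayesian-style odds update; the adjustment factors below are purely hypothetical:

```python
def adjust(prob, likelihood_ratio):
    """Bayesian-style update: convert to odds, scale by a likelihood ratio, convert back."""
    odds = prob / (1 - prob)
    odds *= likelihood_ratio
    return odds / (1 + odds)

estimate = 0.62                   # outside view: the share of US households that own a pet
estimate = adjust(estimate, 1.4)  # inside view: the family has a house with a yard (hypothetical factor)
estimate = adjust(estimate, 0.8)  # inside view: both parents work long hours (hypothetical factor)

print(f"final estimate: {estimate:.0%}")  # roughly 65% under these made-up factors
```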
But is this the kind of forecasting that we want?
Tetlock and Gardner – acting like the foxes that they presumably are – then exhibit their uncertainty about the whole enterprise by essentially asking whether this is even the kind of forecasting that we want or that is useful. Making predictions about clear questions whose answers can later be shown to be indisputably true or false may not be either what we want or what is truly useful.
This is true, of course, if forecasting is merely entertainment. Pundits on television stations such as CNBC make predictions constantly, usually with bravado and a high level of confidence, and most of them turn out to be wrong, or only a random number of them turn out to be right. These forecasts would likely earn Brier scores near the worst possible, if they were even specified well enough for a Brier score to be calculated. But they are for entertainment and perhaps to stimulate thought and speculation, not for accuracy.
But even if they are not only for entertainment, perhaps well-specified predictions are not very useful. Tetlock and Gardner consider this possibility. Here, in fact, is where they truly excel. They posit that the truly important questions are precisely the ones about which predictions are too vague to judge. “That’s typical,” they say, “of the answers to big, important questions like ‘How does this all turn out?’”
This leads to the following dilemma: “What matters is the big question, but the big question can’t be scored. The little question doesn’t matter but it can be scored… You could say we were so hell-bent on looking scientific that we counted what doesn’t count.”
This, they say, is not entirely fair since the questions were screened to be relevant to problems on the desks of intelligence analysts. But, they say, “it is fair to say these questions are more narrowly focused than the big questions we would all love to answer, like ‘How does this all turn out?’”
I am grateful to Tetlock and Gardner also for a new word that I am ashamed I didn’t know: bafflegab. Merriam-Webster defines it as “gobbledygook” and uses it in this delightful sentence: “I kept asking the telemarketer what the final cost of the ‘special offer’ was, and all I got was more bafflegab about deferred payments, option to cancel at any point, etc.” But it is Tetlock and Gardner’s usage that I particularly like:
“It’s easy to impress people by stroking your chin and declaring ‘There is a 73% probability Apple’s stock will finish the year 24% above where it started.’ Toss in a few technical terms most people don’t understand – ‘stochastic’ this, ‘regression’ that – and you can use people’s justified respect for math and science to get them nodding along. This is granularity as bafflegab.”
But don’t we need leaders, and don’t leaders need to be confident in their predictions?
This is another dilemma that Tetlock – admirably – volunteers. The qualities that make hedgehogs poor forecasters also seem to be the qualities that make good leaders. The authors say, “Look at the style of thinking that produces superforecasting and consider how it squares with what leaders must deliver. How can leaders be confident, and inspire confidence, if they see nothing as certain? This looks like a serious dilemma. Leaders must be forecasters and leaders but it seems that what is required to succeed at one role may undermine the other.”
The authors offer a solution that seems a bit of a patch job: the leader must carefully consider the possible forecasts like a fox before making a decision, but then, once the decision is made, proceed with the certainty of a hedgehog. It may nevertheless be as close to the best advice as anyone could offer.
Tetlock’s work continues, and it should be welcomed. His plan for future research is interesting: he wants to see how effectively we can answer unscorable “big questions” with clusters of little, scorable ones.
The saying “Prediction is very difficult, especially about the future” is often attributed to Nobel Prize-winning quantum theorist Niels Bohr, but it has also been attributed to Yogi Berra.[1] Perhaps Tetlock will find a modicum of success in his venture. If so, predictions about the future may not be as difficult as either of them thought.
Michael Edesess, a mathematician and economist, is a senior research fellow with the Centre for Systems Informatics Engineering at City University of Hong Kong, chief investment strategist of Compendium Finance, and a research associate at EDHEC-Risk Institute. In 2007, he authored a book about the investment services industry titled The Big Investment Lie, published by Berrett-Koehler. His new book, The Three Simple Rules of Investing, co-authored with Kwok L. Tsui, Carol Fabbri and George Peacock, was published by Berrett-Koehler in June 2014.
[1] For the last word on the origin of this saying, see here.