Nobelist William F. Sharpe, speaking at a CFA Institute Annual Conference last year, said, “When I hear smart beta, it makes me sick.” And yet its popularity has swept not only the ETF universe but academia too. According to a paper by Duke University professor Campbell Harvey, hundreds of academic papers have been published about the “factors” that underlie smart beta strategies. Wharton Research Data Services Research Director Denys Glushkov identified 164 U.S. domestic equity smart beta ETFs during the 2013-2014 period.

What does smart beta mean? Does it deserve the attention it is getting from the market and academia?

The origin

In 1992 Eugene Fama and Kenneth French published an article in The Journal of Finance titled “The Cross-Section of Expected Stock Returns.” In the article, Fama and French ran regressions of stock portfolios against three variables: the return on the market as a whole, returns on small stocks and returns on value stocks. In these regressions, they found that portfolio returns depended substantially on two of the factors, small stock returns and value stock returns, and more so than on the market as a whole.

After the article came out, the late, revered Fischer Black, who would have won a Nobel Prize for the Black-Scholes formula had he not died in 1995, attacked it viciously, calling it the product of “data mining.” That pejorative term can be applied to most of the evidence advanced for investment strategies whose apparent superiority rests on outperformance in historical data.

Running a multiple regression is an exceedingly easy thing to do. It can take a researcher literally less than ten seconds to run one on numerous variables in an Excel spreadsheet if the values of those variables have already been entered in the columns. This ease invites overuse; indeed, the procedure is massively overused.

By running regressions or backtests repeatedly on the enormous body of historical stock market data, a researcher looking to sell an investment product or write a paper cannot fail to find some investment strategy or statistical relationship that appears valid. But that apparent validity is often spurious, an accidental concurrence of random data discovered through intensive data mining. As the common saying goes, “If you torture the data hard enough, it will confess to anything.”
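The point can be illustrated with a small simulation. The sketch below is not drawn from any of the papers discussed here; it is a minimal illustration under stated assumptions: "returns" and candidate "signals" are independent random noise, so no real relationship exists, and the function names, the 120-month sample, the 200 candidates and the t-statistic cutoff of 1.98 (roughly the 5 percent two-sided threshold for that sample size) are all hypothetical choices made for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_stat(r, n):
    """t-statistic for the slope of a one-variable regression, given r.

    Assumes |r| < 1, which holds for the random series used here.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


def mine_noise(n_months=120, n_candidates=200, t_cutoff=1.98, seed=0):
    """Test many pure-noise signals against pure-noise returns.

    Returns how many signals look "significant" and the largest |t| found.
    All parameter values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_months)]
    t_values = []
    for _ in range(n_candidates):
        signal = [rng.gauss(0, 1) for _ in range(n_months)]
        t_values.append(t_stat(pearson_r(signal, returns), n_months))
    n_significant = sum(1 for t in t_values if abs(t) > t_cutoff)
    return n_significant, max(abs(t) for t in t_values)


count, best_t = mine_noise()
print(f"{count} of 200 noise signals pass the cutoff; largest |t| = {best_t:.2f}")
```

At a 5 percent threshold, about one candidate in twenty should pass by chance alone, so roughly ten of the 200 noise signals would be expected to look significant. A researcher who reports only the best of them has found nothing but luck.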

One remedy that has been proposed for promiscuous data mining is to have a reasonable theory firmly in mind before turning to the data to seek statistical evidence to support it. “Lack of theory,” said Black in his critique of the Fama-French paper, “is a tipoff: watch out for data mining!” Black went on to say of their paper, “I especially attribute their results to data mining when they attribute them to unexplained ‘priced factors,’ or give no reasons at all for the effects they find.” In fact, reasons for the Fama-French findings were conjured up only after the publication of their 1992 paper and a follow-up paper in 1993. In the papers themselves, Fama and French admitted that they could not offer any sound economic reasons for their findings.

Black’s criticism notwithstanding, the two Fama-French papers went on to be routinely referred to as “seminal” by a generation of financial academicians and quantitative practitioners whose level of data mining for “factors” would make Black spin in his grave. What Black would consider data mining these researchers now consider “evidence-based financial economics.” Thus we now have not only the two factors (three with the market as a whole) identified by Fama and French, one of which, small capitalization, has been acknowledged to have faded over time, but also what has been called a “factor zoo.”

Some seminal works spawn productive fields; others unfortunately do the opposite.