According to Andrew Ang, a guru of factor-based investing and former chair of the finance and economics division of Columbia Business School’s Data Science Institute, the “anomalies” literature is the scientific foundation for quantitative asset management. But that foundation, which was not very scientific to begin with, is proving to be the field’s undoing.

Anomalies are investment strategies or subgroups of securities that have outperformed the market on a risk-adjusted basis over a long period of time. They are called anomalies because they contradict the efficient market hypothesis, which implies that such phenomena should not occur. The protocol for testing for the existence of an anomaly has been established for some time, at least since two papers in 1992 and 1993 by Eugene Fama and Kenneth French that explored what appeared to be the value-stock and small-stock anomalies. The protocol involves regression and hypothesis testing to determine whether the anomaly is statistically significant.
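To make the protocol concrete, here is a minimal sketch of the regression step on simulated data. All the numbers (sample size, factor volatility, the built-in alpha) are illustrative assumptions, not estimates from any real anomaly study; the point is only the mechanics — regress a strategy's excess returns on the market factor and ask whether the intercept (alpha) is statistically distinguishable from zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly excess returns (all parameters are illustrative).
n = 240                                      # 20 years of monthly data
mkt = rng.normal(0.005, 0.04, n)             # market factor excess return
strat = 0.005 + 1.1 * mkt + rng.normal(0, 0.02, n)  # strategy with built-in alpha

# OLS regression: strat = alpha + beta * mkt + eps
X = np.column_stack([np.ones(n), mkt])
coef, *_ = np.linalg.lstsq(X, strat, rcond=None)
alpha, beta = coef

# Standard error of the intercept, and its t-statistic.
resid = strat - X @ coef
s2 = resid @ resid / (n - 2)
cov = s2 * np.linalg.inv(X.T @ X)
t_alpha = alpha / np.sqrt(cov[0, 0])

print(f"alpha={alpha:.4f}, beta={beta:.2f}, t(alpha)={t_alpha:.2f}")
```

In the standard protocol, a t-statistic on alpha above roughly 2 in absolute value is taken as evidence of an anomaly at the conventional 5% significance level — a convention whose problems the papers discussed below make painfully clear.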

I will first review the subject of hypothesis testing, then discuss three recent papers that show how badly off-track most anomalies research is.

R.A. Fisher’s work on the farm

Perhaps it was because Ronald Aylmer Fisher had been forced in his youth, for lack of funds after graduating from Cambridge University, to take a job working on a farm in Canada. Or perhaps it was simply because agriculture was much more of a vibrant developing industry in 1919 than it is now. Or perhaps it was because Fisher had gotten into a professional scrape with his mentor, the famous statistician Karl Pearson.

But whatever the reason, when in that year – 1919 – Pearson offered Fisher a plum university job, Fisher turned it down and took a job instead at the Rothamsted Agricultural Experiment Station, in the English county of Hertfordshire about 50 kilometers north of London.

This turned out to be a boon to statistics. At Rothamsted, Fisher pioneered two of the field’s most important breakthroughs, hypothesis testing and the design of experiments. These tools are now used in the vast majority of scientific studies of empirical data.

Fisher needed to test whether a new grain variety, A, would produce a greater yield than the old variety, B. So he planted a number of plots of land with variety A and the same number with variety B – taking care that no other factor, such as sunshine or shading, systematically differed between the plots planted with A and those planted with B.

One of the grains, say A, would inevitably produce a higher average yield than the other, since it was unlikely that the average yields would be exactly the same. The challenge was to decide whether the difference was large enough to conclude that A produces a higher yield than B.
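Fisher's question can be sketched as a two-sample t-test. The yields below are simulated, and the means, standard deviation, and number of plots are assumptions chosen purely for illustration — the logic is what matters: compare the difference in average yields to the variation one would expect from chance alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical plot yields for the two grain varieties
# (12 plots each; all parameters are illustrative).
yields_a = rng.normal(3.2, 0.4, 12)   # variety A
yields_b = rng.normal(3.0, 0.4, 12)   # variety B

# Two-sample t-statistic with pooled variance: is the observed
# difference in means larger than chance variation would explain?
na, nb = len(yields_a), len(yields_b)
diff = yields_a.mean() - yields_b.mean()
sp2 = ((na - 1) * yields_a.var(ddof=1)
       + (nb - 1) * yields_b.var(ddof=1)) / (na + nb - 2)
t = diff / np.sqrt(sp2 * (1 / na + 1 / nb))

print(f"mean difference={diff:.3f}, t={t:.2f}")
```

A large t-statistic says the difference in average yields is unlikely to be a fluke of which plots happened to get more favorable conditions; a small one says the data cannot distinguish A from B.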