Morningstar Ratings Fail over a Full Market Cycle

Portfolio values have been whipsawed by the 2008 bear market and now the 2009 bull market.  While most investors suffered substantial declines, the recent topsy-turvy market environment has been an excellent laboratory in which to test the skills of active managers.

And when managers are tested, so are the systems used to predict their performance.  Perhaps no system is as widely used as Morningstar’s “star” rating system.

We originally reviewed the predictive ability of Morningstar’s ratings in October of 2007, and here we update the results of that study, using data for the three years ending September 30, 2009.

Our analysis found that Morningstar’s ratings lost virtually all of their predictive ability when measured over a full market cycle. 

When we measured the effectiveness of the ratings in 2007 (see here), we found that they had consistent predictive ability.  At that time, we noted in our study that the testing period (the three years 2004-2006) consisted almost entirely of bull market conditions and that a comprehensive test of the ratings methodology needed to incorporate a full market cycle, including a down market.

If one could rely on continued bull markets, an index fund would do just fine.  Active managers would not be necessary, nor would the predictive systems, such as Morningstar ratings, used to assess them.

It is precisely because markets are unpredictable that active managers are valuable, and when predictive systems meant to assess those managers fail in volatile conditions, one must ask whether those systems serve any useful purpose.

In our view, they don’t.

Methodology

Our measure of predictive ability is the probability that a randomly selected higher-rated fund will outperform a randomly selected lower-rated fund.  For example, we looked at the chances of a randomly selected 5-star fund outperforming a 4-star fund.
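This metric can be computed directly: for two rating groups, count the fraction of all higher-rated/lower-rated fund pairs in which the higher-rated fund's return is greater.  The sketch below illustrates the calculation; the function name and the return figures are hypothetical, for illustration only.

```python
import itertools

def outperformance_probability(higher_rated, lower_rated):
    """Probability that a randomly selected fund from the higher-rated
    group outperforms a randomly selected fund from the lower-rated
    group.  Each argument is a list of annualized returns."""
    pairs = list(itertools.product(higher_rated, lower_rated))
    wins = sum(1 for h, l in pairs if h > l)
    return wins / len(pairs)

# Hypothetical three-year annualized returns (illustrative only)
five_star = [0.08, 0.05, 0.11]
four_star = [0.06, 0.04]
print(outperformance_probability(five_star, four_star))  # 5 of 6 pairs: ~0.833
```

A probability above 0.5 means the higher rating conveys useful information; a probability at or below 0.5 means it conveys none.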

We believe this metric is the most meaningful way to assess the usefulness of the rating system from the perspective of a financial advisor.  Advisors need to know the incremental improvement they obtain by, for example, trading up from a 4-star to a 5-star fund.

Most studies, including Morningstar’s own study, look at the average performance of funds within a star category.  For example, Morningstar compared the average performance of 5-star funds and 4-star funds.  This methodology has limited value, because the only way advisors can obtain this performance differential is by purchasing all the funds in a given rating category, which is obviously impractical.

As in our 2007 study, we calculated the probabilities across five Morningstar fund categories: US Equity, International Equity, Balanced, Taxable Bond, and Muni Bond.

We chose three years as our measurement period because that was the time period used by Morey and Gottesman (M-G) in their study (see here).  The M-G study is the most prominent research conducted on Morningstar's ratings since Morningstar revised its ratings methodology in 2002.

We compared the difference in probabilities between the current study and the 2007 study.  This showed the net gain or loss in predictive ability from the up-market environment of the original study to the volatile environment of the current study.

We calculated these probabilities using the raw annualized returns and Morningstar’s risk-adjusted annualized returns.

Only funds that survived over the three-year period were considered.  Index funds, ETFs, and other funds not rated by Morningstar were excluded from this analysis.  Each share class of a fund was treated separately (since different share classes can be rated differently).

Results and implications

Our results are presented in tables 1-6.  Tables 1 and 2 show the differences in probabilities for the raw returns and the risk-adjusted returns, respectively.  A negative number, shown in red, indicates a decrease in predictive ability.

Tables 3 and 4 show the probabilities from the current study, based on raw and risk-adjusted returns, respectively.  Tables 5 and 6 show the probabilities based on 2004-2006 data, and are reproduced from the original study.

A key criterion we used to assess the effectiveness of the ratings is whether the results are monotonic.  Results are monotonic if 5-star funds outperform 4-star funds, 4-star funds outperform 3-star funds, etc.

Our second criterion is the average probability of improving performance by moving to a fund with a rating that is one star higher.  We averaged the probabilities of 5- v. 4-star, 4- v. 3-star, 3- v. 2-star and 2- v. 1-star fund comparisons.  A typical decision for an advisor is whether to move to a fund with a rating one star higher, and this criterion measures the average effectiveness of that decision.
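The two criteria above can be expressed in terms of the four adjacent-pair probabilities.  In the sketch below, we read "monotonic" as each higher-rated group having better-than-even odds against the group one star below it; the function names and probability figures are hypothetical, for illustration only.

```python
def is_monotonic(adjacent_probs):
    """Results are monotonic when, in each adjacent comparison
    (5 v. 4, 4 v. 3, 3 v. 2, 2 v. 1), the higher-rated fund has
    better than even odds of outperforming."""
    return all(p > 0.5 for p in adjacent_probs)

def one_star_upgrade_probability(adjacent_probs):
    """Average probability that trading up by one star improves
    performance, across the four adjacent-pair comparisons."""
    return sum(adjacent_probs) / len(adjacent_probs)

# Hypothetical adjacent-pair probabilities (illustrative only)
probs = [0.55, 0.52, 0.50, 0.48]
print(is_monotonic(probs))                  # False: two pairs at or below 0.5
print(one_star_upgrade_probability(probs))  # 0.5125
```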