Plagued by overfitting and collinearity, returns-based style analysis frequently fails, confusing noise with portfolio risk.
Returns-based style analysis (RBSA) is a common approach to investment risk analysis, performance attribution, and skill evaluation. Returns-based techniques perform regressions of returns over one or more historical periods to compute portfolio betas (exposures to systematic risk factors) and alphas (residual returns unexplained by systematic risk factors). The simplicity of the returns-based approach has made it popular, but it comes at a cost – RBSA fails for active portfolios. In addition, this approach is plagued by the statistical problems of overfitting and collinearity, frequently confusing noise with systematic portfolio risk.
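The regression at the heart of a returns-based analysis can be sketched in a few lines. This is a minimal illustration using ordinary least squares, not any vendor's actual implementation; the function name `estimate_exposures`, the synthetic return series, and the "true" beta of 0.6 are all assumptions made for the example.

```python
import numpy as np

def estimate_exposures(portfolio_returns, factor_returns):
    """Regress portfolio returns on factor returns with ordinary least squares.

    Returns (alpha, betas): the intercept and one beta per factor column.
    """
    y = np.asarray(portfolio_returns, dtype=float)
    X = np.asarray(factor_returns, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single factor series as one column
    design = np.column_stack([np.ones(len(y)), X])  # intercept column + factors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: 60 monthly market returns and a fund with a constant beta of 0.6.
rng = np.random.default_rng(0)
market = rng.normal(0.005, 0.04, size=60)
fund = 0.001 + 0.6 * market + rng.normal(0.0, 0.01, size=60)

alpha, betas = estimate_exposures(fund, market)
print(f"estimated alpha: {alpha:.4f}, estimated market beta: {betas[0]:.2f}")
```

With a single factor and a constant exposure, the estimate lands close to the beta used to generate the data. The problems discussed in this article arise when exposures vary or when several correlated factors enter the regression.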
Returns-Based Style Analysis – Failures for Active Portfolios
In an earlier article we illustrated the flaws of returns-based style analysis when factor exposures vary, as is common for active funds:
 Returns-based analysis typically yields flawed estimates of portfolio risk.
 Returns-based analysis may not even accurately estimate average portfolio risk.
 Errors will be most pronounced for the most active funds:
 Skilled funds may be deemed unskilled.
 Unskilled funds may be deemed skilled.
These are not the only flaws. We now turn to the subtler and equally critical issues – failures in the underlying regression analysis itself. We use a recent Morningstar article as an example.
iShares Core High Dividend ETF (HDV) – Returns-Based Style Analysis
A recent Seeking Alpha article provides an excellent illustration of problems created by overfitting and collinearity. In this article, Morningstar performed returns-based style analysis of iShares Core High Dividend ETF (HDV).
Morningstar estimated the following factor exposures for HDV using the Carhart model:
iShares Core High Dividend ETF (HDV) – Estimated Factor Exposures Using the Carhart Model – Source: Morningstar
The MktRF coefficient, or loading, is HDV’s estimated market beta. A beta value of 0.67 means that, given a +1% change in the market, HDV is expected to move by +0.67%, everything else held constant.
The article then performs RBSA using an enhanced Carhart + Quality Minus Junk (QMJ) model:
iShares Core High Dividend ETF (HDV) – Estimated Factor Exposures Using the Carhart + Quality Minus Junk (QMJ) Model – Source: Morningstar
With the addition of the QMJ factor, the market beta estimate increased by a third from 0.67 to 0.90. Both estimates cannot be right. Perhaps the simplicity of the Carhart model is to blame and the more complex 5-factor RBSA is more accurate?
iShares Core High Dividend ETF (HDV) – Historical Factor Exposures
Instead of Morningstar’s RBSA approach, we analyzed HDV’s historical holdings using the AlphaBetaWorks U.S. Equity Risk Model. For each month, we estimated the U.S. Market exposures (betas) of individual positions and aggregated these into monthly estimates of portfolio beta:
iShares Core High Dividend ETF (HDV) – Historical Market Exposure (Beta)
Over the past 4 years, HDV’s beta varied in a narrow range between 0.50 and 0.62.
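The aggregation step described above can be illustrated with a weighted sum: a portfolio's beta is the weight-averaged beta of its positions. This is a minimal sketch and not the AlphaBetaWorks risk model itself; the function name `portfolio_beta`, the tickers, the weights, and the position betas are all hypothetical.

```python
def portfolio_beta(weights, betas):
    """Aggregate position betas into a portfolio beta.

    weights: mapping of ticker -> portfolio weight (as a fraction of capital)
    betas:   mapping of ticker -> estimated market beta of that position
    """
    missing = set(weights) - set(betas)
    if missing:
        raise KeyError(f"no beta estimate for: {sorted(missing)}")
    return sum(weight * betas[ticker] for ticker, weight in weights.items())

# Hypothetical holdings for a single month (not HDV's actual positions).
holdings = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
position_betas = {"AAA": 0.4, "BBB": 0.7, "CCC": 0.8}

print(round(portfolio_beta(holdings, position_betas), 2))  # prints 0.57
```

Repeating this calculation for each month's holdings produces a time series of portfolio beta that does not depend on a regression of the fund's own returns.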
Both of the above returns-based analyses were off, but the simpler Carhart model did better. It turns out that the simpler, less sophisticated returns-based model is less vulnerable to the statistical problems of multicollinearity and overfitting. The only way to find out that returns-based style analysis failed was to perform holdings-based analysis using a multifactor risk model.
Statistical Problems with Returns-Based Analysis
Multicollinearity
Collinearity (multicollinearity) occurs when risk factors used in returns-based analysis are highly correlated with each other. For instance, small-cap stocks tend to have higher beta than large-cap stocks, so the performance of small-cap stocks relative to large-cap stocks is correlated with the market.
Erratic changes in the factor exposures for various time periods, or when new risk factors are added, are signs of collinearity. These erratic changes make it difficult to pin down factor exposures and are signs of deeper problems:
A principal danger of such data redundancy is that of overfitting in regression analysis models.
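One common diagnostic for this kind of redundancy is the variance inflation factor (VIF), which regresses each factor on all the others and reports 1 / (1 - R²). It is offered here only as an illustration and is not a method used in the analysis above. The function name `variance_inflation_factors` and the synthetic factor series, including the 0.5 loading of the size factor on the market, are assumptions.

```python
import numpy as np

def variance_inflation_factors(factors):
    """Return one VIF per factor column: 1 / (1 - R^2) from regressing
    that factor on an intercept and all the other factors."""
    X = np.asarray(factors, dtype=float)
    n_obs, n_factors = X.shape
    vifs = []
    for j in range(n_factors):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((y - y.mean()) ** 2).sum()
        vifs.append(ss_tot / ss_res)  # algebraically equal to 1 / (1 - R^2)
    return np.array(vifs)

# Hypothetical factors: a size factor that partly tracks the market.
rng = np.random.default_rng(0)
market = rng.normal(0.005, 0.04, size=60)
size = 0.5 * market + rng.normal(0.0, 0.01, size=60)

vifs = variance_inflation_factors(np.column_stack([market, size]))
print("VIF (market, size):", np.round(vifs, 1))
```

A VIF of 1 means a factor is uncorrelated with the others. The larger the value, the more the variance of that factor's estimated exposure is inflated by overlap with the remaining factors.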
Overfitting
Overfitting is a consequence of redundant data or model overcomplexity. These are common in returns-based analyses, which usually attempt to explain a limited number of return observations with a larger number of correlated variable observations.
An overfitted returns-based model may appear to describe data very well. But the fit is misleading – the exposures may be describing noise and will change dramatically under minor changes to data or factors. A high R-squared from returns-based models may be a sign of trouble, rather than a reassurance.
As we have seen with HDV, exposures estimated by RBSA may bear little relationship to the portfolio risk. Therefore, all dependent risk and skill data will be flawed.
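The warning about high R-squared can be demonstrated with pure noise. The sketch below regresses random "returns" on a growing number of unrelated random "factors"; everything in it (the function name `in_sample_r_squared`, the 24-month sample, the factor counts) is a hypothetical setup chosen for illustration.

```python
import numpy as np

def in_sample_r_squared(y, X):
    """R-squared of an OLS fit of y on an intercept and the columns of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - (residuals @ residuals) / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(1)
n_months = 24
returns = rng.normal(0.0, 0.03, size=n_months)                 # pure noise
factors = rng.normal(0.0, 0.03, size=(n_months, n_months - 1))  # unrelated noise

# Fit nested models that use the first k noise factors.
r2_by_factor_count = {
    k: in_sample_r_squared(returns, factors[:, :k]) for k in (1, 5, 10, 20, 23)
}
for k, r2 in r2_by_factor_count.items():
    print(f"{k:2d} factors: in-sample R-squared = {r2:.3f}")
```

Because the models are nested, in-sample R-squared can only rise as factors are added, and it approaches 1 once the number of fitted coefficients matches the number of observations. None of these "exposures" mean anything, since the returns and the factors are unrelated by construction.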
Conclusions

When a manager does not vary exposures to the market, sector, and macroeconomic factors, returns-based style analysis (RBSA) using a parsimonious model can be effective.

When a manager varies bets, RBSA typically yields flawed estimates of portfolio risk.

Even when exposures do not vary, returns-based style analysis is vulnerable to multicollinearity and overfitting:

The model may capture noise, rather than the underlying factor exposures.

Factor exposures may vary erratically among estimates.

Estimates of portfolio risk will be flawed.

Skilled funds may be deemed unskilled.

Unskilled funds may be deemed skilled.

Holdings-based analysis using a robust multifactor risk model is superior for quantifying fund risk and performance.