Hedge Fund Replication: A Re-examination of Two Key Studies

By Andrew Beer

The revelation that a key paper by Rogoff and Reinhart included both coding and data errors highlights the need for investors and practitioners to periodically re-evaluate the assumptions and conclusions of frequently cited studies. In the factor-based hedge fund replication space, recently published white papers have cited two studies to support or question the underlying concept. First released in mid-2006, Jasmina Hasanhodzic and Andrew Lo's seminal paper, "Can Hedge Fund Returns Be Replicated?: The Linear Case" (hereafter, "Lo"), essentially laid the groundwork for the industry by concluding that a linear, factor-based model could successfully replicate much of the return of various hedge fund strategies. On the other side of the debate, EDHEC's Noel Amenc and colleagues published three papers over 2008 and 2009 arguing that factor-based replication was "systematically inferior" to investing directly in hedge funds.

With the added benefit of several years of live history, it is now clear that Lo actually understated the effectiveness of the strategy by failing to account for how survivorship bias in the hedge fund data would affect relative pro forma returns. Likewise, the more recent (2009) paper by Amenc et al., "The Performance of Passive Hedge Fund Replication Strategies" (hereafter, "Amenc"), failed to consider actual results from replication indices launched in 2007-08, which demonstrated that replication models had matched or outperformed actual hedge fund portfolios through the crisis. Furthermore, the paper's conclusions were undermined by inconsistent factor specifications. The following note expands on these two points.

Hasanhodzic and Lo, “Can Hedge Fund Returns be Replicated?:  The Linear Case” (2007)

This important paper, first released in 2006, introduced the concept of using a 24 month rolling-window linear regression to replicate hedge fund returns.  In many ways, this seminal paper launched the factor-based hedge fund replication business.  Interestingly, though, the authors appear to have overlooked the most important conclusion:

  • Using a simple five factor model, the replication of an equally weighted portfolio of 1,610 funds appears to deliver all or virtually all of the returns over almost 20 years, adjusted for survivorship bias.

In other words, the simple clone’s performance exceeded all expectations during the “high alpha” period of 1986-2005.  Remarkably, this pro forma performance of the clone was approximately equal to the performance of the S&P 500 over the same period, but with materially lower volatility and drawdowns. This is a startling result that is lost in the paper’s forty pages of formulas, text and tables. Here’s why:

The data set used was based entirely on "live" funds in the TASS database as of September 2005 – 1,610 funds. Invariably, "live" funds have outperformed "dead" peers by a wide margin: in the HFR database, for instance, by more than 400 bps per annum. Inexplicably, the authors assert that "any survivorship bias should impact both funds and clones identically," and therefore can be ignored. This simply is incorrect: we know today that this kind of data bias, by definition, is "non-replicable." The clone therefore should be compared to a realistic measure of performance – i.e., one adjusted for survivorship bias. This is why replicators are often benchmarked against indices like the HFRI Fund of Funds index, which are more representative of actual investor returns.

From Figure 5 in the paper, we can infer that the equally weighted portfolio of sample funds returned between 13% and 14% on a compound annual basis over almost twenty years. This clearly is unrealistically high: hedge funds as a group simply did not outperform the S&P by 200-300 bps per annum on a net basis during a twenty year bull market in which stocks returned 10% per annum. Assuming several hundred bps of survivorship bias, the hedge fund portfolio would have slightly underperformed the S&P 500, but with materially lower drawdowns and volatility.  And, in fact, this is precisely how the simple clone performed. See Figure 5 reproduced below with commentary added.

In this context, the performance of the linear clone (around 10% per annum) is remarkable and should have been highlighted more prominently.
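The survivorship arithmetic behind this point can be made explicit. The sketch below uses round numbers drawn from the ranges discussed above (a 13.5% reported CAGR, several hundred bps of assumed bias, a roughly 10% clone) – they are illustrative assumptions, not figures from the TASS data itself:

```python
# Illustrative arithmetic only: 13.5%, 3.5% and 10% are round numbers drawn
# from the ranges discussed in the text, not from the underlying data.
reported_cagr = 0.135       # live-fund portfolio, midpoint of the 13-14% range
survivorship_bias = 0.035   # assumed several hundred bps per annum
clone_cagr = 0.10           # linear clone, roughly S&P-like

# Geometric adjustment of the reported return for survivorship bias
adjusted_cagr = (1 + reported_cagr) / (1 + survivorship_bias) - 1

years = 20
wealth_adjusted = (1 + adjusted_cagr) ** years   # survivorship-adjusted funds
wealth_clone = (1 + clone_cagr) ** years         # simple linear clone
```

Under these assumptions the survivorship-adjusted fund portfolio compounds at roughly 9.7% per annum, leaving the clone and the adjusted hedge fund portfolio within a few percent of each other in terminal wealth over twenty years – which is the sense in which the clone's performance is remarkable rather than disappointing.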

A secondary issue is the use of a factor set that is missing important market exposures. The study employs only five market factors: the S&P 500 total return, the Lehman AA index, the spread between the Lehman BAA index and the Lehman Treasury index, the GSCI total return, and the USD index total return. More recent studies, including our own, have demonstrated that emerging markets, short-term Treasury notes and small-capitalization equities are important factors, since they enable the models to incorporate, respectively, volatility expectations, yield curve trades and market capitalization bias. Conversely, while the inclusion of the GSCI has intrinsic appeal, it does not appear to be additive over time to out-of-sample results. Consequently, the overall results arguably would have been even more compelling with a slightly more robust factor set.
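For readers unfamiliar with the mechanics, the rolling-window approach at the heart of the Lo study can be sketched in a few lines. The data below is synthetic and the factor set generic; none of the numbers come from either paper:

```python
import numpy as np

def rolling_clone(fund_returns, factor_returns, window=24):
    """Clone a fund return series with a rolling-window linear regression.

    For each month t, betas are estimated by OLS over the prior `window`
    months, then applied to month t's factor returns. Returns the clone
    series starting at index `window`.
    """
    n = len(fund_returns)
    clone = []
    for t in range(window, n):
        # Trailing 24-month factor history, with an intercept column
        X = np.column_stack([np.ones(window), factor_returns[t - window:t]])
        betas, *_ = np.linalg.lstsq(X, fund_returns[t - window:t], rcond=None)
        # Out-of-sample clone return: factor exposures only, intercept
        # (the estimated "alpha") deliberately excluded.
        clone.append(factor_returns[t] @ betas[1:])
    return np.array(clone)

# Synthetic illustration: a "fund" built from two factor exposures plus noise
rng = np.random.default_rng(0)
factors = rng.normal(0.005, 0.03, size=(120, 5))   # 10 years, 5 factors, monthly
fund = 0.6 * factors[:, 0] + 0.2 * factors[:, 1] + rng.normal(0, 0.002, size=120)
clone = rolling_clone(fund, factors)
tracking = np.corrcoef(clone, fund[24:])[0, 1]
```

The high tracking correlation here reflects only that the synthetic fund was built from the factors; the empirical question in both studies is how much of a real fund's return is factor-driven in the first place.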

Amenc et al., The Performance of Passive Hedge Fund Replication Strategies (2009)

In response to the paper by Hasanhodzic and Lo and the launch of several factor-based indices, EDHEC released several papers during 2007-09 that were highly critical of the concept. In the first paper, "The Myths and Limits of Passive Hedge Fund Replication: An Attractive Concept… Still a Work-in-Progress," the authors seek to redo the rolling linear model employed by Hasanhodzic and Lo, but apply it to the EDHEC hedge fund database. Since there is very little explanation of the underlying data, it is impossible to estimate the effect of survivorship bias or other sampling issues.

The more relevant paper was published in 2009, “The Performance of Passive Hedge Fund Replication Strategies.” It is difficult to read this paper without the sense that the authors, who are closely tied to the fund of hedge fund industry (and funded by Newedge), had a predetermined agenda.  The end result is a paper that includes some very helpful analysis – for instance, that Kalman filters and non-linear factors don’t improve out of sample results – but whose conclusions are undermined by selective omission.  For instance:

  • Even though there was over two years of live data from replication indices that showed strong results with high correlation through the crisis, the authors neglect to include this and focus instead on re-doing the Lo analysis with the admittedly incomplete five factor set.
  • When the authors do in fact acknowledge that Lo’s factor base should be expanded to include emerging markets, small cap stocks and other factors, they test each strategy with an unreasonably narrow subset of factors even though it was well established by this time that a more robust factor set was critical.  This is discussed in detail below.

In Section 3.2, the authors “test whether selecting specific sets of factors for each strategy leads to an improvement in the replication performance. Based on an economic analysis and in accordance with Fung and Hsieh (2007), who provide a comprehensive summary of factor based risk analyses over the past decade, we select potentially significant risk factors for each strategy.” The factors identified are quite reasonable, such as the spread between small and large capitalization stocks, emerging markets, and other fixed income spreads.

In the table below, the five factors on the left side represent the original Lo portfolio, while the five on the right represent the Fung & Hsieh additions.

The logical next step would be to test whether the results of the Lo five-factor set are improved by the addition of one or more of these factors. Instead, the authors use only 1-4 factors for each strategy and throw out most of the original factors. Remember that by this time it was well established that a narrow factor base was insufficient to replicate most hedge fund returns; this is why Merrill, Goldman Sachs and others all used 6-8 factors, not 1-4. To take one example, in seeking to replicate the macro space, the authors used only the Lehman AA Intermediate Bond index – a single factor – with a 24 month rolling window. For distressed, the one factor is the spread between a BAA index and Treasurys. For risk arbitrage, it is only the S&P 500. For long/short equity and funds of funds, it is the S&P 500 and the small cap-large cap spread.

To underscore the point, the debate at the time was not whether one or two factors could reasonably replicate sector returns, but whether a diversified portfolio of market factors could do so. By starkly reducing the factor set, the authors essentially designed an experiment that was bound to fail.  Consequently, investors should seriously question the validity of the authors’ conclusion that “the performance of the replicating strategies is systematically inferior to that of the actual hedge funds.”
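The effect of starving a clone of factors is straightforward to demonstrate on synthetic data. In the sketch below (illustrative assumptions, not the authors' data), a fund with exposures to three of five factors is cloned out of sample with either a single factor or the full set, and the narrow model tracks materially worse:

```python
import numpy as np

# Synthetic sketch: a "fund" loading on three of five independent factors.
rng = np.random.default_rng(1)
F = rng.normal(0.004, 0.03, size=(120, 5))
fund = 0.4 * F[:, 0] + 0.3 * F[:, 2] + 0.2 * F[:, 4] + rng.normal(0, 0.005, 120)

def oos_corr(cols, split=60):
    """Fit betas by OLS on the first `split` months using the factor columns
    in `cols`, then measure tracking correlation on the remaining months."""
    Xin = np.column_stack([np.ones(split), F[:split, cols]])
    b, *_ = np.linalg.lstsq(Xin, fund[:split], rcond=None)
    clone = F[split:, cols] @ b[1:]   # out-of-sample clone, intercept excluded
    return np.corrcoef(clone, fund[split:])[0, 1]

narrow = oos_corr([0])                # single-factor model
broad = oos_corr([0, 1, 2, 3, 4])     # full five-factor model
```

By construction, the single-factor clone can capture only one of the fund's three exposures, so its out-of-sample correlation falls well short of the broad model's. This is the sense in which testing sub-strategies with one or two factors stacks the deck against replication.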

Conclusion

Each paper covers a lot of ground and approaches the topic from a slightly different angle. However, the important point is that understanding the underlying assumptions is critical to accurately interpreting the results. These assumptions – the survivorship bias in the Lo data and the reduction of factors in Amenc et al. – are absent from the paragraph-long abstracts and buried in dozens of pages of analysis, formulas and tables. Without fully appreciating the limitations imposed by these assumptions, investors risk misinterpreting the validity and implications of the conclusions.


4 Comments

  1. Noël Amenc and Lionel Martellini
    May 13, 2013 at 9:06 am

    In the present article, a number of comments are made about a paper we published in the European Financial Management Journal (Amenc, N., L. Martellini, J.-C. Meyfredi and V. Ziemann, 2010, Passive hedge fund replication — Beyond the linear case, European Financial Management Journal, 16, 2, 191-210.) We feel that some of these comments deserve a response, as they might be misleading for the reader who is not overly familiar with the research.

    The contribution of our paper published in the European Financial Management Journal is to extend Hasanhodzic and Lo (2007) by assessing the out-of-sample performance of various non-linear and conditional hedge fund replication models. In a nutshell, our ambition was to improve the performance results in Hasanhodzic and Lo (2007) by considering more sophisticated models and we were quite surprised to find that going beyond the linear case does not necessarily enhance the replication power.

    The article “Hedge Fund Replication: A Re-examination Of Two Key Studies” claims that our paper “includes some very helpful analysis […] but whose conclusions are undermined by selective omission. For instance: Even though there was over two years of live data from replication indices that showed strong results with high correlation through the crisis, the authors neglect to include this and focus instead on re-doing the Lo analysis with the admittedly incomplete five factor set.” We indeed decided not to analyze live performance data from passive replication products, and this for two reasons. First of all, this was not the focus of our paper, which instead was (as mentioned above) to try and improve over linear replication models. Secondly, we did not feel that two years was a sufficiently long sample to allow for any meaningful statistical analysis. It is our belief that showing coherence in the scope of a research project paper, and avoiding meaningless statistical analysis, should be called something other than “selective omission.”

    A second criticism made in the article “Hedge Fund Replication: A Re-examination of Two Key Studies” is that we use a selection of useful factors for each strategy, as opposed to using a large identical set of factors for all strategies. This criticism is phrased as follows: “By starkly reducing the factor set, the authors essentially designed an experiment that was bound to fail.” Unfortunately, we have found that while including more factors improves the performance of replication models in-sample, it tends to hurt the out-of-sample performance of such models. In short, parsimony is a well-known necessary condition for out-of-sample robustness, and we feel it is misleading to claim that one only needs to add an increasingly large number of factors to generate satisfactory hedge fund replication performance.

    Finally, a comment is made that “It is difficult to read this paper without the sense that the authors, who are closely tied to the fund of hedge fund industry (and funded by Newedge), had a predetermined agenda.” We feel that this comment is out of place. EDHEC-Risk has always been known for publishing unbiased academic research and at no point in the research process did Newedge intervene to influence the results in any possible way. In the same way that we consider that professionals have the right to be taken seriously and to be criticized for what they write and not for what they are when they express scientific views, we believe that authors from the academic world deserve the same level of respect. If the only real argument for criticizing a research paper is to disparage the authors’ conduct with no evidence, then we do not think that this criticism is admissible. Just like we think it is logical to display the financial contributions to our research programs that the Institute receives from our sponsors (the authors are not beneficiaries), we also think it is logical for this concern for transparency and the sponsor’s desire to support transparent and independent research to be recognized and not denigrated.


  2. Andrew Beer
    August 15, 2013 at 12:05 pm

    Dear Professors Amenc and Martellini:

    My apologies for the delayed response. Thank you for taking the time to read and provide a critique of my note. I will respond in order:

1. The reason I highlighted the live performance was that the extant replicators were largely successful at tracking industry returns in 2007-09. As a practitioner and researcher, this is valuable information. It was not a coincidence that the models had similar features — 24 month window length, 5-8 factors across major asset classes, etc. Those parameters were chosen because highly capable researchers at investment banks and asset management firms were conducting similar experiments and reached similar conclusions. Consequently, it seemed to me at the time that there was a lot of other research and live validation that a variant on the Hasanhodzic/Lo experiment could work well. This seemed pertinent to me, but perhaps it wasn’t to you.

    2. We’ve found that more factors do in fact improve out of sample results, but up to a limit. Since I don’t have your research, I cannot comment on the specific results you refer to. I fully agree with the concept of parsimony, which we’ve embraced while other firms have added unnecessary factors for appearance of complexity. In addition — and this may simply be a semantic issue — I don’t see how going from 5 or 6 factors to one or two is an “extension” of the study. Instead of working to extend or modify the H/Lo factors, you seemed to scrap them entirely and start again with a much narrower pool — one or two factors for many sub strategies. At the time, I thought it was well established that one or two factor models didn’t work well. I still don’t understand this transition in your paper and it seems to me like you ended up with a very different study (perhaps “Can a two factor model explain hedge fund sub sector returns out of sample?”). As a practitioner, my interpretation of this was that the poor outcome was a foregone conclusion, and hence I questioned why it was included.

    3. I apologize for the insinuation. This was based on a rumor at the time of publication — I don’t remember the source, probably from one of the banks who of course had their own agenda. As a practical matter, though, I would argue that no research is unbiased. As a liberal arts major, I am trained to read into subtext, and my strong impression from reading your papers and the way they were structured was that you had a vested interest in arguing against what you describe as “passive” replication. If I misinterpreted this, then I apologize.

    If you would like to have a live debate on any of these issues, please feel free to contact me directly. And thank you again for taking the time to read and carefully respond to my submission.

    All the best,

    ADB


  3. Prof Jim Liew
    May 5, 2015 at 11:19 am

    That definitely would be fun to see a live debate! :) Did you guys do it?

