Hacker News
Mean Variance Optimization (datanitro.com)
66 points by karamazov on Aug 8, 2012 | 20 comments


Interesting stuff...

A further point to consider if we're talking about real world applications is that it is not actually established that markets have finite variance - seriously.

In the 1960s, Benoit Mandelbrot began his research into chaos and fractals by looking at markets and finding that non-Gaussian, Lévy-stable distributions modeled changes in the market best [1]. And these L-stable distributions don't generally have a finite variance and sometimes don't have a finite mean [2].

And it is fairly easy to see how a market tends not to be Gaussian: change based on a Gaussian distribution tends to be a random walk a la Brownian motion, where the final position of a variable is the sum of many small changes. Non-Gaussian, infinite-variance movement, on the other hand, has the property that the final result tends to be driven by a small number of large changes rather than a lot of small ones. And this is what the stock market often looks like: a few wild moves often matter as much as all the incremental changes. The apparent mean, variance, and distribution of stocks on a day-to-day basis may not hold up in extreme situations, and those situations can eat away the rest of your profits. If the stocks that seemed independent in normal conditions all go down in a crash, your estimated-correlation-based diversification hasn't protected you very well.
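
For concreteness, here is a rough toy illustration of that contrast (my own example, using Cauchy-distributed steps as a stand-in for an infinite-variance stable distribution):

    import numpy as np

    np.random.seed(0)
    n = 10000

    gauss_steps = np.random.normal(size=n)            # finite variance
    cauchy_steps = np.random.standard_cauchy(size=n)  # stable, infinite variance

    # Fraction of the total absolute movement contributed by the 10 largest steps:
    for steps in (gauss_steps, cauchy_steps):
        top10 = np.sort(np.abs(steps))[-10:].sum()
        print(top10 / np.abs(steps).sum())

In the Gaussian case the largest handful of steps contribute almost nothing to the total; in the Cauchy case they dominate it.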

The Black Swan is a sadly over-simplified popular summary of these points [3], but it does point to the general idea. The higher-level takeaway is that infinite-variance distributions exist and, indeed, you cannot assume a priori that a given distribution you are working with isn't one.

[1] http://books.google.com/books?id=6KGSYANlwHAC&lpg=PP1...
[2] http://en.wikipedia.org/wiki/L%C3%A9vy_distribution
[3] http://en.wikipedia.org/wiki/The_Black_Swan_(Taleb_book)


Actually, financial markets have employed non-Gaussian Lévy processes in modelling derivatives for a long time (a Lévy process is a bit different from the Lévy distribution, I agree - but nothing stops such processes from having non-finite moments).

For example, a very widely used process for modelling information-driven timeseries (like stock returns) is the jump-diffusion model where the diffusion component is a Brownian motion while the jump component is a Poisson process.
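
For readers who haven't seen the setup, a rough sketch of simulating such a path (Merton-style compound Poisson jumps with normally distributed sizes; all parameters here are made up for illustration):

    import numpy as np

    np.random.seed(1)
    n, dt = 252, 1.0 / 252                 # one year of daily steps
    mu, sigma = 0.05, 0.20                 # diffusion drift and volatility (made up)
    lam, j_mu, j_sigma = 3.0, -0.02, 0.05  # jump intensity and jump-size parameters (made up)

    diffusion = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * np.random.normal(size=n)
    n_jumps = np.random.poisson(lam * dt, size=n)   # Poisson jump arrivals per step
    jumps = n_jumps * j_mu + np.sqrt(n_jumps) * j_sigma * np.random.normal(size=n)

    prices = 100 * np.exp(np.cumsum(diffusion + jumps))  # price path starting at 100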

The underlying volatility is often modelled using a different process - e.g. the SABR model, the Heston model etc.

There are similar cases for interest-rate processes (Hull-White/BDT etc) which have to satisfy conditions of mean-reversion and no-arbitrage across the yield curve.

See, it's not as if we don't realize that the underlying processes are mathematically inadequate to fully explain all market movements. But for a model to be useful, it has to satisfy two conditions:

1. Be able to produce a non-arbitrageable "mid" price for making a market (e.g. if someone asks a trader to quote the bid-ask for an option).

2. Be able to reproduce the current market prices of an asset/its derivatives. This requires model calibration.

Models are chosen based on how easily and how fast they can satisfy 1 and 2.

Do they produce risk numbers that are believable? Probably yes, if the model has been calibrated and has been tested against out-of-sample inputs.

Do they guard adequately against event risk (the thing you would try to signify with your "infinite" variance distributions)? Probably not - but then again, nothing does. How would you go about calibrating the Lévy distribution so that the sampling process can explain currently tradeable market prices? Would it be a "do once, leave forever" calibration? Or would it change from day to day (i.e. "local" calibration)?


Well, more interesting stuff. My Googling yielded woefully incomplete references for your keywords - so what does "a long time" mean here? Some pointers would be useful; I'm interested, help me out.

I know there was a book in the early 2000s giving a long reworking of Black-Scholes for Lévy-stable distributions.

Of course, while you described the sophisticated approach, from Black-Scholes to the Gaussian copula to the OP, the unsophisticated approach still has a lot of traction.

If we're ranging around all our interests in the market, I'd offer what I might hope would be the Minsky comment: "the fault, dear Brutus, lies not in our models but in ourselves". Gaussian models reappear because they have "predictiveness". The problematic sides of the more sophisticated models are tolerated for the same reason. Greed always tempts us to irrationally jump from what Keynes called uncertainty to mere probability.

Btw, what do you think of Doug Noland of prudentbear.com?


I'm an analyst for a quantitative hedge fund. Please, please everyone promise me to never base your investment decisions on this discredited form of mean-variance optimization.

This method of stock selection was invented by Harry Markowitz in 1952. In the intervening sixty (!) years we have accumulated overwhelming evidence that plain-vanilla mean-variance optimization doesn't work. Among its many flaws:

1. It makes unrealistic assumptions about the distribution of returns (i.e. that they are multivariate normal, when it is well known that returns exhibit heavy tails, time-varying volatility, fluctuating correlations, etc.).

2. It relies on you having good estimates of the expected annual return of individual stocks. How do you propose to get these? Don't say you'll use historical measurements, unless you really believe that last year's return is a good predictor of this year's return (it's not, except perhaps in some sectors, and even then it's difficult to measure and you'd be subject to crash risk).

3. The optimization procedure is error-maximizing. That is, even if returns were multivariate normal and you had a reliable way to measure the expected return on stocks, you'd still have errors in your covariance matrix, and these errors are amplified by the optimization procedure. You can see this in the article, where the "optimum" portfolio recommends putting 75% of your portfolio in MSFT and shorting AMZN and AAPL. Does anyone really believe that's sensible? Does anyone believe that such a portfolio is diversified?

The problem is that your model of stock returns is subject to massive overfitting. Let's say you have data for the last 10 years (i.e. about 2500 trading days). If there are N stocks in your portfolio, you need N(N+1)/2 pieces of information to specify the covariance matrix, which puts an upper limit of about 70 stocks in your universe (since 70 * 71 / 2 ~ 2500). A good rule of thumb is that you should have 10 observations per free parameter, which cuts that number down to about 22 stocks (since 22 * 23 / 2 ~ 250). I think that most portfolios consisting of 22 single-name stocks aren't sufficiently diversified (and you'll still be subject to the first two problems above).

In 2012, no one should be using mean-variance optimization to select stocks. At the very least, shrink the covariance matrix toward some sensible prior (e.g. constant correlations, sector correlations, or a factor model) and backtest your strategy over the past 10-20 years and look at the annual volatility, size and length of drawdowns, skewness and information ratio.
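
To make the shrinkage suggestion concrete, a minimal sketch of pulling the sample covariance toward a constant-correlation prior (the shrinkage intensity here is just picked by hand rather than estimated a la Ledoit-Wolf, and 'returns' is assumed to already exist):

    import numpy as np

    # returns: T x N matrix of daily returns (assumed to already exist)
    sample_cov = np.cov(returns, rowvar=False)
    std = np.sqrt(np.diag(sample_cov))

    # Constant-correlation prior: keep each stock's variance, replace every
    # pairwise correlation with the average off-diagonal correlation.
    corr = sample_cov / np.outer(std, std)
    n = corr.shape[0]
    avg_corr = (corr.sum() - n) / (n * (n - 1))
    prior = avg_corr * np.outer(std, std)
    np.fill_diagonal(prior, std ** 2)

    delta = 0.5  # shrinkage intensity, picked by hand here
    shrunk_cov = delta * prior + (1 - delta) * sample_cov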


Very interesting post -

Other than using a shrinkage estimator for the covariance matrix - what techniques would you suggest make sense for doing portfolio optimization?


For your sake, do not base any investment decisions off of this model. Historical correlation != future correlation. You are MUCH better off using the Fama-French three factor model as a starting point.

An example of historical correlation severely understating risk was the 2008 financial crisis: default rates of mortgages in, say, Florida that historically had little correlation with default rates of mortgages in Nevada suddenly became very correlated. Measuring risk in this fashion is not robust enough for investment decisions.

http://en.wikipedia.org/wiki/Fama%E2%80%93French_three-facto...
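
For what it's worth, estimating the factor loadings is just a linear regression of a stock's excess returns on the three factor series; a minimal sketch, assuming you've already loaded the factors (e.g. from Ken French's data library):

    import numpy as np

    # excess_ret: stock returns minus the risk-free rate, shape (T,)
    # factors:    MKT-RF, SMB and HML factor returns, shape (T, 3)
    # (both assumed to be loaded already)
    X = np.column_stack([np.ones(len(factors)), factors])
    alpha, b_mkt, b_smb, b_hml = np.linalg.lstsq(X, excess_ret, rcond=None)[0]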


Firstly, nice effort.

Secondly, some features you could add:

1. Constrained optimization - including budget constraints, sector-selection constraints, etc. (see the sketch after this list). A tough one would be cardinality constraints, e.g. being limited to 4 stocks.

2. Return attribution - whether the returns your portfolio earned were due to stock selection or asset allocation or both (Brinson Model: http://www.mscibarra.com/research/articles/2002/PerfBrinson....).

3. Performance and compression - how would this deal with huge covariance matrices, say 10000 x 10000? Matrix operations on these wouldn't be trivial. In-memory serialization/deserialization issues also come to mind. (edit: then again, Excel can't do 10k x 10k :) )

4. I'm not conversant with SciPy - does this use BFGS/similar for optimization?

5. Compute as a service? Host a grid? Let calculation requests come to you via Excel? (Nobody would want a 10000 asset timeseries to be processed on their CPU for two hours).
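
On point 1, a rough sketch of the simplest constrained case (long-only, fully invested) using scipy's SLSQP solver; cardinality constraints would need something heavier, e.g. mixed-integer programming. 'cov' is assumed to already exist:

    import numpy as np
    from scipy.optimize import minimize

    def portfolio_variance(w, cov):
        return np.dot(w, np.dot(cov, w))

    # cov: N x N covariance matrix of returns (assumed to already exist)
    n = cov.shape[0]
    result = minimize(
        portfolio_variance,
        np.ones(n) / n,                   # start from equal weights
        args=(cov,),
        method='SLSQP',
        bounds=[(0.0, 1.0)] * n,          # long-only
        constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}],  # fully invested
    )
    weights = result.x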


I second everything photon has said and would add that anyone using this model should acknowledge the risk they are taking by using past stock correlations to predict future optimal portfolios. While this approach is central to finance, recently (past 18 months) we have seen bizarrely correlated asset price changes due to macro issues (especially in Europe).

These macro issues may change the way stocks correlate in future markets. I would back test any portfolio with data from at least 18 months ago to make sure it still looks pretty good.


>we have seen bizarrely correlated asset price changes due to macro issues

Actually, this is quite expected given event-risk in the market. For example, even in a non-crisis situation, correlation increases in intra-day trading when market-moving news is due - e.g. payroll numbers are due or ISM numbers are released.

Coming back to the present, headline risk hasn't been this high since May 2010: we keep having one EU summit after another, one central bank announcement after another, each with a huge impact on how asset prices behave.

So the correlation is not entirely bizarre - people have been making correlation trades as well - for example JP Morgan's huge loss while betting on credit index tranches was a bet on implied vs. realized correlation (it didn't play out as expected, of course).

I'd be interested to know how people are playing this out in the equity space - via variance swaps?


I'd strongly consider holding out 25% of the data from the correlation estimates, then back-checking the generated model against that last 25%. I'd be really tempted to make sure the 25% included things like the Global Financial Cluster..ck, just to see how well things go when something unexpected happens.

Either that or run separate correlations for each $time_period and see how the results change over time. If they're stable, then you've got better confidence than if they're wildly changing at each interval.
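
A quick sketch of that rolling-window check with pandas ('returns' and the two ticker columns are placeholders):

    import pandas as pd

    # returns: DataFrame of daily returns, one column per ticker (assumed to exist)
    rolling_corr = returns['MSFT'].rolling(window=250).corr(returns['AAPL'])

    # If this series wanders a lot, a single full-sample correlation estimate
    # is hiding a great deal of instability.
    print(rolling_corr.describe())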

Also, realize that people with way more data and processor power have been doing this for a long time, with very mixed results. /me remembers my uncle slaving over the Apple ][e looking at charts and numbers.


I'm interested in talking about macro variable prediction.

[edit: A colleague of mine has a model that predicts macro variables, and wants beta users.]

Could you email me? (Email in profile.)


Thanks for the suggestions! I might implement a couple in a future post.

Numpy can handle a 10k by 10k matrix - in this case, you wouldn't want to print it to the spreadsheet, but you could still run the optimization. I've tested up to 100 tickers and there isn't a noticeable increase in computation time. (I'm sure there would be at 10,000, but it's probably doable; the constraint here might be fetching that much data.)

The optimization is done with a Nelder-Mead simplex algorithm [1], but SciPy has a lot of other functionality built in. There's a different function, fmin_bfgs [2], that does use BFGS optimization.
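
For the curious, a minimal sketch of what such a call can look like (not our actual code; since Nelder-Mead is unconstrained, the budget constraint is handled here by renormalizing the weights):

    import numpy as np
    from scipy.optimize import fmin

    def objective(w, cov):
        w = w / w.sum()                    # renormalize so the weights sum to 1
        return np.dot(w, np.dot(cov, w))   # portfolio variance

    # cov: N x N covariance matrix (assumed to already exist)
    n = cov.shape[0]
    raw = fmin(objective, np.ones(n) / n, args=(cov,), disp=False)
    weights = raw / raw.sum()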

Computations as a service is something we're definitely looking into; if there's enough demand we'll do it.

[1] http://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method [2] http://docs.scipy.org/doc/scipy/reference/generated/scipy.op...


It looks like the optimal weights for AMZN and AAPL are negative. That doesn't seem possible unless you're shorting them, and that's quite a different risk profile than going long.


That is just another point of optimization he needs to add.


It doesn't look like there are any limits on the weighting, even on the high end.

Furthermore, he's using the closing price, not the adjusted closing price, so splits and dividends aren't included (and they're most of the return in MSFT for the last 8 years or so).


It's easy to switch to adjusted closing price - there's a 'which_price' dictionary that has adjusted price as an option. The weight limits do go a bit crazy if you can get close to 0 variance (but that's hard to do in real life).


Decent article, but I wanted to add a little extra warning to this:

>One way to do this is to look at past returns and come up with the historical correlation.

Be very wary of historical correlations, at any level. I am old enough (I was in my early teens) that I can remember the screaming of economists during the 1970s stagflation - it was known that you could not have high inflation and high unemployment at the same time. Until we did.


I may be totally off the mark here (it has been 5 years since I last studied portfolio management) but if you have 2 stocks that are totally negatively correlated, with the same expected return, won't you have a return of 0%?


There are already a number of libraries in R that do this (including reverse portfolio optimization and Black-Litterman optimization)!


Just a small nitpick: you should be able to calculate portfolio variance using simple matrix operations instead of writing a double for loop. Something like:

    import numpy as np

    # a: weight vector, std_dev: per-stock vols, cor: correlation matrix
    temp = a * std_dev
    var = np.dot(np.dot(temp.T, cor), temp)  # (a*sigma)' C (a*sigma)



