# Is Correlation Useful?

When people purchase financial assets, they only really care about the return, how consistent this return will be, and how this return relates to the return of other assets in the portfolio. Unfortunately, these aren’t known before we make an investment. We have to estimate them using historical data. People typically use the following to do this:

- **Expected Value:** how much return they are going to get
- **Variance:** how much those returns are going to fluctuate
- **Correlation:** how much those returns are the same as others in the portfolio

Today we focus on correlation.

## FTSE 100

Let’s pick four of the largest companies in the FTSE 100 and look at the correlation between the daily price returns:

| | AZN.L | SHEL.L | HSBA.L | ULVR.L |
|---|---|---|---|---|
| AZN.L | 1.00 | 0.30 | 0.30 | 0.39 |
| SHEL.L | | 1.00 | 0.49 | 0.34 |
| HSBA.L | | | 1.00 | 0.32 |
| ULVR.L | | | | 1.00 |

*Daily price return correlation Jan 2007-Mar 2023. Source: Yahoo Finance.*

As you can see, there’s not much going on here. This is because daily returns are noisy, the consequence of seemingly random trading and price discovery. But even when we zoom out to monthly price returns in an attempt to remove some of this short-term noise, relationships still don’t emerge. Correlations are still low.
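For anyone wanting to reproduce this kind of table, here is a minimal sketch of the calculation. It assumes you already have a table of daily closing prices with one column per ticker; the `prices` table below is synthetic random data standing in for a real download, so the numbers it prints will not match the table above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a table of daily closing prices (NOT real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2007-01-01", "2023-03-31")
tickers = ["AZN.L", "SHEL.L", "HSBA.L", "ULVR.L"]
daily_moves = rng.normal(0.0003, 0.015, size=(len(dates), len(tickers)))
prices = pd.DataFrame(
    100 * np.exp(daily_moves.cumsum(axis=0)), index=dates, columns=tickers
)

# Daily simple returns and their pairwise Pearson correlation.
daily_returns = prices.pct_change().dropna()
daily_corr = daily_returns.corr()

# Last price of each calendar month, then monthly returns and their correlation.
month_end = prices.groupby(prices.index.to_period("M")).last()
monthly_returns = month_end.pct_change().dropna()
monthly_corr = monthly_returns.corr()

print(daily_corr.round(2))
print(monthly_corr.round(2))
```

Swapping the synthetic `prices` for real closing prices is the only change needed; the return and correlation steps stay the same.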

This is important if one cares deeply about one’s monthly return, and the stability of those returns. According to correlation logic, if one constructs a portfolio using assets that are uncorrelated or, better yet, *inversely* correlated, one can expect a more stable return each month, because in some months asset X will fly and in others it will be asset Y leading the charge. Simple maths: if the two assets are uncorrelated and each has a 50% chance of returning >5% each month, one has a 1 − (0.5 × 0.5) = 0.75 = 75% chance of having at least one asset return >5%. Contrast this with the 50% chance you have if these assets are perfectly correlated, or the 100% chance you have if they’re perfectly *inversely* correlated. Remember what investors care about: although the *expected value* is the same, the *variance* of the portfolio has been reduced by using uncorrelated assets.
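The 75% / 50% / 100% arithmetic can be checked with a quick simulation. This is only a sketch under a simplifying assumption: each asset "beating 5%" in a month is modelled as a fair coin flip, and the three scenarios are built by hand rather than estimated from any data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months = 200_000

# True means "asset X returned more than 5% this month" (probability 0.5).
x_hits = rng.random(n_months) < 0.5

# Three hand-built versions of asset Y, each also hitting half the time.
scenarios = {
    "independent": rng.random(n_months) < 0.5,  # Y ignores X
    "perfectly correlated": x_hits,             # Y hits exactly when X hits
    "perfectly inverse": ~x_hits,               # Y hits exactly when X misses
}

# Share of months in which at least one of the two assets beat 5%.
at_least_one = {
    name: float(np.mean(x_hits | y_hits)) for name, y_hits in scenarios.items()
}

for name, share in at_least_one.items():
    print(f"{name}: {share:.3f}")
```

The shares should land close to 0.75, 0.50 and 1.00 respectively, which is the variance-reduction argument in miniature.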

Personally I don’t really give a shit if returns for each individual month are good or bad. I care about the value of the portfolio over the long term. To examine this, it’s better to look at the relationship between the price of assets, rather than their return. We repeat the above exercise, now looking at (monthly) *price*:

| | AZN.L | SHEL.L | HSBA.L | ULVR.L |
|---|---|---|---|---|
| AZN.L | 1.00 | -0.02 | -0.52 | 0.80 |
| SHEL.L | | 1.00 | 0.54 | 0.11 |
| HSBA.L | | | 1.00 | -0.37 |
| ULVR.L | | | | 1.00 |

*Monthly price correlation Jan 2007-Mar 2023. Source: Yahoo Finance.*

There seems to be something more substantial going on here. By extending the timeline and focusing on price we might get a clearer picture of the true nature of the relationship between these stocks.

## Why did we just do that?

Or do we?

Was any of the above, and analyses like it, actually useful?

To answer that we have to examine how valid correlation is as a measure of relatedness between two variables.

**The correlation is a changin’**

To start, let’s revisit our correlations from above, except now we look at *rolling* correlation: correlation over a moving window of observations. We focus on the relationship between AZN.L and ULVR.L as that seems to be the most promising.

90-day daily price correlation is pure noise, yo-yo-ing between 0.5 and -0.5. Even the 900-day correlation doesn’t tell us much, and it changes significantly over time. Of particular note is the apparent erosion of relatedness between the two stocks towards the end of the period.

The pattern for monthly price correlation for both one-year and 10-year periods is similar: the two stocks become uncorrelated. Remember, this is taking **10 years** of monthly prices! You would have thought this a large enough sample size to get a reasonable estimate of the correlation between two stocks – apparently not. The correlation changes.
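Rolling correlation is easy to compute yourself. The sketch below uses two made-up assets (`asset_a` and `asset_b`, not AZN.L and ULVR.L) whose daily returns share a common driver, so their true relationship never changes. It works on returns rather than prices, which is my assumption for the illustration. The 90-day and 900-day windows match the ones discussed above.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for two hypothetical assets (NOT real market data).
rng = np.random.default_rng(3)
dates = pd.bdate_range("2007-01-01", "2023-03-31")
common = rng.normal(0, 0.01, len(dates))  # shared driver of both assets
returns = pd.DataFrame(
    {
        "asset_a": common + rng.normal(0, 0.01, len(dates)),
        "asset_b": common + rng.normal(0, 0.01, len(dates)),
    },
    index=dates,
)

# Correlation over a moving window: each value uses only the latest `window` rows.
rolling_90 = returns["asset_a"].rolling(window=90).corr(returns["asset_b"])
rolling_900 = returns["asset_a"].rolling(window=900).corr(returns["asset_b"])

print(rolling_90.describe().round(2))
print(rolling_900.describe().round(2))
```

Even though the relationship here is fixed by construction, the short-window estimate wanders around a fair bit, which is worth keeping in mind when reading a rolling correlation chart.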

**Random numbers**

Try this one: randomly generate a list of 10 numbers between, say, -0.2 and 0.2 (kind of like what we would expect annual returns to be). Then generate another list. Now calculate the correlation between them. Now do it again. And again. Keep doing it until you get bored.

You might find some strong relationships between the two lists. After doing this about 10 times (I get bored easily) I was getting correlation figures as high as 0.8. A similar thing is going on in this famous chart:

Basically: correlation is completely meaningless without statistical significance. In small samples, you can easily generate a high correlation purely by chance. So when you have enough variables over a short enough time period, you’re going to get some high correlations between some of those variables.
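If you get bored as easily as I do, let the computer repeat the experiment. This sketch runs 10,000 trials, each pairing two independent lists of 10 numbers drawn uniformly between -0.2 and 0.2; the trial count and the 0.5 threshold are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_obs = 10_000, 10

# Each row is one trial: two independent lists of 10 "annual returns".
a = rng.uniform(-0.2, 0.2, size=(n_trials, n_obs))
b = rng.uniform(-0.2, 0.2, size=(n_trials, n_obs))

# Pearson correlation for every trial, computed row by row.
a_c = a - a.mean(axis=1, keepdims=True)
b_c = b - b.mean(axis=1, keepdims=True)
corrs = (a_c * b_c).sum(axis=1) / np.sqrt(
    (a_c**2).sum(axis=1) * (b_c**2).sum(axis=1)
)

share_above_half = float(np.mean(np.abs(corrs) > 0.5))
print(f"largest |correlation| seen: {np.abs(corrs).max():.2f}")
print(f"share of trials with |correlation| > 0.5: {share_above_half:.1%}")
```

The lists are unrelated by construction, so every strong correlation this prints is pure chance.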

**Off and on again**

Look. What do you notice? For half of the observed periods, the two assets move in exact unison. In this period, their correlation is ~1. And yet there is no (real) correlation for the period in its entirety (0.29).

“But Haydn, this isn’t fair, you’re looking at returns! Look at prices.”

Fine:

You’ll notice here that again prices move in unison for half the observations – returns for this period are identical. And yet good old correlation tells us these assets are unrelated (the correlation is again 0.29).

Correlation analysis fails when the relationship between variables changes over time. It won’t really detect when they move independently of each other, then suddenly become correlated for a period of time, then uncorrelated again. You know, sort of how assets seem to behave.
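Here is a rough sketch of that on/off behaviour using invented returns. For the first half of the sample `asset_b` copies `asset_a` exactly; for the second half it moves independently and more violently. The volatilities (0.02 and 0.06) are assumptions picked for illustration, so this will not reproduce the 0.29 figure quoted above.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
half = n // 2

asset_a = rng.normal(0, 0.02, n)
asset_b = np.empty(n)
# First half: B copies A's return exactly (moving in unison).
asset_b[:half] = asset_a[:half]
# Second half: B moves independently of A, and with bigger swings.
asset_b[half:] = rng.normal(0, 0.06, n - half)

corr_linked = np.corrcoef(asset_a[:half], asset_b[:half])[0, 1]
corr_unlinked = np.corrcoef(asset_a[half:], asset_b[half:])[0, 1]
corr_full = np.corrcoef(asset_a, asset_b)[0, 1]

print(f"first half:  {corr_linked:.2f}")
print(f"second half: {corr_unlinked:.2f}")
print(f"full sample: {corr_full:.2f}")
```

The single full-sample number hides the fact that the two assets were locked together half the time.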

**Causation without correlation**

Q: Is there a relationship between these two things?

Correlation: No. Correlation is 0.

Obvious common sense answer: Yes, clearly.

Correlation: Ok fine but I never said I was perfect – I can only indicate the presence of a relationship!

The problem is that this argument often slips into the assertion that if there is no correlation, there is no relationship. That treats “if A then B” as if it meant “if not A then not B”, which is a logical error. Correlation is neither necessary nor sufficient for a causal relationship between variables.

A real-world example: consider driving up a hill and sticking to the 60 mph speed limit. Being the budding young data enthusiast that you are, you decide to measure the force applied to the accelerator and the speed of the car. Because you press harder as the hill steepens in order to hold your speed steady, you observe no correlation between force applied and speed. Conclusion: there is no relationship between force applied to the accelerator and speed. Another example:

There are many others.
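Two quick sketches of a real relationship with no correlation. The first is a textbook case I have chosen for illustration (y is exactly x squared, with x symmetric around zero). The second mimics the hill drive with invented numbers: pedal force tracks the gradient while speed stays near 60 mph apart from measurement noise.

```python
import numpy as np

# Case 1: y is completely determined by x, yet the linear correlation is zero.
x = np.linspace(-1.0, 1.0, 201)
y = x**2
corr_square = np.corrcoef(x, y)[0, 1]

# Case 2: the hill drive, with hypothetical numbers throughout.
rng = np.random.default_rng(6)
gradient = rng.uniform(0.0, 0.1, 1000)  # steepness of the hill at each reading
pedal_force = 10 + 200 * gradient       # driver presses harder as it steepens
speed = 60 + rng.normal(0, 0.5, 1000)   # held near 60 mph; only noise remains
corr_hill = np.corrcoef(pedal_force, speed)[0, 1]

print(f"y = x^2:         {corr_square:.3f}")
print(f"force vs speed:  {corr_hill:.3f}")
```

In both cases one variable clearly depends on the other, and correlation reports next to nothing.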

**The paradox of correlation**

This is counter-intuitive. If I change A but nothing happens to B, surely A and B are unrelated? No, because of confounding variables. This is one of the things that spoil the party: in reality, C may be having an effect on A and B, which is causing the two to appear to be unrelated. Or related.

There are other statistical ‘anomalies’ caused by these confounders. The most famous of these is Simpson’s Paradox:

At a whole sample level, height is inversely correlated with basketball success. But when the subgroups are correctly segregated, the true nature of the relationship emerges. A lesser-known manifestation is Berkson’s Paradox:

If you don’t date people who are ugly and mean, you’ll observe an inverse correlation between looks and niceness. You’ll come up with some theory that being hot *causes* you to be mean because you can get away with it or something. But this is a fake relationship – in reality, the two are unrelated.
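The dating example is simple to simulate. In this sketch looks and niceness are drawn independently, so they are unrelated by construction; the selection rule (never date anyone below average on both) is my stand-in for "ugly and mean".

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Looks and niceness are drawn independently: unrelated by construction.
looks = rng.normal(0, 1, n)
niceness = rng.normal(0, 1, n)

# Hypothetical dating rule: rule out anyone below average on BOTH traits.
dated = ~((looks < 0) & (niceness < 0))

corr_everyone = np.corrcoef(looks, niceness)[0, 1]
corr_dated = np.corrcoef(looks[dated], niceness[dated])[0, 1]

print(f"whole population: {corr_everyone:.2f}")
print(f"people you dated: {corr_dated:.2f}")
```

The negative correlation among the people you dated comes entirely from who got filtered out, not from anything about looks or niceness.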

**Applications**

Correlation works best in simple, linear environments in which confounding variables have been accounted for and the distributions of the variables are easy to work with.

As I have said before, this isn’t the environment that investors find themselves in. Therefore, correlation is to be used, if at all, with extreme caution when thinking about constructing a portfolio.