An interesting question is which correlation matrices are possible for three variables (A, B and C). If we know the correlation between A and B and the correlation between B and C, what correlation can occur between A and C? And what are the largest possible values of the correlations between A and B and between B and C when the correlation between A and C is zero?

The correlation can be interpreted as the cosine of the angle between the centered, normalized vectors of the variables.

cor(A,B) = cos(α)

Therefore the correlation between A and C is zero if their vectors are orthogonal (cos(90°) = 0). The smaller the angle between two vectors, the higher the correlation between the corresponding variables. So if the angle between the vectors of A and C is fixed, the correlations between A and B and between B and C are largest when the vector of B lies in the same plane as the vectors of A and C.
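The cosine interpretation can be checked numerically. The following is a minimal sketch with made-up data; the names x, y, xc, yc, cos_angle and angle_deg are only for illustration and do not appear elsewhere in this post.

```r
# Sketch: Pearson correlation as the cosine of the angle between centered vectors.
# x and y are made-up illustration data.
set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Centering is what turns the plain cosine into a correlation.
xc <- x - mean(x)
yc <- y - mean(y)

cos_angle <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
angle_deg <- acos(cos_angle) * 180 / pi   # the angle itself, in degrees

all.equal(cos_angle, cor(x, y))   # TRUE up to floating-point error
```

Dividing by the vector lengths is the normalization step, so the result does not depend on the scale of x or y.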

For example, if the correlations between A and B and between B and C are equal, then their maximum value is cos(45°) = √2/2 ≈ 0.7071 when the correlation between A and C is zero.

Let’s see how it works in practice!

We generate three normally distributed random variables (n = 1000).

> options(digits=7)

> set.seed(234)

>

> M <- matrix(rnorm(3000), ncol=3)

> colnames(M) <- c("A", "B", "C")

> head(M)

A B C

[1,] -1.34352141 -0.158314852 -0.41120490

[2,] 0.62177555 0.018813945 -0.27796435

[3,] 0.80087466 0.498246468 0.40257018

[4,] -1.38889241 -1.675263002 0.45676675

[5,] -0.71435686 3.003174741 -0.43762865

[6,] -0.32406105 -0.608898653 1.36512746

We define the desired correlation matrix. The value of the variable maxCor is √2/2 rounded down to seven decimal places (0.7071067), i.e., the maximal common value of the correlations between A and B and between B and C when the correlation between A and C is 0.

> maxCor <- floor(1e7*sqrt(2)/2)/1e7

> CM <- matrix(c(1,maxCor,0,

+ maxCor,1,maxCor,

+ 0,maxCor,1), nrow=3)

> colnames(CM) <- c("A", "B", "C")

> rownames(CM) <- c("A", "B", "C")

> CM

A B C

A 1.0000000 0.7071067 0.0000000

B 0.7071067 1.0000000 0.7071067

C 0.0000000 0.7071067 1.0000000

We transform the three generated variables with the Cholesky decomposition in order to reach the given correlation matrix.

> L <- chol(CM)

> ABC <- M %*% t(L)

> head(ABC)

A B C

[1,] -1.45546690 -0.52315033 -0.00027866842

[2,] 0.63507902 -0.26466082 -0.00018837296

[3,] 1.15318808 0.75488359 0.00027281678

[4,] -2.57348211 -0.72782332 0.00030954511

[5,] 1.40920812 1.68593692 -0.00029657546

[6,] -0.75461737 0.93457073 0.00092512980

Let’s see if the correlations between the variables are as we wanted!

> cor(ABC)

A B C

A 1.000000000 0.35850510 0.034449684

B 0.358505099 1.00000000 0.812815583

C 0.034449684 0.81281558 1.000000000

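The sample correlations are not close to the target (the multiplication uses the transposed factor t(L), whereas chol() in R returns the upper triangular factor). However, since the Cholesky decomposition succeeds, a distribution with this correlation matrix is theoretically possible.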
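The gap between the sample correlations and the target most likely comes from the transposition. In R, chol() returns the upper triangular factor R with t(R) %*% R equal to the input matrix, so multiplying the independent columns by R on the right, without transposing, gives variables whose population correlation matrix is CM. The sketch below repeats the setup from above; the names R_factor and ABC_fixed are introduced here for illustration only, and the sample correlations will still differ from CM by sampling error.

```r
# Sketch: apply the upper triangular Cholesky factor on the right, untransposed.
set.seed(234)
M <- matrix(rnorm(3000), ncol = 3)

maxCor <- floor(1e7 * sqrt(2) / 2) / 1e7
CM <- matrix(c(1, maxCor, 0,
               maxCor, 1, maxCor,
               0, maxCor, 1), nrow = 3)

R_factor <- chol(CM)          # upper triangular; t(R_factor) %*% R_factor equals CM
ABC_fixed <- M %*% R_factor   # population covariance: t(R_factor) %*% R_factor = CM

# Compare the sample correlations with the target matrix.
round(cor(ABC_fixed), 3)
```

If the sample correlations have to match CM exactly rather than approximately, the columns of M would first need to be decorrelated in the sample; that step is not shown here.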

What happens if we increase the correlations between A and B and between B and C slightly (by 0.0000001)?

> maxCor <- floor(1e7*sqrt(2)/2)/1e7+1e-7

> CM <- matrix(c(1,maxCor,0,

+ maxCor,1,maxCor,

+ 0,maxCor,1), nrow=3)

> L <- chol(CM)

Error in chol.default(CM) :

the leading minor of order 3 is not positive definite

We get an error message because no such distribution exists. The desired matrix is not positive semidefinite: it has a negative eigenvalue, so it cannot be a correlation matrix. If the correlations between A and B and between B and C are both greater than √2/2 ≈ 0.7071, then the correlation between A and C cannot be 0.

> eigen(CM)

$values

[1] 2.0000000e+00 1.0000000e+00 -2.6606238e-08

$vectors

[,1] [,2] [,3]

[1,] 0.50000000 -7.0710678e-01 0.50000000

[2,] 0.70710678 -4.4408920e-16 -0.70710678

[3,] 0.50000000 7.0710678e-01 0.50000000
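The eigenvalue criterion can also be checked directly, without triggering the chol() error. The following is a minimal sketch; min_eigen and make_cm are hypothetical helper names introduced here, not functions from any package.

```r
# Hypothetical helper: smallest eigenvalue of a symmetric matrix.
min_eigen <- function(m) {
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values)
}

# Hypothetical helper: the 3x3 matrix from the text for a common correlation r,
# with the correlation between A and C fixed at 0.
make_cm <- function(r) {
  matrix(c(1, r, 0,
           r, 1, r,
           0, r, 1), nrow = 3)
}

r_ok  <- floor(1e7 * sqrt(2) / 2) / 1e7   # 0.7071067, just below sqrt(2)/2
r_bad <- r_ok + 1e-7                      # 0.7071068, just above sqrt(2)/2

min_eigen(make_cm(r_ok))    # slightly positive: a valid correlation matrix
min_eigen(make_cm(r_bad))   # slightly negative: not a correlation matrix
```

A symmetric matrix with ones on the diagonal is a correlation matrix exactly when its smallest eigenvalue is not negative, which is why the sign of this single number decides the question.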

Of course, the correlations between A and B and between B and C can be smaller than these values: if the vector of B does not lie in the plane of the vectors of A and C, the angles between A and B and between B and C are larger, so the correlations are smaller.

Similarly, the correlations between A and B and between B and C may differ. For example, if the angle between the vectors of A and B is 55°, the correlation between B and C is maximal when the angle between the corresponding vectors is 90° - 55° = 35°. So if the correlation between A and B is cos(55°) = 0.574, the correlation between A and C can be 0 only if the correlation between B and C is at most cos(35°) = 0.819.
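The angle arithmetic can be reproduced in a few lines. This is only a sketch: deg2rad, angle_AB, r_AB and r_BC_max are names chosen here for illustration. Because cos(90° - x) = sin(x), the same limit can be written without angles: with a zero correlation between A and C, the squared correlations between A and B and between B and C can sum to at most 1.

```r
# Sketch: with cor(A, C) = 0, B's vector in the A-C plane splits the 90 degree angle.
deg2rad <- function(deg) deg * pi / 180   # hypothetical helper

angle_AB <- 55                            # degrees, as in the text
r_AB     <- cos(deg2rad(angle_AB))        # about 0.574
r_BC_max <- cos(deg2rad(90 - angle_AB))   # about 0.819

# Equivalent statement without angles: r_AB^2 + r_BC^2 <= 1, with equality at the limit.
all.equal(r_AB^2 + r_BC_max^2, 1)         # TRUE up to floating-point error
```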

> maxCorAB <- 0.5735764

> maxCorBC <- 0.8191520

> CM <- matrix(c(1,maxCorAB,0,

+ maxCorAB,1,maxCorBC,

+ 0,maxCorBC,1), nrow=3)

> CM

[,1] [,2] [,3]

[1,] 1.0000000 0.5735764 0.000000

[2,] 0.5735764 1.0000000 0.819152

[3,] 0.0000000 0.8191520 1.000000

>

> L <- chol(CM)

>

> ABC <- M %*% t(L)

>

> cor(ABC)

[,1] [,2] [,3]

[1,] 1.000000000 0.3341330 0.032695729

[2,] 0.334132999 1.0000000 0.770568600

[3,] 0.032695729 0.7705686 1.000000000

If, however, we raise the correlation between A and B or between B and C slightly above its maximum value, we get an error message again.

> maxCorAB <- 0.5735764+1e-7

> maxCorBC <- 0.8191520

> CM <- matrix(c(1,maxCorAB,0,

+ maxCorAB,1,maxCorBC,

+ 0,maxCorBC,1), nrow=3)

>

> L <- chol(CM)

Error in chol.default(CM) :

the leading minor of order 3 is not positive definite

>

> maxCorAB <- 0.5735764

> maxCorBC <- 0.8191520+1e-7

> CM <- matrix(c(1,maxCorAB,0,

+ maxCorAB,1,maxCorBC,

+ 0,maxCorBC,1), nrow=3)

>

> L <- chol(CM)

Error in chol.default(CM) :

the leading minor of order 3 is not positive definite

The correlations between the variables are not independent of each other. Once the correlations of two variables with a third variable are given, the correlation between those two variables can only take values within a certain range, even if that range is quite wide.
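For three variables this range can be written down explicitly. Requiring the determinant of the 3x3 correlation matrix to be non-negative gives cor(A,C) between r_ab * r_bc - sqrt((1 - r_ab^2) * (1 - r_bc^2)) and r_ab * r_bc + sqrt((1 - r_ab^2) * (1 - r_bc^2)). The sketch below wraps this in a hypothetical helper called cor_range; the name and the example inputs are assumptions for illustration.

```r
# Hypothetical helper: feasible interval for cor(A, C)
# given r_ab = cor(A, B) and r_bc = cor(B, C).
cor_range <- function(r_ab, r_bc) {
  half_width <- sqrt((1 - r_ab^2) * (1 - r_bc^2))
  c(lower = r_ab * r_bc - half_width,
    upper = r_ab * r_bc + half_width)
}

# The borderline case from the text: both correlations equal to sqrt(2)/2.
cor_range(sqrt(2) / 2, sqrt(2) / 2)   # lower bound is 0 (up to rounding), upper is 1

# Weaker correlations leave a much wider interval.
cor_range(0.3, 0.3)                   # roughly from -0.82 to 1
```

In the borderline case the lower end of the interval is exactly the zero correlation between A and C used throughout this post, which is why any further increase of the other two correlations makes the matrix invalid.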