Possible correlation matrices (3 variables)

An interesting question is which correlation matrices are possible for three variables (A, B and C). If we know the correlation between A and B and between B and C, what values can the correlation between A and C take? In particular, how large can the correlations between A and B and between B and C be when the correlation between A and C is zero?

The correlation can be interpreted as the cosine of the angle between the centered vectors of the variables, x − x̄ (each variable minus its mean).

[Figure: the vectors of A and B and the angle α between them]

cor(A,B) = cos(α)

Therefore the correlation between A and C is zero if their vectors are orthogonal (cos(90°) = 0). The smaller the angle between two vectors, the higher the correlation between the variables. So if the angle between the vectors of A and C is fixed, the angles between A and B and between B and C are smallest, and hence both correlations largest, when the vector of B lies in the plane of the vectors of A and C, between the two.
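As a quick check of the cosine interpretation, the following minimal sketch compares cor() with the cosine of the angle between two centered vectors. The variables x and y and the seed are illustrative assumptions, not part of the example analysed in this post.

```r
# Illustrative data (assumed, not taken from the example in this post)
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Center both vectors, then compute the cosine of the angle between them
xc <- x - mean(x)
yc <- y - mean(y)
cos_angle <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))

# The two numbers agree up to floating-point error
print(c(cosine = cos_angle, correlation = cor(x, y)))
```

The scale factors cancel in the ratio, which is why only centering matters here and not the standard deviation.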

For example, if the correlations between A and B and between B and C are equal, their maximum value is √2/2 ≈ 0.7071 (cos(45°)) when the correlation between A and C is zero.

[Figure: the vectors of A, B and C in one plane, with B at 45° to both A and C]
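The planar picture can be reproduced numerically. The sketch below is an illustration under the assumption that the three variables are represented by unit vectors at 0°, 45° and 90° in one plane; the names angles, V and G are introduced here only for this purpose.

```r
# Three unit vectors in one plane at 0, 45 and 90 degrees (illustrative)
angles <- c(A = 0, B = 45, C = 90) * pi / 180
V <- rbind(cos(angles), sin(angles))   # 2 x 3 matrix; each column is a unit vector

# Pairwise dot products of unit vectors are the cosines of the enclosed angles
G <- crossprod(V)                      # same as t(V) %*% V
print(round(G, 7))
```

The off-diagonal entries of G are the correlations implied by the geometry: cos(45°) for the pairs A, B and B, C, and cos(90°) for the pair A, C.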

 

Let’s see how it works in practice!

We generate three normally distributed random variables (n = 1000).

> options(digits=7)
> set.seed(234)
>
> M <- matrix(rnorm(3000), ncol=3)
> colnames(M) <- c("A", "B", "C")
> head(M)
A            B           C
[1,] -1.34352141 -0.158314852 -0.41120490
[2,]  0.62177555  0.018813945 -0.27796435
[3,]  0.80087466  0.498246468  0.40257018
[4,] -1.38889241 -1.675263002  0.45676675
[5,] -0.71435686  3.003174741 -0.43762865
[6,] -0.32406105 -0.608898653  1.36512746

 

We define the desired correlation matrix. The value of the variable maxCor is √2/2 rounded down to seven decimals, i.e. the maximal correlation between A and B and between B and C when these two correlations are equal and the correlation between A and C is 0.

> maxCor <- floor(1e7*sqrt(2)/2)/1e7
> CM <- matrix(c(1,maxCor,0,
+ maxCor,1,maxCor,
+ 0,maxCor,1), nrow=3)
> colnames(CM) <- c("A", "B", "C")
> rownames(CM) <- c("A", "B", "C")
> CM
A B C
A 1.0000000 0.7071067 0.0000000
B 0.7071067 1.0000000 0.7071067
C 0.0000000 0.7071067 1.0000000

 

We transform the three generated variables with the Cholesky decomposition in order to reach the given correlation matrix.

> L <- chol(CM)
> ABC <- M %*% t(L)
> head(ABC)
A           B              C
[1,] -1.45546690 -0.52315033 -0.00027866842
[2,]  0.63507902 -0.26466082 -0.00018837296
[3,]  1.15318808  0.75488359  0.00027281678
[4,] -2.57348211 -0.72782332  0.00030954511
[5,]  1.40920812  1.68593692 -0.00029657546
[6,] -0.75461737  0.93457073  0.00092512980

 

Let’s see if the correlations between the variables are as we wanted!

> cor(ABC)
A          B           C
A 1.000000000 0.35850510 0.034449684
B 0.358505099 1.00000000 0.812815583
C 0.034449684 0.81281558 1.000000000

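The sample correlations do not match the target. The reason is the transposition: chol() in R returns the upper triangular factor, so multiplying the data by t(L) yields a covariance of L %*% t(L) instead of t(L) %*% L = CM, whereas multiplying by L itself would target CM. Still, chol() succeeded, which means that CM is positive definite, so a distribution with this correlation matrix is theoretically possible.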
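R's chol() returns the upper triangular factor U with t(U) %*% U equal to its input, so the orientation of the factor matters when transforming the data. The following sketch reuses the seed and the target matrix from the example above and introduces the names U, implied_right and implied_wrong purely for illustration. It compares the covariance implied by M %*% U with the one implied by M %*% t(U).

```r
# Target correlation matrix from the example (maxCor is sqrt(2)/2 rounded down)
maxCor <- floor(1e7 * sqrt(2) / 2) / 1e7
CM <- matrix(c(1, maxCor, 0,
               maxCor, 1, maxCor,
               0, maxCor, 1), nrow = 3)

U <- chol(CM)                   # upper triangular, t(U) %*% U reproduces CM

# Covariance implied by each way of transforming uncorrelated unit-variance data
implied_right <- crossprod(U)   # for M %*% U: equals CM
implied_wrong <- tcrossprod(U)  # for M %*% t(U): a different matrix
print(round(cov2cor(implied_wrong), 4))

# Sketch of the simulation with the untransposed factor
set.seed(234)
M <- matrix(rnorm(3000), ncol = 3)
ABC <- M %*% U
print(round(cor(ABC), 3))       # close to CM, up to sampling error
```

Under these assumptions crossprod(U) reproduces CM, while cov2cor(tcrossprod(U)) gives correlations of about 0.33 between A and B and about 0.82 between B and C, which is close to the sample values printed above.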

 

What happens if we slightly increase the correlations between A and B and between B and C (by 0.0000001)?

> maxCor <- floor(1e7*sqrt(2)/2)/1e7+1e-7
> CM <- matrix(c(1,maxCor,0,
+                maxCor,1,maxCor,
+                0,maxCor,1), nrow=3)
> L <- chol(CM)
Error in chol.default(CM) :
the leading minor of order 3 is not positive definite

We get an error message because no such distribution exists. The desired matrix is not positive semidefinite: it has a negative eigenvalue, so it cannot be a correlation matrix. If the correlations between A and B and between B and C are both greater than √2/2, the correlation between A and C cannot be 0.

> eigen(CM)
$values
[1]  2.0000000e+00  1.0000000e+00 -2.6606238e-08

$vectors
[,1]           [,2]        [,3]
[1,] 0.50000000 -7.0710678e-01  0.50000000
[2,] 0.70710678 -4.4408920e-16 -0.70710678
[3,] 0.50000000  7.0710678e-01  0.50000000
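The sign change of the smallest eigenvalue can be checked directly. The sketch below uses a hypothetical helper, min_eigenvalue(), which is not part of the original analysis. It relies on the fact that, for this particular matrix with equal correlations r and a zero in the corner, the characteristic polynomial factors so that the eigenvalues are 1 and 1 ± √2·r.

```r
# Hypothetical helper: smallest eigenvalue of the 3 x 3 matrix with
# cor(A,B) = cor(B,C) = r and cor(A,C) = 0
min_eigenvalue <- function(r) {
  CM <- matrix(c(1, r, 0,
                 r, 1, r,
                 0, r, 1), nrow = 3)
  min(eigen(CM, symmetric = TRUE, only.values = TRUE)$values)
}

r_ok  <- floor(1e7 * sqrt(2) / 2) / 1e7   # just below sqrt(2)/2
r_bad <- r_ok + 1e-7                      # just above sqrt(2)/2

# Numerical eigenvalues next to the closed form 1 - sqrt(2) * r
print(c(ok = min_eigenvalue(r_ok), bad = min_eigenvalue(r_bad)))
print(c(ok = 1 - sqrt(2) * r_ok, bad = 1 - sqrt(2) * r_bad))
```

The smallest eigenvalue crosses zero exactly at r = √2/2, which is why a change in the seventh decimal is enough to make chol() fail.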

 

Of course, the correlations between A and B and between B and C can be smaller than these values: if the vector of B does not lie in the plane of the vectors of A and C, the angles between A and B and between B and C are larger, so the correlations between them are smaller.

Similarly, the correlations between A and B and between B and C can differ. For example, if the angle between the vectors of A and B is 55°, the correlation between B and C is maximal when the angle between their vectors is 90° − 55° = 35°. So if the correlation between A and B is cos(55°) ≈ 0.574, the correlation between A and C can be 0 only if the correlation between B and C is at most cos(35°) ≈ 0.819.
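The same bound can be written without angles. With cor(A,C) = 0 the determinant of the 3 × 3 matrix is 1 − cor(A,B)² − cor(B,C)², so the two squared correlations may sum to at most 1. The sketch below uses a hypothetical helper, max_partner_cor(), introduced here only for illustration.

```r
# Hypothetical helper: largest cor(B,C) compatible with cor(A,C) = 0,
# given cor(A,B) = r_AB; follows from r_AB^2 + r_BC^2 <= 1
max_partner_cor <- function(r_AB) sqrt(1 - r_AB^2)

deg <- pi / 180
r_AB <- cos(55 * deg)

# The bound coincides with cos(35 degrees), as in the geometric argument
print(round(c(r_AB = r_AB,
              max_r_BC = max_partner_cor(r_AB),
              cos35 = cos(35 * deg)), 7))
```

For equal correlations the same helper returns √2/2 when given √2/2, which recovers the 45° case discussed earlier.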

> maxCorAB <- 0.5735764
> maxCorBC <- 0.8191520
> CM <- matrix(c(1,maxCorAB,0,
+                maxCorAB,1,maxCorBC,
+                0,maxCorBC,1), nrow=3)
> CM
[,1]      [,2]     [,3]
[1,] 1.0000000 0.5735764 0.000000
[2,] 0.5735764 1.0000000 0.819152
[3,] 0.0000000 0.8191520 1.000000
>
> L <- chol(CM)
>
> ABC <- M %*% t(L)
>
> cor(ABC)
[,1]      [,2]        [,3]
[1,] 1.000000000 0.3341330 0.032695729
[2,] 0.334132999 1.0000000 0.770568600
[3,] 0.032695729 0.7705686 1.000000000

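Here chol() succeeds again, so the matrix is a valid correlation matrix (the sample correlations again miss the targets because the data were multiplied by t(L) rather than L). If, however, we raise the correlation between A and B or between B and C slightly above its maximum value, we get an error message again.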

> maxCorAB <- 0.5735764+1e-7
> maxCorBC <- 0.8191520
> CM <- matrix(c(1,maxCorAB,0,
+                maxCorAB,1,maxCorBC,
+                0,maxCorBC,1), nrow=3)
>
> L <- chol(CM)
Error in chol.default(CM) :
the leading minor of order 3 is not positive definite
>
> maxCorAB <- 0.5735764
> maxCorBC <- 0.8191520+1e-7
> CM <- matrix(c(1,maxCorAB,0,
+                maxCorAB,1,maxCorBC,
+                0,maxCorBC,1), nrow=3)
>
> L <- chol(CM)
Error in chol.default(CM) :
the leading minor of order 3 is not positive definite

The correlations among three variables are not independent of one another. Once the correlations of two variables with a third variable are given, the correlation between those two can only take values within a certain range, even if that range is often quite wide.
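This range can be made explicit. Requiring the determinant 1 + 2·r_AB·r_BC·r_AC − r_AB² − r_BC² − r_AC² to be non-negative and solving for r_AC gives the interval r_AB·r_BC ± √((1 − r_AB²)(1 − r_BC²)). The sketch below wraps this in a hypothetical helper, cor_range(), which is an illustration and not part of the original analysis.

```r
# Hypothetical helper: interval of cor(A,C) values for which the 3 x 3
# matrix is positive semidefinite, given cor(A,B) and cor(B,C)
cor_range <- function(r_AB, r_BC) {
  half_width <- sqrt((1 - r_AB^2) * (1 - r_BC^2))
  c(lower = r_AB * r_BC - half_width,
    upper = r_AB * r_BC + half_width)
}

print(cor_range(0.5, 0.5))                   # moderate correlations: wide range
print(cor_range(0.9, 0.9))                   # strong correlations: cor(A,C) must be positive
print(cor_range(sqrt(2) / 2, sqrt(2) / 2))   # lower limit is 0 up to rounding
```

In the last call the lower limit is zero up to floating-point error, which matches the boundary case studied above: with both correlations equal to √2/2, a zero correlation between A and C is only just possible.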

 
