Statistics

# Generating correlated variables in R, and an uninvertability hypotheises.

This has been a problem I’ve been suffering with as of late.  How do you generate variables with an arbitrary covariance in R?  There’s no “covariance=3” button!  This creates some issues when I’m trying to simulate various behaviours in my econometrics studying (such as omitted variable bias, forthcoming). So I attempted to come up with a solution– only to discover the problem is much more difficult than I initially thought!

Say x is an arbitrary NxK matrix composed of N draws from K different random variables, and we would like to generate a matrix y (NxK) such that it has  an arbitrary covariance matrix c (KxK) and arbitrary mean m=E(y)=1xK. Define 1 as a 1xN vector of 1’s, which allows 1*x=E(x)=1xK, and N remains a scalar. $c=E(y'x)-E(y)'E(x)$ $c+E(y)'E(x)=E(y'x)$ $c+E(y)'(1*x/N)=y'x/N$ Convert into sample analogs, E(y) is a selected parameter, so it has no sample analog at the moment. $cN+E(y)'(1*x)=y'x$ Keep in mind N is a scalar $cNx'+E(y)'(1*x)x'=y'xx'$ $N*cx'inv(xx')+E(y)'1=y$

But unfortunately, xx‘ needs to be invertible! xx‘ rarely is invertable as an NxN when N is large and K is small-> only x’x, a KxK is likely invertable!  I think we can show that as N approaches infinity, xx’ is singular with probability approaching 1 for a finite K. At a minimum, we can show that this isn’t going to work in R! Work below.

N<-50
K<-round(runif(1,1,10))
c<-matrix(runif(K^2,-1,1), ncol=K, nrow=K)
c[upper.tri(c)]<-c[lower.tri(c)] #correlation matrixes are symmetric by definition.
#diag(c)<-1
ey<-matrix(rnorm(K,0,1),1,K)
eine<-matrix(1,1,N)

ex<-rnorm(1,0,10)
x<-matrix(rnorm(N,ex,10)) #one row is alcready created.
tick<-0 #cheap trick to make sure your loop repeated successfully. You’ll see it.
for(i in 2:K){
ex<-rnorm(1,0,10)
holding<-as.matrix(rnorm(N,ex,1)) #it’s an Nx1, k=1
x<-cbind(x,holding)
tick<-tick+1
}
print(tick)

solve(t(x)%*%x/N) #kxk is invertable

solve(x%*%t(x)/N) #nxn is not!

My thought is that k squared terms: $(x_{n1}^2+x_{n2}^2+...x_{nk}^2)/N$ for K different terms. But if K is small, then the sum of $x_k^2$‘s is an unincreasing finite number, divided by N, which approaches infinity. As such, all the diagonal elements approach 0.  The same is true of the cross diagonal elements. An NxN matrix with all 0’s in all elements has rank less than its dimension and cannot be inverted.

http://www.r-bloggers.com/simulating-random-multivariate-correlated-data-continuous-variables/