Statistics

Generating correlated variables in R, and an uninvertability hypotheises.

This has been a problem I’ve been suffering with as of late.  How do you generate variables with an arbitrary covariance in R?  There’s no “covariance=3” button!  This creates some issues when I’m trying to simulate various behaviours in my econometrics studying (such as omitted variable bias, forthcoming). So I attempted to come up with a solution– only to discover the problem is much more difficult than I initially thought!

Say x is an arbitrary NxK matrix composed of N draws from K different random variables, and we would like to generate a matrix y (NxK) such that it has  an arbitrary covariance matrix c (KxK) and arbitrary mean m=E(y)=1xK. Define 1 as a 1xN vector of 1’s, which allows 1*x=E(x)=1xK, and N remains a scalar.

c=E(y'x)-E(y)'E(x)
c+E(y)'E(x)=E(y'x)
c+E(y)'(1*x/N)=y'x/N Convert into sample analogs, E(y) is a selected parameter, so it has no sample analog at the moment.
cN+E(y)'(1*x)=y'x Keep in mind N is a scalar
cNx'+E(y)'(1*x)x'=y'xx'
N*cx'inv(xx')+E(y)'1=y

But unfortunately, xx‘ needs to be invertible! xx‘ rarely is invertable as an NxN when N is large and K is small-> only x’x, a KxK is likely invertable!  I think we can show that as N approaches infinity, xx’ is singular with probability approaching 1 for a finite K. At a minimum, we can show that this isn’t going to work in R! Work below.

N<-50
K<-round(runif(1,1,10))
c<-matrix(runif(K^2,-1,1), ncol=K, nrow=K)
c[upper.tri(c)]<-c[lower.tri(c)] #correlation matrixes are symmetric by definition.
#diag(c)<-1
ey<-matrix(rnorm(K,0,1),1,K)
eine<-matrix(1,1,N)

ex<-rnorm(1,0,10)
x<-matrix(rnorm(N,ex,10)) #one row is alcready created.
tick<-0 #cheap trick to make sure your loop repeated successfully. You’ll see it.
for(i in 2:K){
ex<-rnorm(1,0,10)
holding<-as.matrix(rnorm(N,ex,1)) #it’s an Nx1, k=1
x<-cbind(x,holding)
tick<-tick+1
}
print(tick)

solve(t(x)%*%x/N) #kxk is invertable

solve(x%*%t(x)/N) #nxn is not!

My thought is that k squared terms: (x_{n1}^2+x_{n2}^2+...x_{nk}^2)/N for K different terms. But if K is small, then the sum of x_k^2‘s is an unincreasing finite number, divided by N, which approaches infinity. As such, all the diagonal elements approach 0.  The same is true of the cross diagonal elements. An NxN matrix with all 0’s in all elements has rank less than its dimension and cannot be inverted.

http://www.r-bloggers.com/simulating-random-multivariate-correlated-data-continuous-variables/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s