Programming

Using “cbind” and “as.vector”: memory-intensive commands

As a perhaps merely interesting note: cbind, when combined with as.vector inside a loop, can be a particularly RAM-intensive pair of commands. The following script excerpt quickly consumed 11 GB of RAM on a 300k-entry dataset:

for (j in 1:200) {
  # plug the j-th draw of bootstrapped coefficients into the fitted model
  mod.out$coefficients$count[1:80] <- lim.my.df[1:80, j]
  mod.out$coefficients$zero[1:80] <- lim.my.df[81:160, j]
  a <- predict(mod.out, clip.1)
  b <- predict(mod.out, clip.mean)
  diff.j <- mean(a - b)
  # diff[, paste(i, ".", j)] <- diff.j
  diff <- as.vector(cbind(diff, diff.j)) # flattens the growing result on every pass
}

The purpose of this script is to use bootstrapped coefficients to generate an average partial effect (APE) between clip.1 and clip.mean. We will later use these to get an estimate of the standard errors of the APE. As it stands, it promptly eats all my RAM and causes the computer to crash. The following script, nearly identical, does not have this problem:

for (j in 1:200) {
  # plug the j-th draw of bootstrapped coefficients into the fitted model
  mod.out$coefficients$count[1:80] <- lim.my.df[1:80, j]
  mod.out$coefficients$zero[1:80] <- lim.my.df[81:160, j]
  a <- predict(mod.out, clip.1)
  b <- predict(mod.out, clip.mean)
  diff.j <- mean(a - b)
  # diff[, paste(i, ".", j)] <- diff.j
  diff <- cbind(diff, diff.j) # keep growing the matrix inside the loop
}
diff <- as.vector(diff) # flatten once, after the loop

And this works just fine! In fact, it consumes barely 25% of my RAM.
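If the number of iterations is known in advance, the usual fix is to preallocate the result vector once and assign into it by index, so R never has to copy a growing object at all. A minimal runnable sketch of the pattern, with the model objects from the scripts above replaced by a stand-in computation:

```r
n.reps <- 200
diff <- numeric(n.reps)       # allocate all 200 slots once, up front
for (j in seq_len(n.reps)) {
  # ... set the bootstrapped coefficients and compute a, b as above ...
  diff.j <- mean(rnorm(10))   # stand-in for mean(a - b)
  diff[j] <- diff.j           # assign into the j-th preallocated slot
}
length(diff)                  # 200; nothing was copied while "growing"
```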

Statistics

Generating correlated variables in R, and an uninvertibility hypothesis

This has been a problem I’ve been suffering with as of late. How do you generate variables with an arbitrary covariance in R? There’s no “covariance = 3” button! This creates some issues when I’m trying to simulate various behaviours in my econometrics studying (such as omitted variable bias, forthcoming). So I attempted to come up with a solution, only to discover the problem is much more difficult than I initially thought!

Say x is an arbitrary NxK matrix composed of N draws from K different random variables, and we would like to generate an NxK matrix y that has an arbitrary KxK covariance c with x and an arbitrary mean m = E(y), a 1xK vector. Define 1 as a 1xN vector of ones, so that 1*x/N = E(x) is the 1xK sample mean of x; N remains a scalar.

c = E(y'x) - E(y)'E(x)
c + E(y)'E(x) = E(y'x)
c + E(y)'(1*x/N) = y'x/N   (convert to sample analogs; E(y) is a parameter we select, so it has no sample analog)
cN + E(y)'(1*x) = y'x      (multiply through by the scalar N)
cNx' + E(y)'(1*x)x' = y'xx'  (right-multiply by x')
N*c*x'*inv(xx') + E(y)'*1 = y'  (right-multiply by inv(xx'); note (1*x)x'inv(xx') = 1*(xx')inv(xx') = 1)

But unfortunately, xx' needs to be invertible, and it almost never is: x is NxK, so rank(xx') = rank(x) ≤ K, which means the NxN matrix xx' is singular whenever N is large and K is small. Only x'x, a KxK matrix, is likely to be invertible. At a minimum, we can show that this isn’t going to work in R! Work below.

N <- 50
K <- round(runif(1, 2, 10))  # draw K at random; at least 2, so the loop below runs
c <- matrix(runif(K^2, -1, 1), ncol = K, nrow = K)
c[upper.tri(c)] <- t(c)[upper.tri(c)]  # covariance matrices are symmetric by definition
#diag(c) <- 1
ey <- matrix(rnorm(K, 0, 1), 1, K)
eine <- matrix(1, 1, N)  # the 1xN vector of ones

ex <- rnorm(1, 0, 10)
x <- matrix(rnorm(N, ex, 10))  # the first column is already created
tick <- 0  # cheap trick to make sure your loop repeated successfully. You'll see it.
for (i in 2:K) {
  ex <- rnorm(1, 0, 10)
  holding <- as.matrix(rnorm(N, ex, 10))  # another Nx1 column
  x <- cbind(x, holding)
  tick <- tick + 1
}
print(tick)  # K - 1 if every pass ran

solve(t(x) %*% x / N)  # the KxK is invertible

solve(x %*% t(x) / N)  # the NxN is not: solve() stops with a "singular" error
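Another way to see why the NxN product fails: x has only K columns, so x %*% t(x) can have rank at most K no matter how large N is. A standalone check with fixed dimensions (qr() reports the numerical rank):

```r
set.seed(1)
N <- 50
K <- 4
x <- matrix(rnorm(N * K), N, K)  # N draws of K variables
qr(t(x) %*% x)$rank              # 4: full rank for the KxK matrix
qr(x %*% t(x))$rank              # also 4, far below N = 50, so the NxN is singular
```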

My thought is that each diagonal element of xx'/N is a sum of K squared terms divided by N: (x_{n1}^2 + x_{n2}^2 + ... + x_{nK}^2)/N. With K fixed and finite, the numerator is a bounded sum that does not grow, while N goes to infinity, so every diagonal element approaches 0. The same is true of the off-diagonal elements. In the limit we have an NxN matrix of all 0’s, which has rank less than its dimension and cannot be inverted.

http://www.r-bloggers.com/simulating-random-multivariate-correlated-data-continuous-variables/
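For completeness, the approach described at the r-bloggers link above sidesteps inversion entirely: draw independent standard normals z and set y = m + z %*% chol(c), which has covariance c by construction, so long as c is positive definite. A sketch with a made-up 2x2 target covariance and mean:

```r
set.seed(1)
N <- 10000
c <- matrix(c(1, 0.6, 0.6, 2), 2, 2)  # target covariance, K = 2
m <- c(5, -3)                         # target means
z <- matrix(rnorm(N * 2), N, 2)       # independent standard normals, cov = I
y <- sweep(z %*% chol(c), 2, m, "+")  # rotate by the Cholesky factor, shift the means
round(cov(y), 2)                      # approximately c
round(colMeans(y), 2)                 # approximately m
```

Because cov(z) = I, cov(z %*% R) = t(R) %*% R = c when R = chol(c), and the sample covariance converges to c as N grows.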