# A Brief Presentation for the Student Investment Club (SIC)

In an attempt to branch out and see what other people do in terms of work, I’ve been creating a model for the Student Investment Club to simultaneously forecast GDP, CPI, and Unemployment.  While such a prediction is clearly overly ambitious for a casual effort, I made an attempt at it using some basic methodologies.  The dependent variables that I used in this case were guided by the preferences of the group, rather than by any particular theoretical model.  As such, I have very little faith in the model to be a powerful predictor on a fundamental level, but I do expect it to be correlated with the actual values.

Attached is my presentation (5-10 minutes) about my preliminary forecasts and findings. It is meant to be delivered to a nontechnical audience and is meant to be a partial (but not complete) disclosure of the problems with my approach. Below is a version of the model I am using to make such a forecast, with some admittedly sparse commentary.

SIC Presentation – Professional (Warm)

library(tseries)
library(quantmod)
library(systemfit)

# Old work functions as a great key
# gdp<-getSymbols(‘GDPC1′,src=’FRED’)
# indp<-getSymbols(‘INDPRO’, src=’FRED’)
# ism<-getSymbols(‘NAPM’, src=’FRED’)
# cap<-getSymbols(‘TCU’,src=’FRED’)
# wage<-getSymbols(‘AHETPI’,src=’FRED’) #Productivity? Proxy:wages
# ppi<-getSymbols(‘PPIACO’,src=’FRED’)
# unemploy<-U1RATENSA
# libor<-USDONTD156N
#
# cpi<-getSymbols(‘CPIAUCSL’,src=’FRED’)
# nom_pers_inc<-getSymbols(‘PINCOME’,src=’FRED’) #this might need to be real
# senti<-getSymbols(‘UMCSENT’,src=’FRED’)
# #demand<-getSymbols(‘DEMOTHCONS’,src=’FRED’)#Consumer demand? Proxy: request for more loans
# #cpi<-getSymbols(‘TCU’,src=’FRED’) #Total sales? Proxy: Change in buisness inventories

#Get the data
out<-NULL
b_names<-c(“GDPC1″,”INDPRO”,”NAPM”,”TCU”,”AHETPI”,”PPIACO”,”CPIAUCSL”,”PINCOME”,”UMCSENT”,”FEDFUNDS”,”U1RATENSA”)
getSymbols(b_names,src=’FRED’)
b<-list(GDPC1,INDPRO,NAPM,TCU,AHETPI,PPIACO,CPIAUCSL,PINCOME,UMCSENT,FEDFUNDS,U1RATENSA)
FEDFUNDS<-na.exclude(FEDFUNDS)
out<-lapply(b, aggregate, by=as.yearqtr, mean)

# Scale it appropriately.
series<-lapply(out,window,start=as.yearqtr(“2000 Q1”), end=as.yearqtr(“2013 Q1”))#trims to a consistant window.
series<-lapply(series,cbind)
series<-data.frame(series)
names(series)<-b_names
series<-log(series) #log the series
series<-as.ts(series) #need time series for this following operator:
series<-diff.ts(series[,c(“GDPC1″,”INDPRO”,”TCU”,”AHETPI”,”PPIACO”,”CPIAUCSL”,”PINCOME”,”UMCSENT”,”FEDFUNDS”,”U1RATENSA”)]) #first difference
lagGDP<-series[,”GDPC1″]
lagCPI<-series[,”CPIAUCSL”]
lagUNEMP<-series[,”U1RATENSA”]
series<-data.frame(series) #back to df
series\$NAPM<-matrix(NAPM[(dim(NAPM)[1]+2-dim(series)[1]):dim(NAPM)[1]]) #Some may be stationary!
series\$lvl_UMCSENT<-matrix(UMCSENT[(dim(UMCSENT)[1]+2-dim(series)[1]):dim(UMCSENT)[1]])
series\$lvl_TCU<-matrix(TCU[(dim(TCU)[1]+2-dim(series)[1]):dim(TCU)[1]])
series\$lvl_NAPM<-matrix(NAPM[(dim(NAPM)[1]+2-dim(series)[1]):dim(NAPM)[1]])
series\$lvl_FEDFUNDS<-matrix(FEDFUNDS[(dim(FEDFUNDS)[1]+2-dim(series)[1]):dim(FEDFUNDS)[1]])
series\$t.index<-zooreg(series, start=as.yearqtr(“2000 Q1”),end=as.yearqtr(“2013 Q1″), frequency = 4) #need a time trend
series\$quarter<-as.vector(seq(from=1,to=4, by=1))
# series\$PINCOME_2<-(series\$PINCOME)^2 #are these acceptable?
# series\$GDPC_2<-(series\$GDPC1)^2
series_hold<-data.frame(series)
# documentation http://cran.r-project.org/web/packages/systemfit/vignettes/systemfit.pdf
series\$Lead_U1RATENSA<-lag(zoo(lagUNEMP),k=+2, na.pad=TRUE) #impact takes at least 2 quarters. This is needed because we are missing CPI numbers for last quarter. Sentiment is delayed 6 months as propietary information. If it is set to +2, the estimates are to see what it would be like if we had the current info (pay for it).
eq1<- Lead_GDPC1 ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq2<- Lead_CPIAUCSL ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq3<- Lead_U1RATENSA ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eqsystem<-list(GDP=eq1,CPI=eq2,UNEMP=eq3)
# series<-data.frame(series)
fit<-systemfit(eqsystem, method=”SUR”, data=series)
pred<-predict(fit,series, se.pred=TRUE)
pred_ci<-predict(fit, series, interval=”confidence”, level=0.95) #note events are not normal.
plot(series\$GDPC1, type=”l”, col=”darkgreen”, ylab=”% Change in GDP”, xlab=”Quarters (since 2000)”, main=”GDP forecast”) #the dimseries -40 gets me 10years.
points(pred[1], type=”l”, col=”blue”, lty=5)
points(pred_ci[,c(3)],type=”l”, col=”red”, lty=2)
points(pred_ci[,c(2)],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)

plot(series\$CPIAUCSL, type=”l”, col=”darkgreen”, ylab=”% Change in CPI”, xlab=”Quarters (since 2000)”,main=”CPI forecast”)
points(pred[3], type=”l”, col=”blue”, lty=5)
points(pred_ci[,5],type=”l”, col=”red”, lty=2)
points(pred_ci[,6],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)

plot(series\$U1RATENSA, type=”l”, col=”darkgreen”, ylab=”% Change in UNEMP”, xlab=”Quarters (since 2000)”, main=”UNEMP forecast”)
points(pred[5], type=”l”, col=”blue”, lty=5)
points(pred_ci[,8],type=”l”, col=”red”, lty=2)
points(pred_ci[,9],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)
summary(fit)

tail(pred)
pred<-rbind(0,rbind(0,pred))
pred_ci<-rbind(0,rbind(0,pred_ci))
tail(series[c(“CPIAUCSL”,”GDPC1″,”U1RATENSA”)])

Programming

# Accidental Art

I’m working on phased entry with spatial price discrimination. This got spat out, and I’m really enjoying the subtle patterns.

Statistics

# Generating correlated variables in R, and an uninvertability hypotheises.

This has been a problem I’ve been suffering with as of late.  How do you generate variables with an arbitrary covariance in R?  There’s no “covariance=3” button!  This creates some issues when I’m trying to simulate various behaviours in my econometrics studying (such as omitted variable bias, forthcoming). So I attempted to come up with a solution– only to discover the problem is much more difficult than I initially thought!

Say x is an arbitrary NxK matrix composed of N draws from K different random variables, and we would like to generate a matrix y (NxK) such that it has  an arbitrary covariance matrix c (KxK) and arbitrary mean m=E(y)=1xK. Define 1 as a 1xN vector of 1’s, which allows 1*x=E(x)=1xK, and N remains a scalar.

$c=E(y'x)-E(y)'E(x)$
$c+E(y)'E(x)=E(y'x)$
$c+E(y)'(1*x/N)=y'x/N$ Convert into sample analogs, E(y) is a selected parameter, so it has no sample analog at the moment.
$cN+E(y)'(1*x)=y'x$ Keep in mind N is a scalar
$cNx'+E(y)'(1*x)x'=y'xx'$
$N*cx'inv(xx')+E(y)'1=y$

But unfortunately, xx‘ needs to be invertible! xx‘ rarely is invertable as an NxN when N is large and K is small-> only x’x, a KxK is likely invertable!  I think we can show that as N approaches infinity, xx’ is singular with probability approaching 1 for a finite K. At a minimum, we can show that this isn’t going to work in R! Work below.

N<-50
K<-round(runif(1,1,10))
c<-matrix(runif(K^2,-1,1), ncol=K, nrow=K)
c[upper.tri(c)]<-c[lower.tri(c)] #correlation matrixes are symmetric by definition.
#diag(c)<-1
ey<-matrix(rnorm(K,0,1),1,K)
eine<-matrix(1,1,N)

ex<-rnorm(1,0,10)
x<-matrix(rnorm(N,ex,10)) #one row is alcready created.
tick<-0 #cheap trick to make sure your loop repeated successfully. You’ll see it.
for(i in 2:K){
ex<-rnorm(1,0,10)
holding<-as.matrix(rnorm(N,ex,1)) #it’s an Nx1, k=1
x<-cbind(x,holding)
tick<-tick+1
}
print(tick)

solve(t(x)%*%x/N) #kxk is invertable

solve(x%*%t(x)/N) #nxn is not!

My thought is that k squared terms: $(x_{n1}^2+x_{n2}^2+...x_{nk}^2)/N$ for K different terms. But if K is small, then the sum of $x_k^2$‘s is an unincreasing finite number, divided by N, which approaches infinity. As such, all the diagonal elements approach 0.  The same is true of the cross diagonal elements. An NxN matrix with all 0’s in all elements has rank less than its dimension and cannot be inverted.

http://www.r-bloggers.com/simulating-random-multivariate-correlated-data-continuous-variables/

# Basic bootstrapping in R

I’ve been having some trouble determining how strata works in the boot() command.   My intuition says it should select from within each type of strata. But no guarantees!  I can always just type “boot” and read the whole command line for line… but this isn’t always easy to interpret without comments.
So here’s a quick test to make sure it does what I think it does.

x<-rep(c(0,100,0,1), each=20) #make a long list of 20 zeros,20 one hundreds, 20 zeros, 20 ones.pool<-matrix(x, ncol=2) #make that list into a 20 by 2 matrix, filling it downward.

pool #let’s look at it.
f<-function(pool,i){
mean(pool[i,1]) #mean of the first column of pool, using individuals i
} #create a function that uses (data, individuals)
boot(pool,f, R=500) #resample and perform that operation 500 times.  Has variation in output.
boot(pool,f, R=500, strata=pool[,2]) #resample and perform that operation -reselecting from within the strata on the rhs. Since all observations within each strata are identical, the standard deviation should be zero.

Here’s an interesting mistake that I made while creating this command.

f<-function(pool,i){
mean(pool[,1])
} #create a function that uses (data, individuals)
boot(pool,f, R=500) #Has no variation in bootstrapping.  Why? What’s up with that?

Answer: There’s no individuals. It just resamples the entire population of the first column of pooled 500 times, getting the final result as shown.