Programming, Statistics

An Example of Plotting Multiple Time Series (Stock Values) on a Graph in R

I am currently in the process of designing a portfolio to manage investments. While such programs are not best plastered over the internet, a few basic concepts about plotting can be displayed.  For example, I have created a rather appealing plot, which demonstrates how to plot series of multiple images in a single plot, shown below:
Code is below, including my process to detrend the data. The critical lines are in bold, highlighting the fact that you can use sample(colors()) to select from the body of colors at random. This is useful when you may have to generate many plots, potentially without greatly detailed manual supervision, and you are not demanding publication-quality color selection (which is plausible for personal investigative use).

#after obtaining closing prices, you should make sure you clean your inputs. Ensure you know why there are NA’s, or you will make a critical error of omission.



data<-diff(data, lag=1, differences=1)

#Check for any remaining trends in data over and above the natural cyclical or time-trending motion of the stocks!
#Detrend based off of the bond, a necessary part of even a basic CAPM portfolio
detrended<-data-xhat #also, norm.
for(n in 2:N){

lines(index(detrended),detrended[,n], col=sample(colors(),size=1))


Programming, Statistics

Music and Math

Many people claim there is a strong correlation between music and math.
Below, I demonstrate that the patterns in music are NOT well predicted by typical statistical approaches.

I have taken a MIDI file of Beethoven’s 5th, and analyzed the track using non-parametric estimation techniques. These techniques included panel data techniques, ARMA, and extensive non-parametric estimation techniques (polynomial and Fourier series to capture cyclical components). I then use the song’s notes and my estimation technique to create a forecast of following notes. I then play the “forecasted song”.

After listening, the “forecasted song” does does not well match the original. As a consequence, I can state that the mathematical techniques common to forecasting do not well predict a song.  Below are several attempts which I have highlighted:

The R-squared for these estimations are in fact VERY high, in the high 90’s. (Only few of the coefficients are significant, the data is clearly overfitted in some regressions.) This song in fact falls into the so-called uncanny valley, and is only slightly deviant from the actual Beethoven’s 5th. However, the ear is strongly cultured to perfection in the subject of music, and the errors are devastating to us.


Accidental Art 2

My earlier paper mentioned in “Accidental Art” on phased entry has been postponed.   This postponement is a fortunate side effect of my successful publication in Regional Science and Urban Economics.  My successful publication has promptly propelled me into a direction of crime economics, rather than one of phased entry. As a result, this phased entry paper has been put on hold. Currently I’m doing some basic modeling on counter-terrorism, which makes me feel like a criminal mastermind. Amusingly enough, there’s still some some beautiful accidental art being churned out by my model, this time in Matlab!

accidental art 2 accidental art 1

Programming, Statistics

A Brief Presentation for the Student Investment Club (SIC)

In an attempt to branch out and see what other people do in terms of work, I’ve been creating a model for the Student Investment Club to simultaneously forecast GDP, CPI, and Unemployment.  While such a prediction is clearly overly ambitious for a casual effort, I made an attempt at it using some basic methodologies.  The dependent variables that I used in this case were guided by the preferences of the group, rather than by any particular theoretical model.  As such, I have very little faith in the model to be a powerful predictor on a fundamental level, but I do expect it to be correlated with the actual values.

Attached is my presentation (5-10 minutes) about my preliminary forecasts and findings. It is meant to be delivered to a nontechnical audience and is meant to be a partial (but not complete) disclosure of the problems with my approach. Below is a version of the model I am using to make such a forecast, with some admittedly sparse commentary.

SIC Presentation – Professional (Warm)


# Old work functions as a great key
# gdp<-getSymbols(‘GDPC1′,src=’FRED’)
# indp<-getSymbols(‘INDPRO’, src=’FRED’)
# ism<-getSymbols(‘NAPM’, src=’FRED’)
# cap<-getSymbols(‘TCU’,src=’FRED’)
# wage<-getSymbols(‘AHETPI’,src=’FRED’) #Productivity? Proxy:wages
# ppi<-getSymbols(‘PPIACO’,src=’FRED’)
# unemploy<-U1RATENSA
# libor<-USDONTD156N
# cpi<-getSymbols(‘CPIAUCSL’,src=’FRED’)
# nom_pers_inc<-getSymbols(‘PINCOME’,src=’FRED’) #this might need to be real
# senti<-getSymbols(‘UMCSENT’,src=’FRED’)
# #demand<-getSymbols(‘DEMOTHCONS’,src=’FRED’)#Consumer demand? Proxy: request for more loans
# #cpi<-getSymbols(‘TCU’,src=’FRED’) #Total sales? Proxy: Change in buisness inventories

#Get the data
out<-lapply(b, aggregate, by=as.yearqtr, mean)

# Scale it appropriately.
series<-lapply(out,window,start=as.yearqtr(“2000 Q1”), end=as.yearqtr(“2013 Q1”))#trims to a consistant window.
series<-log(series) #log the series
series<-as.ts(series) #need time series for this following operator:
series<-diff.ts(series[,c(“GDPC1″,”INDPRO”,”TCU”,”AHETPI”,”PPIACO”,”CPIAUCSL”,”PINCOME”,”UMCSENT”,”FEDFUNDS”,”U1RATENSA”)]) #first difference
series<-data.frame(series) #back to df
series$NAPM<-matrix(NAPM[(dim(NAPM)[1]+2-dim(series)[1]):dim(NAPM)[1]]) #Some may be stationary!
series$t.index<-zooreg(series, start=as.yearqtr(“2000 Q1”),end=as.yearqtr(“2013 Q1″), frequency = 4) #need a time trend
series$quarter<-as.vector(seq(from=1,to=4, by=1))
# series$PINCOME_2<-(series$PINCOME)^2 #are these acceptable?
# series$GDPC_2<-(series$GDPC1)^2
# documentation
series$Lead_GDPC1<-lag(zoo(lagGDP),k=+2, na.pad=TRUE)
series$Lead_CPIAUCSL<-lag(zoo(lagCPI),k=+2, na.pad=TRUE)
series$Lead_U1RATENSA<-lag(zoo(lagUNEMP),k=+2, na.pad=TRUE) #impact takes at least 2 quarters. This is needed because we are missing CPI numbers for last quarter. Sentiment is delayed 6 months as propietary information. If it is set to +2, the estimates are to see what it would be like if we had the current info (pay for it).
eq1<- Lead_GDPC1 ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq2<- Lead_CPIAUCSL ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq3<- Lead_U1RATENSA ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
# series<-data.frame(series)
fit<-systemfit(eqsystem, method=”SUR”, data=series)
pred<-predict(fit,series, se.pred=TRUE)
pred_ci<-predict(fit, series, interval=”confidence”, level=0.95) #note events are not normal.
plot(series$GDPC1, type=”l”, col=”darkgreen”, ylab=”% Change in GDP”, xlab=”Quarters (since 2000)”, main=”GDP forecast”) #the dimseries -40 gets me 10years.
points(pred[1], type=”l”, col=”blue”, lty=5)
points(pred_ci[,c(3)],type=”l”, col=”red”, lty=2)
points(pred_ci[,c(2)],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)

plot(series$CPIAUCSL, type=”l”, col=”darkgreen”, ylab=”% Change in CPI”, xlab=”Quarters (since 2000)”,main=”CPI forecast”)
points(pred[3], type=”l”, col=”blue”, lty=5)
points(pred_ci[,5],type=”l”, col=”red”, lty=2)
points(pred_ci[,6],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)

plot(series$U1RATENSA, type=”l”, col=”darkgreen”, ylab=”% Change in UNEMP”, xlab=”Quarters (since 2000)”, main=”UNEMP forecast”)
points(pred[5], type=”l”, col=”blue”, lty=5)
points(pred_ci[,8],type=”l”, col=”red”, lty=2)
points(pred_ci[,9],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)




Using “cbind” and “as.vector”: Computationally intensive commands

As perhaps a mere interesting note, cbind when combined with as.vector can be a particularly RAM-intensive set of commands. I noted the following script excerpt caused my computer to quickly consume  11GB of RAM on a 300k entry dataset. :

for(j in c(1:200)){
# diff[,paste(i,”.”,j)]<-diff.j

The purpose of this script is to use bootstrapped coefficients generate an average partial effect between clip.1 and clip.mean. We will later use this to get a estimate of the standard errors of the APE. As it stands, it eats all my RAM quite promptly and causes the computer to crash.  The following script, nearly identical, does not have this problem:

for(j in c(1:200)){
# diff[,paste(i,”.”,j)]<-diff.j

And this works just fine! In fact, it barely  consumes 25% of my RAM.


Basic programming in Linux

I’ve just written my first program in Linux(Ubuntu) using bash commands. I’d love to give more details about what’s happening and how I’m doing it, but basically I’m following instructions and then playing with the little that I know about it.  I’ll be writing a program in two parts here, basically just as a demonstration of what they do and how they work.

Here’s the first program. Put it into a text file and save it as “testprogram”.

#!/usr/bin/env bash
# A simple command in a shell script
printf “Hello, world!\n”
printf “My name is \n”
printf “${name}! \n”
printf “What’s your name? \n”
read user_name
printf “I didn’t get that, is your name $name? \n”
read confirm
printf “What’s your name again? \n”
read user_name2
while [ $user_name = $user_name2 ]
printf “I heard your name is $user_name! No need to SHOUT!\n”

After doing that, navigate using “cd” and “ls” to the appropriate directory that you saved “testprogram” to.
Now, enter the command, this is the second part:

chmod a+rx testprogram

This will make the program “testprogram”, for All users (a), Readable and eXecutable (+rx).
You may now run the file by entering:


The program will produce the following output:

Hello World!
My name is COMPUTER!
What’s your name?
I didn’t get that, is your name <returns prompt>?
-prompt2, irrelevant what you say here, yes, your name again, etc.-
What’s your name again?
If your name in the first prompt is exactly the same as that of the third, it will scream:
I heard your name is <return prompt 1>! No need to SHOUT!
        It will repeat this until the program is terminated by the user.

I’m really pumped about this. This is part of my ongoing work at completing my dissertation (I need to make a standalone script to start the programs I want and pass arguments to it), as well as getting me to be a more useful part of the universe, so I’m always excited to get things done.