Below is a simple example showing why we may want the as our estimates of , when our naive intuition may suggest we only want the simple average of squared errors .
To show this in no uncertain terms, I have coded a linear regression by hand in R. Also embedded in the work below are several rules I follow about writing code. They are rules 0-6. There are many other rules, since code writing is an art.
####Coding in R
#### Rule 1: Always comment on every few lines of code. It is not unheard of to comment every single line, particularly for new coders, or complex code.
#### You will need to reference your work at a later date, and after about 3 months, the purpose is lost. Also, I need to read it.
#### Rule 2: Define your variables first. Luckily these names are shared for us.
#### For your projects, use names which are clear for your research: (y=crime in Williamsburg, VA, X= Number of jello puddings consumed)
set.seed(1223) #ensures replication. Sets seed of random number generators.
n<-25 #number of samples
x<-2*rnorm(n) #Our x’s come from a random sampling of X’s.
b_1<-3 #Those cursed jello puddings are associated with increased crime. Linear regression is supportive of association- not causation.
u<-rnorm(n) #We satisfy both independent mean and zero mean assumptions
y<-b_0+b_1*x+u #This is defining our true Y. The true relationship is linear.
plot(x,y) #Rule 0, really. Always check your data.
#### Rule 3: After definitions begin your second stage of work. Probably trimming existing data, etc. Do these in the order they were added.
hat_b_1<-sum( (x-mean(x)) * (y-mean(y)) ) / sum( (x-mean(x))^2 ) #Spaces between any parenthesized section of operations. We need to be able to see which parentheses are which.
hat_b_1 # Rule 4: Indent work which is conceptually subordinate. Indent more as needed. Four spaces=1 tab.
hat_b_0 # Rule 5: Check your work as you go along. For our example, I got 9.89
abline(a=hat_b_0, b=hat_b_1, col=”red”) #let’s add a red line of best fit. And we must see how our plot looks. Repeat rule 0.
plot(x,hat_u) # Let’s see our residuals
hist(hat_u) # Let’s see our histogram
#### Rule 6: Keep your final analysis as punchy and short as possible without sacrificing clarity.
#### The mean sum of the squared errors (usually unknown to us as researchers)
sigma_sq<-sum(u^2)/n #this is the value we’re trying to estimate
sigma_sq_naive<-sum(hat_u^2)/n #this is a naive estimation of it
sigma_sq_hat<-sum(hat_u^2)/(n-2) #this turns out to be more accurate, particularly in small samples. If n->infinity this goes away. Try it for yourself!
#R, is this assessment true? Is sig_sq_hat a better estimator of sig_sq than our naive estimator? Is it true we need the (-2)?
(sigma_sq-sigma_sq_naive) > (sigma_sq-sigma_sq_hat)
Here is one of several plots made by this code, showing a nice linear regression over the data:
Please don’t forget the derivation of why this is true! This is simply some supportive evidence that it might be true.