Below is a simple example showing why we may want the as our estimates of , when our naive intuition may suggest we only want the simple average of squared errors .

To show this in no uncertain terms, I have coded a linear regression by hand in R. Also embedded in the work below are several rules I follow about writing code. They are rules 0-6. There are many other rules, since code writing is an art.

####Coding in R

#### Rule 1: Always comment on every few lines of code. It is not unheard of to comment every single line, particularly for new coders, or complex code.

#### You will need to reference your work at a later date, and after about 3 months, the purpose is lost. Also, I need to read it.#### Rule 2: Define your variables first. Luckily these names are shared for us.

#### For your projects, use names which are clear for your research: (y=crime in Williamsburg, VA, X= Number of jello puddings consumed)set.seed(1223) #ensures replication. Sets seed of random number generators.

n<-25 #number of samples

x<-2*rnorm(n) #Our x’s come from a random sampling of X’s.

b_0<-10

b_1<-3 #Those cursed jello puddings are associated with increased crime. Linear regression is supportive of association- not causation.

u<-rnorm(n) #We satisfy both independent mean and zero mean assumptions

y<-b_0+b_1*x+u #This is defining our true Y. The true relationship is linear.plot(x,y) #Rule 0, really. Always check your data.

#### Rule 3: After definitions begin your second stage of work. Probably trimming existing data, etc. Do these in the order they were added.

hat_b_1<-sum( (x-mean(x)) * (y-mean(y)) ) / sum( (x-mean(x))^2 ) #Spaces between any parenthesized section of operations. We need to be able to see which parentheses are which.

hat_b_1 # Rule 4: Indent work which is conceptually subordinate. Indent more as needed. Four spaces=1 tab.

hat_b_0<-mean(y)-hat_b_1*mean(x)

hat_b_0 # Rule 5: Check your work as you go along. For our example, I got 9.89abline(a=hat_b_0, b=hat_b_1, col=”red”) #let’s add a red line of best fit. And we must see how our plot looks. Repeat rule 0.

hat_y<-hat_b_0+hat_b_1*x

hat_u<-hat_y-yplot(x,hat_u) # Let’s see our residuals

hist(hat_u) # Let’s see our histogram#### Rule 6: Keep your final analysis as punchy and short as possible without sacrificing clarity.

#### The mean sum of the squared errors (usually unknown to us as researchers)

sigma_sq<-sum(u^2)/n #this is the value we’re trying to estimate

sigma_sq_naive<-sum(hat_u^2)/n #this is a naive estimation of it

sigma_sq_hat<-sum(hat_u^2)/(n-2) #this turns out to be more accurate, particularly in small samples. If n->infinity this goes away. Try it for yourself!#R, is this assessment true? Is sig_sq_hat a better estimator of sig_sq than our naive estimator? Is it true we need the (-2)?

(sigma_sq-sigma_sq_naive) > (sigma_sq-sigma_sq_hat)

Here is one of several plots made by this code, showing a nice linear regression over the data:

Please don’t forget the derivation of *why* this is true! This is simply some supportive evidence that it might be true.