Directions

In this lab you will:

Open RStudio

Click on the home button, and scroll down until you find the RStudio directory. Click on RStudio.

Create a script file

Click on “File”, “New File”, “R Script”.

Manually insert some data

In class, I used a simple data set to illustrate Ordinary Least Squares (OLS). The data is \(Y = \lbrace 1,4,5,4 \rbrace \ , \ X = \lbrace 2,4,6,8 \rbrace\). This is a small data set that can be manually inserted into RStudio using the `combine'' functionc()`. Enter the variables now using the code:

Y <- c(1,4,5,4)
X <- c(2,4,6,8)

You should see the \(Y\) and \(X\) variables pop up in the top right of your screen. Plot the data using the command:

plot(X, Y)

To get the plot to look more like it does in the textbook, you can set some additional options for the plot:

plot(X, Y, xlim=c(0,10), ylim=c(0,10), col="violetred4", cex=1.5, cex.axis=.75, cex.lab=0.75)

Try changing the numbers in the above line of code, redrawing the plot each time, in order to see what all of the options do.

Calculate Least Squares estimates using the formula from class

The “population” model that we want to estimate is:

\(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\)

Recall that in class we determined that in order to fit a line through the data (and estimate \(\beta_0\) and \(\beta_1\)) in a way that minimizes the sum of all of the squared vertical distances between each data point and the line itself, we can solve a minimization problem. The slope and intercept of such a line is:

\(b_1 = \frac{\sum_i^n(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i^n(X_i - \bar{X})^2}\)

and

\(b_0 = \bar{Y} - b_1\bar{X}\)

Use RStudio to calculate this now! (Try not to look at the answer below)

Answer:

b1 <- sum((Y - mean(Y)) * (X - mean(X))) / sum((X - mean(X)) ^ 2)
b0 <- mean(Y) - b1 * mean(X)

Note that we need to multiply (on the top) and square (on the bottom) before we sum. What values did you get for the slope and intercept?

Use an R function to estimate by OLS

You can estimate the LS line easily using the lm() function in RStudio. Try it now:

lm(Y ~ X)
## 
## Call:
## lm(formula = Y ~ X)
## 
## Coefficients:
## (Intercept)            X  
##         1.0          0.5

The estimated slope and intercept should be the same as what you obtained in the previous section.

Draw the estimated line through the data

Now, add the estimated line to your plot of data:

abline(lm(Y ~ X), col="blue")

Download the MPC data

Now, use the MPC data from Chapter 4. Download the data using the following command:

ld <- read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/data/mpc.csv")

Notice that, this time, I gave the data set a short, and easy to type, name: ld. Look at the data briefly in the spreadsheet using your own eyeballs. Then, plot the data:

plot(ld$income, ld$consumption, main="Consumption and Income in the U.K.", xlab="Income", ylab="Consumption")

Least squares again

The “population” model of interest is:

\(consumption = \beta_0 + \beta_1 \times income + \epsilon\),

where \(\beta_1\) is the MPC that we would like an estimate for. Estimate the intercept and slope by using the lm() command again:

lm(ld$consumption ~ ld$income)
## 
## Call:
## lm(formula = ld$consumption ~ ld$income)
## 
## Coefficients:
## (Intercept)    ld$income  
##     176.848        0.869

According to your estimated model, when income increases by 1, how much more do people consume?

Draw the line again

Add the estimated line to your plot of data:

abline(lm(ld$consumption ~ ld$income), col="purple")

Export your scatterplot

In the “plots” window, click “Export” and “Save as Image…”. This way, you can import graphics you create into Word, or another word processor.

Find \(R^2\) for a regression

In order to find the \(R^2\) for an OLS regression, use the summary() command:

summary(lm(ld$consumption ~ ld$income))
## 
## Call:
## lm(formula = ld$consumption ~ ld$income)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1804.00  -455.08   -57.85   388.88  2439.82 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.768e+02  2.584e+02   0.684    0.497    
## ld$income   8.690e-01  7.497e-03 115.911   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 905.3 on 56 degrees of freedom
## Multiple R-squared:  0.9958, Adjusted R-squared:  0.9958 
## F-statistic: 1.344e+04 on 1 and 56 DF,  p-value: < 2.2e-16

The \(R^2\) is 0.9958. This is very high!

Save your script file

Make sure the top-left window (your R script) is active (click on it), then click “File”, “Save As…”, and name your script file.

Assignment 2

Assignment 2 can be found here.