In this lab you will:
Click on the home button, and scroll down until you find the RStudio directory. Click on RStudio.
Click on “File”, “New File”, “R Script”.
In class, I used a simple data set to illustrate Ordinary Least Squares (OLS). The data is \(Y = \lbrace 1,4,5,4 \rbrace \ , \ X = \lbrace 2,4,6,8 \rbrace\). This is a small data set that can be manually inserted into RStudio using the `combine'' function
c()`. Enter the variables now using the code:
Y <- c(1,4,5,4)
X <- c(2,4,6,8)
You should see the \(Y\) and \(X\) variables pop up in the top right of your screen. Plot the data using the command:
plot(X, Y)
To get the plot to look more like it does in the textbook, you can set some additional options for the plot:
plot(X, Y, xlim=c(0,10), ylim=c(0,10), col="violetred4", cex=1.5, cex.axis=.75, cex.lab=0.75)
Try changing the numbers in the above line of code, redrawing the plot each time, in order to see what all of the options do.
The “population” model that we want to estimate is:
\(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\)
Recall that in class we determined that in order to fit a line through the data (and estimate \(\beta_0\) and \(\beta_1\)) in a way that minimizes the sum of all of the squared vertical distances between each data point and the line itself, we can solve a minimization problem. The slope and intercept of such a line is:
\(b_1 = \frac{\sum_i^n(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_i^n(X_i - \bar{X})^2}\)
and
\(b_0 = \bar{Y} - b_1\bar{X}\)
Use RStudio to calculate this now! (Try not to look at the answer below)
Answer:
b1 <- sum((Y - mean(Y)) * (X - mean(X))) / sum((X - mean(X)) ^ 2)
b0 <- mean(Y) - b1 * mean(X)
Note that we need to multiply (on the top) and square (on the bottom) before we sum. What values did you get for the slope and intercept?
You can estimate the LS line easily using the lm()
function in RStudio. Try it now:
lm(Y ~ X)
##
## Call:
## lm(formula = Y ~ X)
##
## Coefficients:
## (Intercept) X
## 1.0 0.5
The estimated slope and intercept should be the same as what you obtained in the previous section.
Now, add the estimated line to your plot of data:
abline(lm(Y ~ X), col="blue")
Now, use the MPC data from Chapter 4. Download the data using the following command:
ld <- read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/data/mpc.csv")
Notice that, this time, I gave the data set a short, and easy to type, name: ld
. Look at the data briefly in the spreadsheet using your own eyeballs. Then, plot the data:
plot(ld$income, ld$consumption, main="Consumption and Income in the U.K.", xlab="Income", ylab="Consumption")
The “population” model of interest is:
\(consumption = \beta_0 + \beta_1 \times income + \epsilon\),
where \(\beta_1\) is the MPC that we would like an estimate for. Estimate the intercept and slope by using the lm()
command again:
lm(ld$consumption ~ ld$income)
##
## Call:
## lm(formula = ld$consumption ~ ld$income)
##
## Coefficients:
## (Intercept) ld$income
## 176.848 0.869
According to your estimated model, when income increases by 1, how much more do people consume?
Add the estimated line to your plot of data:
abline(lm(ld$consumption ~ ld$income), col="purple")
In the “plots” window, click “Export” and “Save as Image…”. This way, you can import graphics you create into Word, or another word processor.
In order to find the \(R^2\) for an OLS regression, use the summary()
command:
summary(lm(ld$consumption ~ ld$income))
##
## Call:
## lm(formula = ld$consumption ~ ld$income)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1804.00 -455.08 -57.85 388.88 2439.82
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.768e+02 2.584e+02 0.684 0.497
## ld$income 8.690e-01 7.497e-03 115.911 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 905.3 on 56 degrees of freedom
## Multiple R-squared: 0.9958, Adjusted R-squared: 0.9958
## F-statistic: 1.344e+04 on 1 and 56 DF, p-value: < 2.2e-16
The \(R^2\) is 0.9958. This is very high!
Make sure the top-left window (your R script) is active (click on it), then click “File”, “Save As…”, and name your script file.