anscombe                package:datasets                R Documentation

Anscombe's Quartet of "Identical" Simple Linear Regressions

Description:

     Four x-y datasets which have the same traditional statistical
     properties (mean, variance, correlation, regression line, etc.),
     yet are quite different.

Usage:

     anscombe

Format:

     A data frame with 11 observations on 8 variables.

       x1 == x2 == x3  the integers 4:14, specially arranged
       x4              values 8 and 19
       y1, y2, y3, y4  numbers in (3, 12.5) with mean 7.5 and sdev 2.03

Source:

     Tufte, Edward R. (1989) _The Visual Display of Quantitative
     Information_, 13-14. Graphics Press.

References:

     Anscombe, Francis J. (1973) Graphs in statistical analysis.
     _American Statistician_, *27*, 17-21.

Examples:

     require(stats); require(graphics)
     summary(anscombe)

     ##-- now some "magic" to do the 4 regressions in a loop:
     ff <- y ~ x
     for(i in 1:4) {
       ff[2:3] <- lapply(paste(c("y","x"), i, sep=""), as.name)
       ## or   ff[[2]] <- as.name(paste("y", i, sep=""))
       ##      ff[[3]] <- as.name(paste("x", i, sep=""))
       assign(paste("lm.", i, sep=""), lmi <- lm(ff, data = anscombe))
       print(anova(lmi))
     }

     ## See how close they are (numerically!)
     sapply(objects(pattern="lm\\.[1-4]$"), function(n) coef(get(n)))
     lapply(objects(pattern="lm\\.[1-4]$"),
            function(n) coef(summary(get(n))))

     ## Now, do what you should have done in the first place: PLOTS
     op <- par(mfrow=c(2,2), mar=.1+c(4,4,1,1), oma= c(0,0,2,0))
     for(i in 1:4) {
       ff[2:3] <- lapply(paste(c("y","x"), i, sep=""), as.name)
       plot(ff, data = anscombe, col="red", pch=21, bg = "orange",
            cex = 1.2, xlim=c(3,19), ylim=c(3,13))
       abline(get(paste("lm.", i, sep="")), col="blue")
     }
     mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex=1.5)
     par(op)
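
     The statement that the four datasets share the same traditional
     statistical properties can also be checked without creating the
     lm.1 ... lm.4 objects. The following is a minimal sketch, assuming
     only base R with the datasets and stats packages; the helper name
     quartet_stats is hypothetical and not part of this help page or of
     any package.

```r
## Hypothetical helper (an illustration, not part of 'datasets'):
## summary statistics and least-squares coefficients for the i-th x-y pair.
quartet_stats <- function(i, data = datasets::anscombe) {
  x <- data[[paste0("x", i)]]
  y <- data[[paste0("y", i)]]
  fit <- stats::lm(y ~ x)
  c(mean_x    = mean(x),
    var_x     = stats::var(x),
    mean_y    = mean(y),
    var_y     = stats::var(y),
    cor_xy    = stats::cor(x, y),
    intercept = unname(stats::coef(fit)[1]),
    slope     = unname(stats::coef(fit)[2]))
}

## One column per dataset, one row per statistic; the columns should
## agree closely even though the scatterplots look very different.
stats_table <- sapply(1:4, quartet_stats)
colnames(stats_table) <- paste0("set", 1:4)
print(round(stats_table, 2))
```

     Returning a named numeric vector from the helper lets sapply()
     assemble a matrix, so the near-identical values can be compared
     side by side.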