influence.measures            package:stats             R Documentation

Regression Deletion Diagnostics

Description:

     This suite of functions can be used to compute some of the
     regression (leave-one-out deletion) diagnostics for linear and
     generalized linear models discussed in Belsley, Kuh and Welsch
     (1980), Cook and Weisberg (1982), etc.

Usage:

     influence.measures(model)

     rstandard(model, ...)
     ## S3 method for class 'lm':
     rstandard(model, infl = lm.influence(model, do.coef = FALSE),
               sd = sqrt(deviance(model)/df.residual(model)), ...)
     ## S3 method for class 'glm':
     rstandard(model, infl = lm.influence(model, do.coef = FALSE), ...)

     rstudent(model, ...)
     ## S3 method for class 'lm':
     rstudent(model, infl = lm.influence(model, do.coef = FALSE),
              res = infl$wt.res, ...)
     ## S3 method for class 'glm':
     rstudent(model, infl = influence(model, do.coef = FALSE), ...)

     dffits(model, infl = , res = )

     dfbeta(model, ...)
     ## S3 method for class 'lm':
     dfbeta(model, infl = lm.influence(model, do.coef = TRUE), ...)

     dfbetas(model, ...)
     ## S3 method for class 'lm':
     dfbetas(model, infl = lm.influence(model, do.coef = TRUE), ...)

     covratio(model, infl = lm.influence(model, do.coef = FALSE),
              res = weighted.residuals(model))

     cooks.distance(model, ...)
     ## S3 method for class 'lm':
     cooks.distance(model, infl = lm.influence(model, do.coef = FALSE),
                    res = weighted.residuals(model),
                    sd = sqrt(deviance(model)/df.residual(model)),
                    hat = infl$hat, ...)
     ## S3 method for class 'glm':
     cooks.distance(model, infl = influence(model, do.coef = FALSE),
                    res = infl$pear.res,
                    dispersion = summary(model)$dispersion,
                    hat = infl$hat, ...)

     hatvalues(model, ...)
     ## S3 method for class 'lm':
     hatvalues(model, infl = lm.influence(model, do.coef = FALSE), ...)

     hat(x, intercept = TRUE)

Arguments:

   model: an R object, typically returned by 'lm' or 'glm'.

    infl: influence structure as returned by 'lm.influence' or
          'influence' (the latter only for the 'glm' method of
          'rstudent' and 'cooks.distance').

     res: (possibly weighted) residuals, with proper default.

      sd: standard deviation to use, see default.

dispersion: dispersion (for 'glm' objects) to use, see default.

     hat: hat values H[i,i], see default.

       x: the X or design matrix.

intercept: should an intercept column be prepended to 'x'?

     ...: further arguments passed to or from other methods.

Details:

     The primary high-level function is 'influence.measures', which
     produces an object of class '"infl"': a tabular display showing
     the DFBETAS for each model variable, DFFITS, covariance ratios,
     Cook's distances and the diagonal elements of the hat matrix.
     Cases which are influential with respect to any of these measures
     are marked with an asterisk.

     The functions 'dfbetas', 'dffits', 'covratio' and
     'cooks.distance' provide direct access to the corresponding
     diagnostic quantities.  Functions 'rstandard' and 'rstudent' give
     the standardized and Studentized residuals respectively.  (These
     re-normalize the residuals to have unit variance, using an
     overall and a leave-one-out measure of the error variance
     respectively.)

     Values for generalized linear models are approximations, as
     described in Williams (1987) (except that Cook's distances are
     scaled as F rather than as chi-square values).  The
     approximations can be poor when some cases have large influence.
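     For an ordinary (unweighted) 'lm' fit, the standardized and
     Studentized residuals and DFFITS can be reproduced directly from
     the residuals, the hat values and the overall or leave-one-out
     residual standard deviation.  The following sketch is purely
     illustrative (it re-uses the 'LifeCycleSavings' fit from the
     Examples below):

     fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
     h   <- hatvalues(fit)
     s   <- sqrt(deviance(fit) / df.residual(fit))  # overall error s.d.
     s.i <- lm.influence(fit)$sigma                 # leave-one-out error s.d.
     stopifnot(all.equal(rstandard(fit), residuals(fit) / (s   * sqrt(1 - h))),
               all.equal(rstudent(fit),  residuals(fit) / (s.i * sqrt(1 - h))),
               all.equal(dffits(fit),    rstudent(fit) * sqrt(h / (1 - h))))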
     The optional 'infl', 'res' and 'sd' arguments are there to
     encourage the use of these direct access functions in situations
     where, e.g., the underlying basic influence measures (from
     'lm.influence' or the generic 'influence') are already available.

     Note that cases with 'weights == 0' are _dropped_ from all these
     functions, but that if a linear model has been fitted with
     'na.action = na.exclude', suitable values are filled in for the
     cases excluded during fitting.

     The function 'hat()' exists mainly for S (version 2)
     compatibility; we recommend using 'hatvalues()' instead.

Note:

     For 'hatvalues', 'dfbeta', and 'dfbetas', the method for linear
     models also works for generalized linear models.

Author(s):

     Several R core team members and John Fox, originally in his
     'car' package.

References:

     Belsley, D. A., Kuh, E. and Welsch, R. E. (1980) _Regression
     Diagnostics_.  New York: Wiley.

     Cook, R. D. and Weisberg, S. (1982) _Residuals and Influence in
     Regression_.  London: Chapman and Hall.

     Williams, D. A. (1987) Generalized linear model diagnostics using
     the deviance and single case deletions.  _Applied Statistics_
     *36*, 181-191.

     Fox, J. (1997) _Applied Regression, Linear Models, and Related
     Methods_.  Sage.

     Fox, J. (2002) _An R and S-Plus Companion to Applied Regression_.
     Sage.

See Also:

     'influence' (containing 'lm.influence').

     'plotmath' for the use of 'hat' in plot annotation.

Examples:

     require(graphics)

     ## Analysis of the life-cycle savings data
     ## given in Belsley, Kuh and Welsch.
     lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
     inflm.SR <- influence.measures(lm.SR)
     which(apply(inflm.SR$is.inf, 1, any))
     # which observations 'are' influential
     summary(inflm.SR)  # only these
     inflm.SR           # all
     plot(rstudent(lm.SR) ~ hatvalues(lm.SR))  # recommended by some

     ## The 'infl' argument is not needed, but avoids recomputation:
     rs <- rstandard(lm.SR)
     iflSR <- influence(lm.SR)
     identical(rs, rstandard(lm.SR, infl = iflSR))
     ## to "see" the larger values:
     1000 * round(dfbetas(lm.SR, infl = iflSR), 3)

     ## Huber's data [Atkinson 1985]
     xh <- c(-4:0, 10)
     yh <- c(2.48, .73, -.04, -1.44, -1.32, 0)
     summary(lmH <- lm(yh ~ xh))
     (im <- influence.measures(lmH))
     plot(xh, yh, main = "Huber's data: L.S. line and influential obs.")
     abline(lmH)
     points(xh[im$is.inf], yh[im$is.inf], pch = 20, col = 2)

     ## Irwin's data [Williams 1987]
     xi <- 1:5
     yi <- c(0, 2, 14, 19, 30)  # number of mice responding to dose xi
     mi <- rep(40, 5)           # number of mice exposed
     summary(lmI <- glm(cbind(yi, mi - yi) ~ xi, family = binomial))
     signif(cooks.distance(lmI), 3)  # ~= Ci in Table 3, p.184
     (imI <- influence.measures(lmI))
     stopifnot(all.equal(imI$infmat[, "cook.d"], cooks.distance(lmI)))
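     ## Not in the original examples: a small sketch showing that 'hat()'
     ## on the design matrix should agree with 'hatvalues()' on the fitted
     ## model (here the Huber fit 'lmH' from above); 'hatvalues()' is the
     ## recommended interface.
     X <- model.matrix(lmH)   # already contains the intercept column
     all.equal(unname(hat(X, intercept = FALSE)), unname(hatvalues(lmH)))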
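     ## Also not in the original examples: an illustrative sketch of the
     ## 'na.action = na.exclude' behaviour noted in the Details, using
     ## hypothetical missing responses.
     LCS <- LifeCycleSavings
     LCS[c(5, 10), "sr"] <- NA
     lmNA <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LCS,
                na.action = na.exclude)
     length(rstandard(lmNA)) == nrow(LCS)  # TRUE: filled in for excluded cases
     which(is.na(rstandard(lmNA)))         # the two cases excluded from the fit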