princomp                package:stats                R Documentation

_P_r_i_n_c_i_p_a_l _C_o_m_p_o_n_e_n_t_s _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     'princomp' performs a principal components analysis on the given
     numeric data matrix and returns the results as an object of class
     'princomp'.

_U_s_a_g_e:

     princomp(x, ...)

     ## S3 method for class 'formula':
     princomp(formula, data = NULL, subset, na.action, ...)

     ## Default S3 method:
     princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
              subset = rep(TRUE, nrow(as.matrix(x))), ...)

     ## S3 method for class 'princomp':
     predict(object, newdata, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula with no response variable, referring only to
          numeric variables.

    data: an optional data frame (or similar: see 'model.frame')
          containing the variables in the formula 'formula'.  By
          default the variables are taken from 'environment(formula)'.

  subset: an optional vector used to select rows (observations) of the
          data matrix 'x'.

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          'factory-fresh' default is 'na.omit'.

       x: a numeric matrix or data frame which provides the data for
          the principal components analysis.

     cor: a logical value indicating whether the calculation should use
          the correlation matrix or the covariance matrix.  (The
          correlation matrix can only be used if there are no constant
          variables.)

  scores: a logical value indicating whether the score on each
          principal component should be calculated.

  covmat: a covariance matrix, or a covariance list as returned by
          'cov.wt' (and 'cov.mve' or 'cov.mcd' from package 'MASS'). If
          supplied, this is used rather than the covariance matrix of
          'x'.

     ...: arguments passed to or from other methods. If 'x' is a
          formula one might specify 'cor' or 'scores'.

  object: an object of class inheriting from '"princomp"'.

 newdata: An optional data frame or matrix in which to look for
          variables with which to predict.  If omitted, the scores are
          used. If the original fit used a formula or a data frame or a
          matrix with column names, 'newdata' must contain columns with
          the same names. Otherwise it must contain the same number of
          columns, to be used in the same order. 
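
     As an illustration of the 'newdata' argument, the following is a
     minimal sketch that assumes the built-in 'USArrests' data; the
     split into 'train' and 'test' rows is arbitrary and is used only
     for demonstration.

```r
# Fit on the first 40 states, then project the remaining 10.
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]
fit   <- princomp(train, cor = TRUE)

# 'newdata' must contain columns named as in the original fit.
new_scores <- predict(fit, newdata = test)
dim(new_scores)                      # one row per new observation

# Columns are matched by name, so their order in 'newdata' is irrelevant.
shuffled <- test[, rev(names(test))]
all.equal(predict(fit, newdata = shuffled), new_scores)

# With 'newdata' omitted, the stored scores are returned.
identical(predict(fit), fit$scores)
```

     The new observations are centred and scaled with the values
     stored in the fit ('center' and 'scale'), not with their own
     means and standard deviations.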

_D_e_t_a_i_l_s:

     'princomp' is a generic function with '"formula"' and '"default"'
     methods.

     The calculation is done using 'eigen' on the correlation or
     covariance matrix, as determined by 'cor'.  This is done for
     compatibility with the S-PLUS result.  A preferred method of
     calculation is to use 'svd' on 'x', as is done in 'prcomp'.
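
     The calculation described above can be sketched by hand.  This is
     an illustrative reconstruction on the built-in 'USArrests' data,
     not the internal code of 'princomp'; the names 'e' and
     'manual_sdev' are introduced here for the example only.

```r
# Reproduce princomp(cor = TRUE) from an eigen-decomposition.
x   <- as.matrix(USArrests)
fit <- princomp(x, cor = TRUE)

e <- eigen(cor(x), symmetric = TRUE)   # decompose the correlation matrix
manual_sdev <- sqrt(e$values)          # sdev = square roots of eigenvalues

# The fitted sdev carries names ("Comp.1", ...), so compare values only.
all.equal(unname(fit$sdev), manual_sdev)

# The loadings are the eigenvectors, up to the sign of each column.
all.equal(abs(unclass(fit$loadings)), abs(e$vectors),
          check.attributes = FALSE)
```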

     Note that the default calculation uses divisor 'N' for the
     covariance matrix.
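
     The effect of the divisor can be checked against 'prcomp', which
     uses divisor 'N - 1'.  A short sketch, again assuming the
     built-in 'USArrests' data:

```r
x <- as.matrix(USArrests)
n <- nrow(x)

sd_princomp <- unname(princomp(x)$sdev)  # covariance with divisor N
sd_prcomp   <- prcomp(x)$sdev            # covariance with divisor N - 1

# The two differ by the factor sqrt((N - 1) / N).
all.equal(sd_princomp, sd_prcomp * sqrt((n - 1) / n))

# The same values from first principles: rescale cov(), which uses
# divisor N - 1, to divisor N before the eigen-decomposition.
cov_N <- cov(x) * (n - 1) / n
all.equal(sd_princomp, sqrt(eigen(cov_N, symmetric = TRUE)$values))
```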

     The 'print' method for these objects prints the call, the
     standard deviations of the components, and the numbers of
     variables and observations; the 'plot' method produces a scree
     plot ('screeplot').  There is also a 'biplot' method.

     If 'x' is a formula then the standard NA-handling is applied to
     the scores (if requested): see 'napredict'.

     'princomp' only handles so-called R-mode PCA, that is, feature
     extraction of variables.  If a data matrix is supplied (possibly
     via a formula) there must be at least as many units
     (observations) as variables.  For Q-mode PCA use 'prcomp'.
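
     The restriction on the shape of the data can be seen with a small
     simulated matrix.  The matrix 'wide' below is hypothetical data
     generated for this sketch; the error is caught and its message
     kept so that the code runs through.

```r
set.seed(1)
wide <- matrix(rnorm(3 * 5), nrow = 3, ncol = 5)  # 3 units, 5 variables

# princomp() refuses data with fewer units than variables ...
res <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
res                                  # the error message, not a fit

# ... whereas prcomp(), which works via svd(), accepts it.
pc <- prcomp(wide)
pc$sdev
```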

_V_a_l_u_e:

     'princomp' returns a list with class '"princomp"' containing the
     following components: 

    sdev: the standard deviations of the principal components.

loadings: the matrix of variable loadings (i.e., a matrix whose columns
          contain the eigenvectors).  This is of class '"loadings"':
          see 'loadings' for its 'print' method.

  center: the means that were subtracted.

   scale: the scalings applied to each variable.

   n.obs: the number of observations.

  scores: if 'scores = TRUE', the scores of the supplied data on the
          principal components.  These are non-null only if 'x' was
          supplied and, if 'covmat' was also supplied, only if it was a
          covariance list.  For the formula method, 'napredict()' is
          applied to handle the treatment of values omitted by the
          'na.action'.

    call: the matched call.

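na.action: if relevant, the 'na.action' attribute of the model frame,
          recording which cases were dropped (formula method only).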
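
     The conditions under which 'scores' is non-null can be
     illustrated as follows.  This is a sketch on the built-in
     'USArrests' data; 'cl', 'fit_cov' and 'fit_list' are names chosen
     for the example.

```r
x  <- as.matrix(USArrests)
cl <- cov.wt(x)        # covariance list with components cov, center, n.obs

# Only a covariance matrix is supplied: no data, hence no scores.
fit_cov <- princomp(covmat = cov(x))
is.null(fit_cov$scores)

# Data plus a covariance list: the list provides the centre, so the
# scores of 'x' can be computed.
fit_list <- princomp(x, covmat = cl)
dim(fit_list$scores)
```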

_N_o_t_e:

     The signs of the columns of the loadings and scores are arbitrary,
     and so may differ between different programs for PCA, and even
     between different builds of R.
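
     When results from two programs or functions are compared, the
     signs therefore need to be aligned first.  One possible approach,
     shown here as a sketch comparing 'princomp' with 'prcomp' on the
     built-in 'USArrests' data, is to flip each column according to
     the sign of its inner product with the reference column:

```r
a <- unclass(princomp(USArrests, cor = TRUE)$loadings)
b <- prcomp(USArrests, scale. = TRUE)$rotation

# Equal in absolute value, whatever signs each function happened to pick.
all.equal(abs(a), abs(b), check.attributes = FALSE)

# Align 'b' to 'a': the column-wise inner products are +1 or -1.
flip      <- sign(colSums(a * b))
b_aligned <- sweep(b, 2, flip, `*`)
all.equal(a, b_aligned, check.attributes = FALSE)
```

     If loadings are flipped in this way, the corresponding columns of
     the scores should be flipped by the same signs.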

_R_e_f_e_r_e_n_c_e_s:

     Mardia, K. V., J. T. Kent and J. M. Bibby (1979). _Multivariate
     Analysis_, London: Academic Press.

     Venables, W. N. and B. D. Ripley (2002). _Modern Applied
     Statistics with S_, Springer-Verlag.

_S_e_e _A_l_s_o:

     'summary.princomp', 'screeplot', 'biplot.princomp', 'prcomp',
     'cor', 'cov', 'eigen'.

_E_x_a_m_p_l_e_s:

     require(graphics)

     ## The variances of the variables in the
     ## USArrests data vary by orders of magnitude, so scaling is appropriate
     (pc.cr <- princomp(USArrests))  # inappropriate
     princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE)
     ## Similar, but different:
     ## the scores differ by a factor of sqrt(50/49); without scaling
     ## (cor = FALSE) the standard deviations differ by sqrt(49/50)

     summary(pc.cr <- princomp(USArrests, cor = TRUE))
     loadings(pc.cr)  ## note that blank entries are small but not zero
     plot(pc.cr) # shows a screeplot.
     biplot(pc.cr)

     ## Formula interface
     princomp(~ ., data = USArrests, cor = TRUE)
     # NA-handling
     USArrests[1, 2] <- NA
     pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                       data = USArrests, na.action=na.exclude, cor = TRUE)
     pc.cr$scores