prcomp                 package:stats                 R Documentation

_P_r_i_n_c_i_p_a_l _C_o_m_p_o_n_e_n_t_s _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs a principal components analysis on the given data matrix
     and returns the results as an object of class 'prcomp'.

_U_s_a_g_e:

     prcomp(x, ...)

     ## S3 method for class 'formula':
     prcomp(formula, data = NULL, subset, na.action, ...)

     ## Default S3 method:
     prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
            tol = NULL, ...)

     ## S3 method for class 'prcomp':
     predict(object, newdata, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula with no response variable, referring only to
          numeric variables.

    data: an optional data frame (or similar: see 'model.frame')
          containing the variables in the formula 'formula'.  By
          default the variables are taken from 'environment(formula)'.

  subset: an optional vector used to select rows (observations) of the
          data matrix 'x'.

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset. The
          'factory-fresh' default is 'na.omit'.

     ...: arguments passed to or from other methods. If 'x' is a
          formula one might specify 'scale.' or 'tol'.

       x: a numeric or complex matrix (or data frame) which provides
          the data for the principal components analysis.

    retx: a logical value indicating whether the rotated variables
          should be returned.

  center: a logical value indicating whether the variables should be
          shifted to be zero centered. Alternatively, a vector of
          length equal to the number of columns of 'x' can be supplied.
          The value is passed to 'scale'.

  scale.: a logical value indicating whether the variables should be
          scaled to have unit variance before the analysis takes place.
          The default is 'FALSE' for consistency with S, but in general
          scaling is advisable.  Alternatively, a vector of length
          equal to the number of columns of 'x' can be supplied.  The
          value is passed to 'scale'.

     tol: a value indicating the magnitude below which components
          should be omitted. (Components are omitted if their standard
          deviations are less than or equal to 'tol' times the standard
          deviation of the first component.) With the default null
          setting, no components are omitted.  Other settings for tol
          could be 'tol = 0' or 'tol = sqrt(.Machine$double.eps)',
          which would omit essentially constant components.

  object: an object of class inheriting from '"prcomp"'.

 newdata: An optional data frame or matrix in which to look for
          variables with which to predict.  If omitted, the scores are
          used. If the original fit used a formula or a data frame or a
          matrix with column names, 'newdata' must contain columns with
          the same names. Otherwise it must contain the same number of
          columns, to be used in the same order. 
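
     As a rough illustration of the 'tol' and 'newdata' arguments
     described above, the sketch below uses a made-up matrix 'dat'
     (not part of this page) whose third column is an exact sum of the
     first two, and checks 'predict' against a manual projection.  The
     object names are assumptions chosen for the example only.

```r
## Part 1: 'tol' drops components with negligible standard deviation.
## 'dat' is a hypothetical data set; column 'ab' is a + b, so the last
## principal component is constant up to rounding error.
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
dat <- cbind(a = a, b = b, ab = a + b)

pc_all  <- prcomp(dat)                                   # tol = NULL
pc_trim <- prcomp(dat, tol = sqrt(.Machine$double.eps))

ncol(pc_all$rotation)    # every component is kept
ncol(pc_trim$rotation)   # the essentially constant one is omitted

## Part 2: 'predict' centers and scales 'newdata' with the stored
## values and then multiplies by the rotation matrix.
pc       <- prcomp(USArrests, scale. = TRUE)
new_rows <- USArrests[1:5, ]                 # same column names as the fit
scores   <- predict(pc, newdata = new_rows)
manual   <- scale(new_rows, center = pc$center, scale = pc$scale) %*%
            pc$rotation
all.equal(scores, manual, check.attributes = FALSE)
```

     Because 'new_rows' is taken from the fitted data, 'scores' should
     also agree with the first five rows of 'pc$x'.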

_D_e_t_a_i_l_s:

     The calculation is done by a singular value decomposition of the
     (centered and possibly scaled) data matrix, not by using 'eigen'
     on the covariance matrix.  This is generally the preferred method
     for numerical accuracy.  The 'print' method for these objects
     prints the standard deviations and the rotation matrix, and the
     'plot' method produces a scree plot.
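
     The relationship between the two routes can be checked
     numerically.  The following is a minimal sketch, not the internal
     code of 'prcomp': it recomputes the standard deviations from
     'svd' of the centered data and from 'eigen' on the covariance
     matrix, and compares both with 'sdev'.  The variable names are
     illustrative.

```r
## Centered (unscaled) data matrix, as used with the default arguments
X  <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)

## Route 1: singular values, rescaled by sqrt(n - 1)
sdev_svd <- svd(X)$d / sqrt(nrow(X) - 1)

## Route 2: square roots of the eigenvalues of the covariance matrix
sdev_eigen <- sqrt(eigen(cov(USArrests), symmetric = TRUE)$values)

pc <- prcomp(USArrests)
all.equal(pc$sdev, sdev_svd)
all.equal(pc$sdev, sdev_eigen)
```

     For well-conditioned data such as this the two routes agree to
     within floating-point tolerance; differences are expected to show
     up mainly when some components have very small variance.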

     Note that 'scale. = TRUE' cannot be used if any variable is
     identically zero or (with 'center = TRUE') constant.
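
     One way to handle this in practice is to drop constant columns
     before fitting.  The sketch below adds a made-up constant column
     'flat' to 'USArrests' to trigger the failure, then filters on the
     column variances; the helper names are assumptions for
     illustration.

```r
## 'flat' is a hypothetical constant column added to provoke the error
dat <- cbind(USArrests, flat = 1)

## Scaling fails; capture the message instead of stopping the script
res <- tryCatch(prcomp(dat, scale. = TRUE),
                error = function(e) conditionMessage(e))
res   # a character string describing the failure

## Keep only columns with non-zero variance, then fit as usual
keep <- vapply(dat, function(col) var(col) > 0, logical(1))
pc   <- prcomp(dat[, keep], scale. = TRUE)
```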

_V_a_l_u_e:

     'prcomp' returns a list with class '"prcomp"' containing the
     following components: 

    sdev: the standard deviations of the principal components (i.e.,
          the square roots of the eigenvalues of the
          covariance/correlation matrix, though the calculation is
          actually done with the singular values of the data matrix).

rotation: the matrix of variable loadings (i.e., a matrix whose columns
          contain the eigenvectors).  The function 'princomp' returns
          this in the element 'loadings'.

       x: if 'retx' is true, the rotated data, that is, the centered
          (and, if requested, scaled) data multiplied by the 'rotation'
          matrix.  Hence, 'cov(x)' is the diagonal matrix
          'diag(sdev^2)'.  For the formula method, 'napredict()' is
          applied to handle the treatment of values omitted by the
          'na.action'.

center, scale: the centering and scaling used, or 'FALSE'.
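
     The components of the returned list fit together as described
     above.  As a quick check (a sketch using 'USArrests', with
     illustrative variable names), the scores can be rebuilt from the
     stored centering, scaling and rotation, and their covariance
     compared with 'diag(sdev^2)'.

```r
pc <- prcomp(USArrests, scale. = TRUE)

## Rebuild the scores from the stored centering, scaling and rotation
Z <- scale(USArrests, center = pc$center, scale = pc$scale)
all.equal(pc$x, Z %*% pc$rotation, check.attributes = FALSE)

## The scores are uncorrelated, with variances sdev^2
all.equal(cov(pc$x), diag(pc$sdev^2), check.attributes = FALSE)

## The columns of the rotation matrix are orthonormal
all.equal(crossprod(pc$rotation), diag(ncol(pc$rotation)),
          check.attributes = FALSE)
```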

_N_o_t_e:

     The signs of the columns of the rotation matrix are arbitrary, and
     so may differ between different programs for PCA, and even between
     different builds of R.
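
     To see why the signs carry no information, the sketch below flips
     the sign of the first component by hand (the matrix 'flip' is an
     assumption made for the example) and checks that the
     reconstructed data and the component standard deviations are
     unchanged.

```r
pc <- prcomp(USArrests, scale. = TRUE)

## Flip the sign of PC1 in both the loadings and the scores
flip <- diag(c(-1, 1, 1, 1))
rot2 <- pc$rotation %*% flip
x2   <- pc$x %*% flip

## Scores times transposed loadings give the same centered, scaled data
all.equal(pc$x %*% t(pc$rotation), x2 %*% t(rot2),
          check.attributes = FALSE)

## The component standard deviations are unaffected
all.equal(apply(x2, 2, sd), pc$sdev, check.attributes = FALSE)
```

     When results from different programs or builds need to be
     compared, it is therefore safer to compare them up to a sign
     change per column than to expect identical values.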

_R_e_f_e_r_e_n_c_e_s:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_. Wadsworth & Brooks/Cole.

     Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) _Multivariate
     Analysis_, London: Academic Press.

     Venables, W. N. and B. D. Ripley (2002) _Modern Applied Statistics
     with S_, Springer-Verlag.

_S_e_e _A_l_s_o:

     'biplot.prcomp', 'screeplot', 'princomp', 'cor', 'cov', 'svd',
     'eigen'.

_E_x_a_m_p_l_e_s:

     require(graphics)

     ## The variances of the variables in the USArrests data differ by
     ## orders of magnitude, so scaling is appropriate.
     prcomp(USArrests)                  # inappropriate: unscaled
     prcomp(USArrests, scale. = TRUE)
     prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
     plot(prcomp(USArrests))            # scree plot
     summary(prcomp(USArrests, scale. = TRUE))
     biplot(prcomp(USArrests, scale. = TRUE))