prcomp                  package:stats                  R Documentation

Principal Components Analysis

Description:

     Performs a principal components analysis on the given data matrix
     and returns the results as an object of class 'prcomp'.

Usage:

     prcomp(x, ...)

     ## S3 method for class 'formula':
     prcomp(formula, data = NULL, subset, na.action, ...)

     ## Default S3 method:
     prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
            tol = NULL, ...)

     ## S3 method for class 'prcomp':
     predict(object, newdata, ...)

Arguments:

 formula: a formula with no response variable, referring only to
          numeric variables.

    data: an optional data frame (or similar: see 'model.frame')
          containing the variables in the formula 'formula'.  By
          default the variables are taken from 'environment(formula)'.

  subset: an optional vector used to select rows (observations) of the
          data matrix 'x'.

na.action: a function which indicates what should happen when the data
          contain 'NA's.  The default is set by the 'na.action' setting
          of 'options', and is 'na.fail' if that is unset.  The
          'factory-fresh' default is 'na.omit'.

     ...: arguments passed to or from other methods.  If 'x' is a
          formula one might specify 'scale.' or 'tol'.

       x: a numeric or complex matrix (or data frame) which provides
          the data for the principal components analysis.

    retx: a logical value indicating whether the rotated variables
          should be returned.

  center: a logical value indicating whether the variables should be
          shifted to be zero centered.  Alternatively, a vector of
          length equal to the number of columns of 'x' can be supplied.
          The value is passed to 'scale'.

  scale.: a logical value indicating whether the variables should be
          scaled to have unit variance before the analysis takes
          place.  The default is 'FALSE' for consistency with S, but in
          general scaling is advisable.  Alternatively, a vector of
          length equal to the number of columns of 'x' can be supplied.
          The value is passed to 'scale'.

     tol: a value indicating the magnitude below which components
          should be omitted.  (Components are omitted if their standard
          deviations are less than or equal to 'tol' times the standard
          deviation of the first component.)  With the default null
          setting, no components are omitted.  Other settings for 'tol'
          could be 'tol = 0' or 'tol = sqrt(.Machine$double.eps)',
          which would omit essentially constant components.

  object: an object of class inheriting from '"prcomp"'.

 newdata: an optional data frame or matrix in which to look for
          variables with which to predict.  If omitted, the scores are
          used.  If the original fit used a formula or a data frame or
          a matrix with column names, 'newdata' must contain columns
          with the same names.  Otherwise it must contain the same
          number of columns, to be used in the same order.

Details:

     The calculation is done by a singular value decomposition of the
     (centered and possibly scaled) data matrix, not by using 'eigen'
     on the covariance matrix.  This is generally the preferred method
     for numerical accuracy.  The 'print' method for these objects
     prints the results in a nice format and the 'plot' method produces
     a scree plot.

     Note that 'scale. = TRUE' cannot be used if there are zero or
     constant (for 'center = TRUE') variables.

Value:

     'prcomp' returns a list with class '"prcomp"' containing the
     following components:

    sdev: the standard deviations of the principal components (i.e.,
          the square roots of the eigenvalues of the
          covariance/correlation matrix, though the calculation is
          actually done with the singular values of the data matrix).

rotation: the matrix of variable loadings (i.e., a matrix whose columns
          contain the eigenvectors).  The function 'princomp' returns
          this in the element 'loadings'.

       x: if 'retx' is true the value of the rotated data (the centred
          (and scaled if requested) data multiplied by the 'rotation'
          matrix) is returned.  Hence, 'cov(x)' is the diagonal matrix
          'diag(sdev^2)'.
          For the formula method, 'napredict()' is applied to handle
          the treatment of values omitted by the 'na.action'.

center, scale: the centering and scaling used, or 'FALSE'.

Note:

     The signs of the columns of the rotation matrix are arbitrary, and
     so may differ between different programs for PCA, and even between
     different builds of R.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

     Mardia, K. V., J. T. Kent, and J. M. Bibby (1979) _Multivariate
     Analysis_, London: Academic Press.

     Venables, W. N. and B. D. Ripley (2002) _Modern Applied Statistics
     with S_, Springer-Verlag.

See Also:

     'biplot.prcomp', 'screeplot', 'princomp', 'cor', 'cov', 'svd',
     'eigen'.

Examples:

     require(graphics)

     ## the variances of the variables in the
     ## USArrests data vary by orders of magnitude, so scaling is appropriate
     prcomp(USArrests)  # inappropriate
     prcomp(USArrests, scale. = TRUE)
     prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)
     plot(prcomp(USArrests))
     summary(prcomp(USArrests, scale. = TRUE))
     biplot(prcomp(USArrests, scale. = TRUE))
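
     The relationships stated under Details and Value can be checked
     numerically.  The following is a minimal sketch, not part of the
     original examples: it assumes the built-in 'USArrests' data, and
     the variable names ('X', 'sv', 'pc', 'sdev_svd', 'new_scores') are
     illustrative only.  It compares 'prcomp' against a direct call to
     'svd' on the centred and scaled data, verifies that the covariance
     of the scores is 'diag(sdev^2)', and confirms that 'predict'
     reproduces the stored scores for rows of the original data.

```r
## Centre and scale the data the same way prcomp(..., scale. = TRUE) does.
X  <- scale(as.matrix(USArrests), center = TRUE, scale = TRUE)
sv <- svd(X)

pc <- prcomp(USArrests, scale. = TRUE)

## The singular values, divided by sqrt(n - 1), give the component
## standard deviations.
sdev_svd <- sv$d / sqrt(nrow(X) - 1)
stopifnot(isTRUE(all.equal(sdev_svd, pc$sdev)))

## The right singular vectors are the loadings.  Column signs are
## arbitrary (see Note), so compare absolute values only.
stopifnot(isTRUE(all.equal(abs(sv$v), abs(unname(pc$rotation)))))

## The scores are uncorrelated, with variances sdev^2.
stopifnot(isTRUE(all.equal(unname(cov(pc$x)), diag(pc$sdev^2))))

## predict() applies the stored centring, scaling and rotation to
## 'newdata'; on rows of the original data it reproduces the scores.
new_scores <- predict(pc, newdata = USArrests[1:5, ])
stopifnot(isTRUE(all.equal(new_scores, pc$x[1:5, ])))
```

     Because 'predict' matches columns by name when the fit had column
     names, 'newdata' here may list the columns in any order, but all
     of them must be present.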