cov.rob                 package:MASS                 R Documentation

Resistant Estimation of Multivariate Location and Scatter

Description:

     Compute a multivariate location and scale estimate with a high
     breakdown point: this can be thought of as estimating the mean
     and covariance of the 'good' part of the data. 'cov.mve' and
     'cov.mcd' are compatibility wrappers.

Usage:

     cov.rob(x, cor = FALSE, quantile.used = floor((n + p + 1)/2),
             method = c("mve", "mcd", "classical"),
             nsamp = "best", seed)

     cov.mve(...)
     cov.mcd(...)

Arguments:

       x: a matrix or data frame.

     cor: should the returned result include a correlation matrix?

quantile.used: the minimum number of the data points regarded as
          'good' points. In the default, 'n' and 'p' are the numbers
          of rows and columns of 'x'.

  method: the method to be used: minimum volume ellipsoid, minimum
          covariance determinant or classical product-moment. Using
          'cov.mve' or 'cov.mcd' forces '"mve"' or '"mcd"'
          respectively.

   nsamp: the number of samples, or '"best"', '"exact"' or
          '"sample"'. If '"sample"', the number chosen is
          'min(5*p, 3000)', taken from Rousseeuw and Hubert (1997).
          If '"best"', exhaustive enumeration is done up to 5000
          samples; if '"exact"', exhaustive enumeration will be
          attempted however many samples are needed.

    seed: the seed to be used for random sampling: see 'RNGkind'. The
          current value of '.Random.seed' will be preserved if it is
          set.

     ...: arguments to 'cov.rob' other than 'method'.

Details:

     For method '"mve"', an approximate search is made for a subset of
     size 'quantile.used' with an enclosing ellipsoid of smallest
     volume; for method '"mcd"' it is the volume of the Gaussian
     confidence ellipsoid, equivalently the determinant of the
     classical covariance matrix, that is minimized. The mean of the
     subset provides a first estimate of location, and the rescaled
     covariance matrix a first estimate of scatter.
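
     The first stage for '"mcd"' can be illustrated with a small
     base-R sketch. It is a hypothetical simplification, not the
     compiled search that 'cov.rob' actually runs: the name
     'mcd_first_stage_sketch' and its 'nstarts' argument are
     illustrative, each random start uses 'p + 1' rows (assumed here
     because that is the smallest sample whose covariance matrix can
     be non-singular in 'p' dimensions), and neither the rescaling
     nor the finite-sample correction is applied.

```r
# Hypothetical, simplified sketch of the first stage of an MCD-type
# search. This is NOT the compiled search used by MASS::cov.rob.
mcd_first_stage_sketch <- function(x, h = floor((nrow(x) + ncol(x) + 1) / 2),
                                   nstarts = 500) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  best_crit <- Inf
  best_idx <- NULL
  n_singular <- 0
  for (i in seq_len(nstarts)) {
    # Elemental start (assumption): p + 1 rows is the smallest sample
    # whose covariance matrix can be non-singular in p dimensions.
    start <- sample.int(n, p + 1)
    S0 <- cov(x[start, , drop = FALSE])
    if (qr(S0)$rank < p) {          # count and skip singular starts
      n_singular <- n_singular + 1
      next
    }
    # Squared Mahalanobis distances of all rows from the elemental fit.
    d2 <- mahalanobis(x, colMeans(x[start, , drop = FALSE]), S0)
    # Candidate subset: the h rows closest to the elemental fit.
    idx <- order(d2)[seq_len(h)]
    S <- cov(x[idx, , drop = FALSE])
    # MCD criterion on the log scale: log-determinant of the subset covariance.
    crit <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    if (is.finite(crit) && crit < best_crit) {
      best_crit <- crit
      best_idx <- sort(idx)
    }
  }
  if (is.null(best_idx)) stop("no non-singular subset found")
  list(best = best_idx,
       center = colMeans(x[best_idx, , drop = FALSE]),  # first location estimate
       cov = cov(x[best_idx, , drop = FALSE]),          # first scatter estimate, not rescaled
       crit = best_crit,
       n_singular = n_singular)
}

set.seed(1)
fit <- mcd_first_stage_sketch(stackloss)
fit$best   # indices of the selected subset of size h
fit$crit   # log-determinant of that subset's covariance matrix
```

     With 'stackloss' (n = 21, p = 4) the default h is 13, the same
     value as the default 'quantile.used'. Keeping only the h-subset
     with the smallest log-determinant mirrors the '"mcd"' criterion
     above; a larger 'nstarts' makes it less likely that a good subset
     is missed, at proportionally higher cost.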

     The Mahalanobis distances of all the points from the location
     estimate for this covariance matrix are calculated, and those
     points within the 97.5% point under Gaussian assumptions (for
     squared distances, the 97.5% point of the chi-squared
     distribution on 'p' degrees of freedom) are declared to be
     'good'. The final estimates are the mean and rescaled covariance
     of the 'good' points. The rescaling is by the appropriate
     percentile under Gaussian data; in addition, the first covariance
     matrix has an _ad hoc_ finite-sample correction given by Marazzi.

     For method '"mve"' the search is made over ellipsoids determined
     by the covariance matrix of 'p' of the data points. For method
     '"mcd"' an additional improvement step suggested by Rousseeuw and
     van Driessen (1999) is used: once a subset of size
     'quantile.used' is selected, an ellipsoid based on its covariance
     is tested (as this will have no larger a determinant, and may
     have a smaller one).

Value:

     A list with components

  center: the final estimate of location.

     cov: the final estimate of scatter.

     cor: (only if 'cor = TRUE') the estimate of the correlation
          matrix.

    sing: message giving the number of singular samples out of the
          total.

    crit: the value of the criterion on log scale. For MCD this is the
          determinant, and for MVE it is proportional to the volume.

    best: the subset used. For MVE the best sample, for MCD the best
          set of size 'quantile.used'.

   n.obs: total number of observations.

References:

     P. J. Rousseeuw and A. M. Leroy (1987) _Robust Regression and
     Outlier Detection._ Wiley.

     A. Marazzi (1993) _Algorithms, Routines and S Functions for Robust
     Statistics._ Wadsworth and Brooks/Cole.

     P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking
     multivariate outliers and leverage points. _Journal of the
     American Statistical Association_ *85*, 633-639.

     P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for
     the minimum covariance determinant estimator. _Technometrics_
     *41*, 212-223.

     P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS.
     In _L1-Statistical Procedures and Related Topics_, ed. Y. Dodge,
     IMS Lecture Notes volume *31*, pp. 201-214.

See Also:

     'lqs'

Examples:

     set.seed(123)
     cov.rob(stackloss)
     cov.rob(stack.x, method = "mcd", nsamp = "exact")
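
     The reweighting step described under Details can likewise be
     sketched in base R. This is a hypothetical illustration
     ('reweight_sketch' is not a MASS function): matching the (h/n)
     quantile of the squared distances to the corresponding
     chi-squared quantile is an assumed form of the "appropriate
     percentile" rescaling, and both Marazzi's finite-sample
     correction and the final rescaling of the 'good'-point covariance
     are left out, so results should not be expected to match
     'cov.rob' output.

```r
# Hypothetical, simplified sketch of the reweighting step. NOT the MASS
# code: Marazzi's finite-sample correction and the final rescaling of the
# 'good'-point covariance are both omitted.
reweight_sketch <- function(x, center, scatter, h) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Squared Mahalanobis distances from the first-stage estimates.
  d2 <- mahalanobis(x, center, scatter)
  # Rescale (assumption) so that the (h/n) quantile of the squared
  # distances matches the chi-squared quantile expected for Gaussian data;
  # dividing d2 by k is the same as using the rescaled matrix k * scatter.
  k <- quantile(d2, h / n, names = FALSE) / qchisq(h / n, p)
  d2 <- d2 / k
  # 'Good' points lie inside the 97.5% chi-squared cutoff.
  good <- d2 < qchisq(0.975, p)
  list(center = colMeans(x[good, , drop = FALSE]),
       cov = cov(x[good, , drop = FALSE]),
       good = good,
       scale_factor = k)
}

# Illustration only: first-stage estimates from the first h rows of
# 'stackloss' (a real run would use the subset found by the search).
x <- as.matrix(stackloss)
h <- floor((nrow(x) + ncol(x) + 1) / 2)
idx <- seq_len(h)
res <- reweight_sketch(x, colMeans(x[idx, ]), cov(x[idx, ]), h)
which(!res$good)   # rows this sketch flags as not 'good'
```

     In a real analysis 'center' and 'scatter' would come from the
     subset search; after 'library(MASS)' and
     'fit <- cov.rob(x, method = "mcd")', for instance, 'x[fit$best, ]'
     holds the best set of size 'quantile.used'.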