boxplot.stats           package:grDevices           R Documentation

_B_o_x _P_l_o_t _S_t_a_t_i_s_t_i_c_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function is typically called by another function to gather
     the statistics necessary for producing box plots, but may be
     invoked separately.

_U_s_a_g_e:

     boxplot.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: a numeric vector for which the boxplot will be constructed
          ('NA's and 'NaN's are allowed and omitted).

    coef: this determines how far the plot 'whiskers' extend out from
          the box.  If 'coef' is positive, the whiskers extend to the
          most extreme data point which is no more than 'coef' times
          the length of the box away from the box. A value of zero
          causes the whiskers to extend to the data extremes (and no
          outliers are returned).

do.conf,do.out: logicals; if 'FALSE', the 'conf' or 'out' component
          respectively will be empty in the result.
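
     The whisker rule described for 'coef' can be sketched by hand as
     follows.  This is an illustration of the description above, not
     necessarily the internal implementation; the names 'k', 'h',
     'box', 'inside' and 'whiskers' are made up for this sketch, and
     the hinges are assumed to be those returned by 'fivenum'.

```r
x <- c(1:100, 1000)
k <- 1.5                               # plays the role of 'coef'
h <- fivenum(x)[c(2, 4)]               # lower and upper hinge
box <- diff(h)                         # length of the box
# points no more than k box lengths away from the box
inside <- x >= h[1] - k * box & x <= h[2] + k * box
whiskers <- range(x[inside])           # most extreme points still inside
stopifnot(whiskers == boxplot.stats(x, coef = k)$stats[c(1, 5)])
```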

_D_e_t_a_i_l_s:

     The two 'hinges' are versions of the first and third quartile,
     i.e., close to 'quantile(x, c(1,3)/4)'.  The hinges equal the
     quartiles for odd n (where 'n <- length(x)') and differ for even
     n.  Whereas the quartiles only equal observations for 'n %% 4 ==
     1' (n = 1 mod 4), the hinges do so _additionally_ for 'n %% 4 ==
     2' (n = 2 mod 4), and are in the middle of two observations
     otherwise.
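
     A small comparison, assuming the hinges are those computed by
     'fivenum' and the quartiles come from 'quantile' with its default
     type, shows the pattern for n = 5, ..., 8.  The matrix name 'cmp'
     and its column names are chosen only for this sketch.

```r
# One row per sample size: hinges next to default quartiles of 1:n
cmp <- t(sapply(5:8, function(n) {
  x <- seq_len(n)
  f <- fivenum(x)
  q <- unname(quantile(x, c(1, 3) / 4))
  c(n = n, n.mod.4 = n %% 4,
    hinge.lo = f[2], hinge.hi = f[4],
    quart.lo = q[1], quart.hi = q[2])
}))
cmp
# odd n: hinges and quartiles agree
odd <- cmp[, "n"] %% 2 == 1
stopifnot(all.equal(cmp[odd, "hinge.lo"], cmp[odd, "quart.lo"]))
```

     For n = 6 the hinges fall on observations while the quartiles do
     not; for n = 8 both fall between observations but differ.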

     The notches (if requested) extend to '+/-1.58 IQR/sqrt(n)'. This
     seems to be based on the same calculations as the formula with
     1.57 in Chambers _et al._ (1983, p. 62), given in McGill _et al._
     (1978, p. 16).  They are based on asymptotic normality of the
     median and roughly equal sample sizes for the two medians being
     compared, and are said to be rather insensitive to the underlying
     distributions of the samples.  The idea appears to be to give
     roughly a 95% confidence interval for the difference in two
     medians.
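
     The notch formula can be checked against the 'conf' component.
     The sketch below assumes that 'IQR' in the formula means the
     distance between the two hinges (rather than the value returned
     by 'IQR(x)'); the names 'b', 'spread' and 'notch' are chosen for
     illustration.

```r
x <- c(1:100, 1000)
b <- boxplot.stats(x)
spread <- diff(b$stats[c(2, 4)])       # upper hinge minus lower hinge
# median +/- 1.58 * spread / sqrt(n)
notch <- b$stats[3] + c(-1.58, 1.58) * spread / sqrt(b$n)
stopifnot(all.equal(notch, b$conf))
```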

_V_a_l_u_e:

     A list with the following named components:

   stats: a vector of length 5, containing the extreme of the lower
          whisker, the lower 'hinge', the median, the upper 'hinge' and
          the extreme of the upper whisker.

       n: the number of non-'NA' observations in the sample.

    conf: the lower and upper extremes of the 'notch' ('if(do.conf)').
          See the details.

     out: the values of any data points which lie beyond the extremes
          of the whiskers ('if(do.out)').


     Note that '$stats' and '$conf' are sorted in increasing order,
     unlike S, and that '$n' and '$out' include any '+/- Inf' values.
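
     As a small illustration of this note, the following sketch uses a
     made-up input vector containing infinite and missing values; the
     name 'r' is chosen for this example only.

```r
r <- boxplot.stats(c(3, 1, 2, -Inf, Inf, NA))
stopifnot(!is.unsorted(r$stats),       # stats are in increasing order
          !is.unsorted(r$conf),        # so are the notch limits
          r$n == 5,                    # NA dropped, both infinities counted
          r$out == c(-Inf, Inf))       # infinities are reported as outliers
```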

_R_e_f_e_r_e_n_c_e_s:

     Tukey, J. W. (1977) _Exploratory Data Analysis._ Section 2C.

     McGill, R., Tukey, J. W. and Larsen, W. A. (1978) Variations of
     box plots. _The American Statistician_ *32*, 12-16.

     Velleman, P. F. and Hoaglin, D. C. (1981) _Applications, Basics
     and Computing of Exploratory Data Analysis._  Duxbury Press.

     Emerson, J. D. and Strenio, J. (1983) Boxplots and batch
     comparison. Chapter 3 of _Understanding Robust and Exploratory
     Data Analysis_, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey.
     Wiley.

     Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A.
     (1983) _Graphical Methods for Data Analysis._  Wadsworth &
     Brooks/Cole.

_S_e_e _A_l_s_o:

     'fivenum', 'boxplot', 'bxp'.

_E_x_a_m_p_l_e_s:

     require(stats)
     x <- c(1:100, 1000)
     (b1 <- boxplot.stats(x))
     (b2 <- boxplot.stats(x, do.conf=FALSE, do.out=FALSE))
     stopifnot(b1$stats == b2$stats) # stats are unaffected by do.out = FALSE
     boxplot.stats(x, coef = 3, do.conf=FALSE)
     ## no outlier treatment:
     boxplot.stats(x, coef = 0)

     boxplot.stats(c(x, NA)) # the NA is omitted: n is still 101
     (r <- boxplot.stats(c(x, -1:1/0)))
     stopifnot(r$out == c(1000, -Inf, Inf))