tapply                  package:base                   R Documentation

Apply a Function Over a "Ragged" Array

Description:

     Apply a function to each cell of a ragged array, that is, to each
     (non-empty) group of values given by a unique combination of the
     levels of certain factors.

Usage:

     tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

Arguments:

       X: an atomic object, typically a vector.

   INDEX: a list of factors, each of the same length as 'X'.  The
          elements are coerced to factors by 'as.factor'.

     FUN: the function to be applied, or 'NULL'.  In the case of
          functions like '+', '%*%', etc., the function name must be
          backquoted or quoted.  If 'FUN' is 'NULL', 'tapply' returns
          a vector which can be used to subscript the multi-way array
          'tapply' normally produces.

     ...: optional arguments to 'FUN'; see the Note section.

simplify: If 'FALSE', 'tapply' always returns an array of mode
          '"list"'.  If 'TRUE' (the default), then if 'FUN' always
          returns a scalar, 'tapply' returns an array with the mode of
          the scalar.

Value:

     If 'FUN' is not 'NULL', it is passed to 'match.fun', and hence it
     can be a function or a symbol or character string naming a
     function.

     When 'FUN' is present, 'tapply' calls 'FUN' for each cell that
     has any data in it.  If 'FUN' returns a single atomic value for
     each such cell (e.g., functions 'mean' or 'var') and 'simplify'
     is 'TRUE', 'tapply' returns a multi-way array containing the
     values, with 'NA' for the empty cells.  The array has the same
     number of dimensions as 'INDEX' has components; the number of
     levels in a dimension is the number of levels ('nlevels()') in
     the corresponding component of 'INDEX'.  Note that if the return
     value has a class (e.g., an object of class '"Date"') the class
     is discarded.

     Note that, contrary to S, 'simplify = TRUE' always returns an
     array, possibly 1-dimensional.

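     The behaviour described above for empty cells can be checked with
     a small sketch.  The data here ('x', 'g', and the unused level
     "c") are hypothetical and chosen only so that one cell is empty;
     this is an illustration, not part of the original examples.

```r
# Hypothetical data: 'g' has an unused level "c", so that cell is empty.
x <- c(1, 2, 3, 4)
g <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))

# FUN returns a scalar per cell and simplify = TRUE (the default),
# so the result is a 1-dimensional array with one entry per level.
res <- tapply(x, g, sum)
res
is.array(res)   # the result is an array even with a single factor
dim(res)        # one dimension of length nlevels(g)

# With simplify = FALSE the result is an array of mode "list";
# the empty cell holds NULL instead of NA.
res_list <- tapply(x, g, sum, simplify = FALSE)
mode(res_list)
is.null(res_list[[3]])
```

     The first call fills the cell for "c" with 'NA'; the second
     leaves it as 'NULL', which is the list-mode counterpart.
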
     If 'FUN' does not return a single atomic value, 'tapply' returns
     an array of mode '"list"' whose components are the values of the
     individual calls to 'FUN', i.e., the result is a list with a
     'dim' attribute.

     When there is an array answer, its 'dimnames' are named by the
     names of 'INDEX' and are based on the levels of the grouping
     factors (possibly after coercion).

     For a list result, the elements corresponding to empty cells are
     'NULL'.

Note:

     Optional arguments to 'FUN' supplied by the '...' argument are
     not divided into cells.  It is therefore inappropriate for 'FUN'
     to expect additional arguments with the same length as 'X'.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     The convenience functions 'by' and 'aggregate' (using 'tapply');
     'apply', 'lapply' with its versions 'sapply' and 'mapply'.

Examples:

     require(stats)
     groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
     tapply(groups, groups, length) #- is almost the same as
     table(groups)

     ## contingency table from data.frame : array with named dimnames
     tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
     tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)

     n <- 17; fac <- factor(rep(1:3, length = n), levels = 1:5)
     table(fac)
     tapply(1:n, fac, sum)
     tapply(1:n, fac, sum, simplify = FALSE)
     tapply(1:n, fac, range)
     tapply(1:n, fac, quantile)

     ## example of ... argument: find quarterly means
     tapply(presidents, cycle(presidents), mean, na.rm = TRUE)

     ind <- list(c(1, 2, 2), c("A", "A", "B"))
     table(ind)
     tapply(1:3, ind) #-> the split vector
     tapply(1:3, ind, sum)
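
     Two points from the text, the subscript vector returned when
     'FUN' is 'NULL' and the fact that '...' arguments are not divided
     into cells, can be illustrated with the sketch below.  The
     vectors 'x', 'g' and the weights 'w' are hypothetical, and
     applying 'tapply' to indices is only one possible workaround,
     not a method prescribed by this page.

```r
# Hypothetical data: four values in two groups.
x <- c(10, 20, 30, 40)
g <- factor(c("a", "b", "a", "b"))

# FUN = NULL: for each element of 'x', the position of its cell in
# the array that tapply() would normally produce.
cell <- tapply(x, g)
cell

# That vector can subscript the normal result, which maps each
# element of 'x' back to the summary of its own group.
sums <- tapply(x, g, sum)
sums[cell]

# Arguments passed through '...' reach FUN whole, not split by group,
# so a weight vector as long as 'x' cannot be passed that way.
# One workaround (an assumption of this sketch): apply tapply() to
# the indices and subset both vectors inside FUN.
w <- c(1, 3, 1, 1)
wmean <- tapply(seq_along(x), g,
                function(i) weighted.mean(x[i], w[i]))
wmean
```

     Splitting the indices rather than the data keeps 'x' and 'w'
     aligned within each cell.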