findInterval               package:base                R Documentation

Find Interval Numbers or Indices

Description:

     Find the indices of 'x' in 'vec', where 'vec' must be sorted
     (non-decreasingly); i.e., if 'i <- findInterval(x,v)', we have

                    v[i[j]] <= x[j] < v[i[j] + 1]

     where v[0] := - Inf, v[N+1] := + Inf, and 'N <- length(vec)'. At
     the two boundaries, the returned index may differ by 1, depending
     on the optional arguments 'rightmost.closed' and 'all.inside'.

Usage:

     findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)

Arguments:

       x: numeric.

     vec: numeric, sorted (weakly) increasingly, of length 'N', say.

rightmost.closed: logical; if true, the rightmost interval, 'vec[N-1]
          .. vec[N]', is treated as _closed_, see below.

all.inside: logical; if true, the returned indices are coerced into
          {1,...,N-1}, i.e., 0 is mapped to 1 and N to N-1.

Details:

     The function 'findInterval' finds the index of one vector 'x' in
     another, 'vec', where the latter must be non-decreasing. Although
     this is trivial in principle, being equivalent to
     'apply( outer(x, vec, ">="), 1, sum)', the internal algorithm uses
     interval search, ensuring O(n * log(N)) complexity where
     'n <- length(x)' (and 'N <- length(vec)'). For (almost) sorted
     'x', it will be even faster, basically O(n).

     This is the same computation as for the empirical distribution
     function, and indeed, 'findInterval(t, sort(X))' is _identical_ to
     n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution
     function of X[1],..,X[n].

     When 'rightmost.closed = TRUE', the result for 'x[j] = vec[N]'
     ( = max(vec)) is 'N - 1', as for all other values in the last
     interval.

Value:

     vector of length 'length(x)' with values in '0:N' (and 'NA') where
     'N <- length(vec)', or values coerced to '1:(N-1)' if and only if
     'all.inside = TRUE' (equivalently coercing all x values _inside_
     the intervals). Note that 'NA's are propagated from 'x', and 'Inf'
     values are allowed in both 'x' and 'vec'.
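
The behaviour described above can be checked on a small case. The following is a minimal sketch, not part of the original page: the values of 'x' and 'vec' are made up for illustration, and the brute-force line is the 'outer()' expression quoted in the text.

```r
# Hypothetical data: four sorted break points (N = 4) and six query values
vec <- c(0, 1, 2, 5)
x   <- c(-1, 0, 0.5, 2, 5, 7)

# Default: for each x[j], the number of break points <= x[j]
i_default <- findInterval(x, vec)
i_default                      # 0 1 1 3 4 4

# Brute-force equivalent quoted in the text, O(n * N) instead of O(n * log(N))
i_brute <- apply(outer(x, vec, ">="), 1, sum)
all(i_default == i_brute)      # TRUE

# rightmost.closed = TRUE: x == max(vec) falls into the last interval, N - 1
i_closed <- findInterval(x, vec, rightmost.closed = TRUE)
i_closed                       # 0 1 1 3 3 4

# all.inside = TRUE: 0 is mapped to 1 and N to N - 1
i_inside <- findInterval(x, vec, all.inside = TRUE)
i_inside                       # 1 1 1 3 3 3
```

Note that 'rightmost.closed' only changes the result for values exactly equal to 'max(vec)'; values beyond it (here 7) still return 'N' unless 'all.inside = TRUE'.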

Author(s):

     Martin Maechler

See Also:

     'approx(*, method = "constant")', which is a generalization of
     'findInterval()'; 'ecdf' for computing the empirical distribution
     function, which is (up to a factor of n) essentially the same as
     'findInterval(.)'.

Examples:

     N <- 100
     X <- sort(round(stats::rt(N, df = 2), 2))
     tt <- c(-100, seq(-2, 2, length.out = 201), +100)
     it <- findInterval(tt, X)
     tt[it < 1 | it >= N] # only first and last are outside range(X)
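
The relationship to the empirical distribution function can also be verified directly. This is an illustrative sketch under assumed inputs: the sample, the seed, and the evaluation points 't0' are arbitrary choices and not part of the original examples.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
X  <- round(stats::rnorm(20), 1)   # hypothetical sample; rounding makes ties possible
t0 <- seq(-2, 2, by = 0.5)         # hypothetical evaluation points

Fn     <- stats::ecdf(X)           # empirical distribution function of X
counts <- findInterval(t0, sort(X))

# n * Fn(t) is the number of observations <= t, which is what
# findInterval() returns for a sorted sample
all.equal(length(X) * Fn(t0), counts)
```

'all.equal()' is used rather than 'identical()' because 'Fn(t0)' is a double while 'findInterval()' returns integers.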