findInterval              package:base              R Documentation

_F_i_n_d _I_n_t_e_r_v_a_l _N_u_m_b_e_r_s _o_r _I_n_d_i_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Find the indices of 'x' in 'vec', where 'vec' must be sorted
     (non-decreasingly); i.e., if 'i <- findInterval(x,v)', we have
     v[i[j]] <= x[j] < v[i[j] + 1] where v[0] := - Inf, v[N+1] := +
     Inf, and 'N <- length(vec)'. At the two boundaries, the returned
     index may differ by 1, depending on the optional arguments
     'rightmost.closed' and 'all.inside'.

_U_s_a_g_e:

     findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: numeric.

     vec: numeric, sorted (weakly) increasingly, of length 'N', say.

rightmost.closed: logical; if true, the rightmost interval, 'vec[N-1]
          .. vec[N]' is treated as _closed_, see below.

all.inside: logical; if true, the returned indices are coerced into
          {1,...,N-1}, i.e., 0 is mapped to 1 and N to N-1.

_D_e_t_a_i_l_s:

     The function 'findInterval' finds the index of one vector 'x' in
     another, 'vec', where the latter must be non-decreasing.  Where
     this is trivial, equivalent to 'apply( outer(x, vec, ">="), 1,
     sum)', as a matter of fact, the internal algorithm uses interval
     search ensuring O(n * log(N)) complexity where 'n <- length(x)'
     (and 'N <- length(vec)').  For (almost) sorted 'x', it will be
     even faster, basically O(n).

     This is the same computation as for the empirical distribution
     function, and indeed, 'findInterval(t, sort(X))' is _identical_ to
     n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution
     function of X[1],..,X[n].

     When 'rightmost.closed = TRUE', the result for 'x[j] = vec[N]' ( =
     max(vec)), is 'N - 1' as for all other values in the last
     interval.

_V_a_l_u_e:

     vector of length 'length(x)' with values in '0:N' (and 'NA') where
     'N <- length(vec)', or values coerced to '1:(N-1)' if and only if
     'all.inside = TRUE' (equivalently coercing all x values _inside_
     the intervals).  Note that 'NA's are propagated from 'x', and
     'Inf' values are allowed in both 'x' and 'vec'.

_A_u_t_h_o_r(_s):

     Martin Maechler

_S_e_e _A_l_s_o:

     'approx(*, method = "constant")' which is a generalization of
     'findInterval()', 'ecdf' for computing the empirical distribution
     function which is (up to a factor of n) also basically the same as
     findInterval(.).

_E_x_a_m_p_l_e_s:

     N <- 100
     X <- sort(round(stats::rt(N, df=2), 2))
     tt <- c(-100, seq(-2,2, len=201), +100)
     it <- findInterval(tt, X)
     tt[it < 1 | it >= N] # only first and last are outside range(X)