agrep                  package:base                  R Documentation

Approximate String Matching (Fuzzy Matching)

Description:

     Searches for approximate matches to 'pattern' (the first argument)
     within each element of the character vector 'x' (the second
     argument) using the Levenshtein edit distance.

Usage:

     agrep(pattern, x, ignore.case = FALSE, value = FALSE,
           max.distance = 0.1, useBytes = FALSE)

Arguments:

 pattern: a non-empty character string to be matched (_not_ a regular
          expression!).  Coerced by 'as.character' to a string if
          possible.

       x: character vector where matches are sought.  Coerced by
          'as.character' to a character vector if possible.

ignore.case: if 'FALSE', the pattern matching is _case sensitive_; if
          'TRUE', case is ignored during matching.

   value: if 'FALSE', a vector containing the (integer) indices of the
          matches is returned; if 'TRUE', a vector containing the
          matching elements themselves is returned.

max.distance: maximum distance allowed for a match.  Expressed either
          as an integer, or as a fraction of the _pattern_ length
          (which is replaced by the smallest integer not less than that
          fraction of the pattern length, i.e. it is rounded up), or as
          a list with possible components

          'all': maximal (overall) distance

          'insertions': maximum number/fraction of insertions

          'deletions': maximum number/fraction of deletions

          'substitutions': maximum number/fraction of substitutions

          If 'all' is missing, it is set to 10% (0.1); the other
          components default to 'all'.  The component names can be
          abbreviated.

useBytes: logical.  In a multibyte locale, should the comparison be
          character-by-character (the default) or byte-by-byte?

Details:

     The Levenshtein edit distance is used as the measure of
     approximateness: it is the total number of insertions, deletions
     and substitutions required to transform one string into another.
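The definition above can be made concrete with the textbook dynamic-programming recurrence.  The sketch below is plain R for illustration only: 'lev_distance' is a hypothetical helper, not part of base R and not the algorithm used by the 'apse' library, and it ignores the per-operation limits that 'max.distance' can impose.  With 'substring = TRUE' it computes the smallest distance between the first string and any substring of the second, which is closer to how 'agrep' looks for 'pattern' inside each element of 'x'.

```r
# Hypothetical helper for illustration (not part of base R, and not the
# 'apse' algorithm): edit distance by dynamic programming.
# substring = FALSE: distance between the whole strings a and b.
# substring = TRUE:  smallest distance between a and any substring of b.
lev_distance <- function(a, b, substring = FALSE) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  m <- length(ca)
  n <- length(cb)
  # d[i + 1, j + 1] = cost of matching the first i characters of a
  # against the first j characters of b
  d <- matrix(0L, nrow = m + 1, ncol = n + 1)
  d[, 1] <- 0:m                  # i deletions against an empty prefix of b
  if (!substring) d[1, ] <- 0:n  # j insertions; in substring mode the first
                                 # row stays 0 so a match may start anywhere
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (ca[i] == cb[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion from a
                             d[i + 1, j] + 1L,  # insertion into a
                             d[i, j] + cost)    # substitution or match
    }
  }
  # Whole strings: the corner cell.  Substrings: a match may end at any
  # position of b, so take the minimum over the last row.
  if (substring) min(d[m + 1, ]) else d[m + 1, n + 1]
}

lev_distance("kitten", "sitting")                   # 3
lev_distance("lasy", "lazy")                        # 1
lev_distance("lasy", "1 lazy 2", substring = TRUE)  # 1
```

For the pattern "laysy" used in the Examples, the substring distance to "1 lazy" is 2 (within 'max = 2'), while "1 LAZY" is at distance 5 because the comparison is case sensitive; folding case first, for instance with 'tolower', brings it back to 2.  This matches the behaviour the Examples demonstrate with 'ignore.case'.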
     The function is a simple interface to the 'apse' library developed
     by Jarkko Hietaniemi (also used in the Perl String::Approx module),
     modified to work with multibyte character sets.  To save space it
     only supports the first 65536 Unicode characters (the Basic
     Multilingual Plane, which covers the characters of almost all
     modern languages).  It can be quite slow in a UTF-8 locale, and
     'useBytes = TRUE' will be much faster.

Value:

     Either a vector giving the indices of the elements that yielded a
     match, or, if 'value' is 'TRUE', the matched elements (after
     coercion, preserving names but no other attributes).

Author(s):

     Original version by David Meyer, based on C code by Jarkko
     Hietaniemi.

See Also:

     'grep'

Examples:

     agrep("lasy", "1 lazy 2")
     agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max = list(sub = 0))
     agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2)
     agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
     agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE)
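The rounding rule for a fractional 'max.distance' can be illustrated with a small sketch.  'allowed_edits' is a hypothetical helper, not part of base R.  It assumes that values below 1 are read as fractions of the pattern length and values of 1 or more as absolute edit counts; the text above does not state that threshold explicitly.

```r
# Hypothetical helper (not part of base R) mirroring the documented rule:
# a fraction of the pattern length is replaced by the smallest integer not
# less than it, i.e. ceiling().  ASSUMPTION: values below 1 are treated as
# fractions, values of 1 or more as absolute edit counts.
allowed_edits <- function(pattern, max.distance = 0.1) {
  stopifnot(is.numeric(max.distance), length(max.distance) == 1,
            max.distance >= 0)
  if (max.distance < 1) {
    as.integer(ceiling(max.distance * nchar(pattern)))
  } else {
    as.integer(max.distance)
  }
}

allowed_edits("lasy")          # ceiling(0.1 * 4) = 1
allowed_edits("laysy", 0.5)    # ceiling(0.5 * 5) = 3
allowed_edits("lasy", 2)       # absolute count   = 2
```

Under this reading, the default 'max.distance = 0.1' allows a single edit for a short pattern such as "lasy".  That is why the first example finds "lazy", which differs by one substitution.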