iconv                   package:base                   R Documentation

Convert Character Vector between Encodings

Description:

     This uses system facilities to convert a character vector between
     encodings: the 'i' stands for 'internationalization'.

Usage:

     iconv(x, from = "", to = "", sub = NA)

     iconvlist()

Arguments:

       x: A character vector, or an object to be converted to a
          character vector by 'as.character'.

    from: A character string describing the current encoding.

      to: A character string describing the target encoding.

     sub: character string.  If not 'NA' it is used to replace any
          non-convertible bytes in the input.  (This would normally be
          a single character, but can be more.)  If '"byte"', the
          indication is '"<xx>"' with the hex code of the byte.

Details:

     The names of encodings and which ones are available (and indeed,
     if any are) are platform-dependent.  On all systems that support
     'iconv' you can use '""' for the encoding of the current locale,
     as well as '"latin1"' and '"UTF-8"'.

     On most systems (including those using 'glibc' or 'libiconv', Mac
     OS X and Windows) case is ignored when specifying an encoding.

     On many platforms 'iconvlist' provides an alphabetical list of
     the supported encodings.  On others, the information is on the
     man page for 'iconv(5)' or elsewhere in the man pages (and beware
     that the system command 'iconv' may not support the same set of
     encodings as the C functions R calls).  Unfortunately, the names
     are rarely common across platforms.

     Elements of 'x' which cannot be converted (perhaps because they
     are invalid or because they cannot be represented in the target
     encoding) will be returned as 'NA' unless 'sub' is specified.

     Most versions of 'iconv' will allow transliteration by appending
     '//TRANSLIT' to the 'to' encoding: see the examples.

     Any encoding bits (see 'Encoding') on elements of 'x' are
     ignored: they will always be translated as if from 'from' even if
     declared otherwise.

     '"UTF8"' will be accepted as meaning the (more correct)
     '"UTF-8"'.

Value:

     A character vector of the same length and the same attributes as
     'x' (after conversion).

     The elements of the result have a declared encoding if 'from' is
     '"latin1"' or '"UTF-8"', or if 'from = ""' and the current
     locale's encoding is detected as Latin-1 or UTF-8.

Note:

     Not all platforms support these functions, although almost all
     support 'iconv'.  See also 'capabilities("iconv")'.

See Also:

     'localeToCharset', 'file'.

Examples:

     ## not all systems have iconvlist
     try(utils::head(iconvlist(), n = 50))

     ## Not run:

     ## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
     iconv(x, "ISO_8859-2", "UTF-8")
     iconv(x, "LATIN2", "UTF-8")
     ## End(Not run)

     ## Both x below are in latin1 and will only display correctly in a
     ## locale that can represent and display latin1.
     x <- "fa\xE7ile"
     Encoding(x) <- "latin1"
     x
     charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
     xx

     iconv(x, "latin1", "ASCII")          #   NA
     iconv(x, "latin1", "ASCII", "?")     # "fa?ile"
     iconv(x, "latin1", "ASCII", "")      # "faile"
     iconv(x, "latin1", "ASCII", "byte")  # "fa<e7>ile"

     # Extracts from R help files
     x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
     Encoding(x) <- "latin1"
     x
     try(iconv(x, "latin1", "ASCII//TRANSLIT"))  # platform-dependent
     iconv(x, "latin1", "ASCII", sub="byte")
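
     The conversion and 'sub' behaviour described above can be wrapped
     in a small helper.  The following is only a minimal sketch:
     'to_utf8' is a hypothetical name, not a function in base R, and
     the default 'from = "latin1"' is an assumption chosen to match
     the examples.  It checks 'capabilities("iconv")' first and uses
     'sub = "byte"' so that non-convertible bytes are flagged rather
     than turning the whole element into 'NA'.

```r
## Hypothetical helper (not part of base R): convert to UTF-8 and
## mark any non-convertible bytes instead of returning NA.
to_utf8 <- function(x, from = "latin1") {
  if (!capabilities("iconv"))
    stop("iconv is not available on this platform")
  iconv(x, from = from, to = "UTF-8", sub = "byte")
}

x <- "fa\xE7ile"          # the single byte e7 is c-cedilla in latin1
Encoding(x) <- "latin1"

xx <- to_utf8(x)
charToRaw(xx)             # 66 61 c3 a7 69 6c 65

## Converting back should recover the original latin1 bytes.
back <- iconv(xx, "UTF-8", "latin1")
identical(charToRaw(back), charToRaw(x))
```

     Inspecting the result with 'charToRaw' rather than printing it
     avoids depending on whether the current locale can display the
     converted string.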