Encoding                package:base                R Documentation(latin1)

Read or Set the Declared Encodings for a Character Vector

Description:

     Read or set the declared encodings for a character vector.

Usage:

     Encoding(x)
     Encoding(x) <- value

Arguments:

       x: A character vector.

   value: A character vector of positive length.

Details:

     Character strings in R can be declared to be in '"latin1"' or
     '"UTF-8"'.  These declarations can be read by 'Encoding', which
     returns a character vector of values '"latin1"', '"UTF-8"' or
     '"unknown"'.  They can also be set, in which case 'value' is
     recycled as needed and any other value is silently treated as
     '"unknown"'.  ASCII strings will never be marked with a declared
     encoding, since their representation is the same in all
     encodings.

     There are other ways for character strings to acquire a declared
     encoding apart from explicitly setting it (and these have changed
     as R has evolved).  Functions 'scan', 'read.table', 'readLines',
     and 'parse' have an 'encoding' argument that is used to declare
     encodings, 'iconv' declares encodings from its 'to' argument, and
     console input in suitable locales is also declared.  'intToUtf8'
     declares its output as '"UTF-8"', and output text connections are
     marked if running in a suitable locale.  Under some circumstances
     (see its help page) 'source(encoding=)' will mark encodings of
     character strings it outputs.

     Most character manipulation functions will set the encoding on
     output strings if it was declared on the corresponding input.
     These include 'chartr', 'strsplit', 'strtrim', 'tolower' and
     'toupper' as well as 'sub(useBytes = FALSE)' and 'gsub(useBytes =
     FALSE)'.  Note that such functions do not _preserve_ the
     encoding: if they know the input encoding and that the string has
     been successfully re-encoded to the current encoding, they mark
     the output with the latter (if it is '"latin1"' or '"UTF-8"').
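
     The rules above for reading and setting declarations can be
     sketched as follows.  This is an illustrative sketch rather than
     part of the original help page: the variable names are arbitrary,
     and the result of 'Encoding' on an unmarked non-ASCII literal is
     deliberately not shown, since it may depend on the locale.

```r
## Bytes that are meant to be read as latin1 (0xE7 is c-cedilla there).
x <- c("fa\xE7ile", "gar\xE7on")

## A single value is recycled over the whole vector.
Encoding(x) <- "latin1"
Encoding(x)                      # "latin1" "latin1"

## ASCII strings are never marked, whatever is requested.
a <- "plain"
Encoding(a) <- "UTF-8"
Encoding(a)                      # "unknown"

## Values other than "latin1" and "UTF-8" are treated as "unknown".
## ("no-such-encoding" is a made-up value used only for illustration.)
y <- x
Encoding(y) <- "no-such-encoding"
Encoding(y)                      # "unknown" "unknown"

## Converting to UTF-8 with iconv() yields strings marked "UTF-8";
## intToUtf8() marks its non-ASCII output as "UTF-8" as well.
xx <- iconv(x, "latin1", "UTF-8")
Encoding(xx)                     # "UTF-8" "UTF-8"
Encoding(intToUtf8(231))         # "UTF-8"
```

     Copying 'x' to 'y' before resetting the declaration keeps the
     latin1-marked vector available for the conversion step.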

     'substr' does preserve the encoding, and 'chartr', 'tolower' and
     'toupper' preserve UTF-8 encoding on systems with Unicode wide
     characters.  With their 'fixed' and 'perl' options, 'strsplit',
     'sub' and 'gsub' will give a marked UTF-8 result if any of the
     inputs are UTF-8.

     'paste' and 'sprintf' return a UTF-8 marked element if any of the
     inputs to that element are UTF-8.

Value:

     A character vector.

Examples:

     ## x is intended to be in latin1
     x <- "fa\xE7ile"
     Encoding(x)
     Encoding(x) <- "latin1"
     x
     xx <- iconv(x, "latin1", "UTF-8")
     Encoding(c(x, xx))
     c(x, xx)
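
     A further sketch, again not part of the original page, of how
     declarations propagate through 'substr' and 'paste' as described
     in the Details.  It assumes only base R.  'toupper' and 'chartr'
     are left out because their behaviour is stated to depend on the
     platform.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8")

## substr() keeps the mark of its input.
Encoding(substr(x, 1, 3))        # "latin1"
Encoding(substr(xx, 1, 3))       # "UTF-8"

## A purely ASCII substring carries no mark.
Encoding(substr(xx, 1, 2))       # "unknown"

## paste() marks an element UTF-8 if any input to that element is UTF-8.
p <- paste(c("abc", xx), "end")
Encoding(p)                      # "unknown" "UTF-8"

## Mixed latin1 and UTF-8 inputs give a UTF-8 marked element.
Encoding(paste(x, xx))           # "UTF-8"
```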