read.spss package:foreign R Documentation
_R_e_a_d _a_n _S_P_S_S _D_a_t_a _F_i_l_e
_D_e_s_c_r_i_p_t_i_o_n:
'read.spss' reads a file stored by the SPSS 'save' or 'export'
commands.
_U_s_a_g_e:
read.spss(file, use.value.labels = TRUE, to.data.frame = FALSE,
max.value.labels = Inf, trim.factor.names = FALSE,
trim_values = TRUE, reencode = NA, use.missings = to.data.frame)
_A_r_g_u_m_e_n_t_s:
file: Character string: the name of the file or URL to read.
use.value.labels: Convert variables with value labels into R factors
with those levels?
to.data.frame: return a data frame?
max.value.labels: Only variables with value labels and at most this
many unique values will be converted to factors if
'use.value.labels = TRUE'.
trim.factor.names: Logical: trim trailing spaces from factor levels?
trim_values: logical: should values and value labels have trailing
spaces ignored when matching for 'use.value.labels = TRUE'?
reencode: logical: should character strings be re-encoded to the
current locale? The default, 'NA', means to do so only in a
UTF-8 locale. Alternatively, a character string specifying an
encoding to assume.
use.missings: logical: should information on user-defined missing
values be used to set the corresponding values to 'NA'?
_D_e_t_a_i_l_s:
This uses modified code from the PSPP project for reading the SPSS formats.
If the filename appears to be a URL (of schemes 'http:', 'ftp:' or
'https:') the URL is first downloaded to a temporary file and then
read. ('https:' is only supported on some platforms.)
Occasionally in SPSS value labels will be added to some values of
a continuous variable (e.g., to distinguish different types of
missing data), and you will not want these variables converted to
factors. By setting 'max.value.labels' you can specify that
variables with a large number of distinct values are not converted
to factors even if they have value labels. In addition, variables
will not be converted to factors if there are non-missing values
that have no value label. The value labels are then returned in
the '"value.labels"' attribute of the variable.
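As a rough illustration of the paragraph above, the following sketch reads a file with a small 'max.value.labels' and then checks which variables became factors. It assumes the example file 'electric.sav' is shipped with the installed 'foreign' package; the threshold of 5 is arbitrary and chosen only for demonstration.

```r
library(foreign)

# Assumption: "electric.sav" ships with the installed 'foreign' package;
# substitute the path to your own SPSS file otherwise.
sav <- system.file("files", "electric.sav", package = "foreign")

# Convert only variables with value labels and at most 5 distinct values.
dat <- read.spss(sav, to.data.frame = TRUE, max.value.labels = 5)

# Which columns were converted to factors?
is_factor <- vapply(dat, is.factor, logical(1))
names(dat)[is_factor]

# Labelled variables left unconverted carry their labels in "value.labels"
# (NULL for variables without value labels).
lapply(dat[!is_factor], attr, "value.labels")
```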
If SPSS variable labels are present, they are returned as the
'"variable.labels"' attribute of the answer.
Fixed length strings (including value labels) are padded on the
right with spaces by SPSS, and so are read that way by R. The
default argument 'trim_values=TRUE' causes trailing spaces to be
ignored when matching to value labels, as examples have been seen
where the strings and the value labels had different amounts of
padding. See the examples for 'sub' for ways to remove trailing
spaces in character data.
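A minimal sketch of removing trailing spaces after reading, using made-up padded strings rather than a real SPSS file:

```r
# Hypothetical padded strings, as fixed-length SPSS strings would appear in R.
x <- c("male    ", "female  ", "n/a     ")

# Remove one or more spaces at the end of each string.
trimmed <- sub(" +$", "", x)
trimmed

# trimws() from base R gives the same result for this input.
trimws(x, which = "right")
```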
URL
provides a list of translations from Windows codepage numbers to
encoding names that 'iconv' is likely to know about and so
suitable values for 'reencode'. Automatic re-encoding is
attempted for apparent codepages of 200 or more in a UTF-8 locale:
some other high-numbered codepages can be re-encoded on most
systems, but the encoding names are platform-dependent (see
'iconvlist').
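The kind of conversion involved can be sketched with 'iconv' directly. The byte string below is made up for illustration, and the encoding name "CP1252" is an assumption: it is widely recognised, but the accepted names are platform-dependent.

```r
# Hypothetical byte string: "cafe" with an acute accent on the final letter,
# as it would be stored under Windows codepage 1252 (byte 0xe9).
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Assumption: this platform's iconv knows the encoding name "CP1252".
y <- iconv(x, from = "CP1252", to = "UTF-8")
y

# The same encoding name could then be supplied when reading, e.g.
# read.spss("datafile", reencode = "CP1252")
```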
_V_a_l_u_e:
A list (or data frame) with one component for each variable in the
saved data set.
If what looks like a Windows codepage was recorded in the SPSS
file, it is attached (as a number) as attribute '"codepage"' to
the result.
There may be attributes '"label.table"' and '"variable.labels"'.
Attribute '"label.table"' is a named list of value labels with one
element per variable, either 'NULL' or a named character vector.
Attribute '"variable.labels"' is a named character vector with
names the short variable names and elements the long names.
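A short sketch of inspecting these attributes on the default (list) result. It assumes the example file 'electric.sav' is shipped with the installed 'foreign' package; any of the attributes may be 'NULL' for a given file.

```r
library(foreign)

# Assumption: "electric.sav" ships with the installed 'foreign' package.
sav <- system.file("files", "electric.sav", package = "foreign")

# By default the result is a list, not a data frame.
dat <- read.spss(sav)

# Named character vector: short variable names -> long names (or NULL).
attr(dat, "variable.labels")

# Named list with one element per variable (or NULL).
str(attr(dat, "label.table"))

# Windows codepage number, if one was recorded in the file (or NULL).
attr(dat, "codepage")
```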
If there are user-defined missing values, there will be an
attribute '"missings"'. This is a named list with one list
element per variable. Each element has an element 'type', a
length-one character vector giving the type of missingness, and
may also have an element 'value' with the values corresponding to
missingness. This is a complex subject (where the R and C source
code for 'read.spss' is the main documentation), but the simplest
cases are types '"one"', '"two"' and '"three"' with a
corresponding number of (real or string) values whose labels can
be found from the '"label.table"' attribute. Other possibilities
are a finite or semi-infinite range, possibly plus a single value.
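The simplest cases can be sketched by hand. The variable and the list element below are hypothetical and only mimic the structure described above; this is not the code 'read.spss' itself uses, and ranges would need separate handling.

```r
# Hypothetical variable and a matching per-variable entry of the kind
# described above: a 'type' plus the 'value's that denote missingness.
income <- c(1200, 9998, 3400, 9999, 2100)
miss <- list(type = "two", value = c(9998, 9999))

# Handle only the discrete types; range types are not covered here.
if (miss$type %in% c("one", "two", "three")) {
  income[income %in% miss$value] <- NA
}
income
```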
_N_o_t_e:
If SPSS value labels are converted to factors the underlying
numerical codes will not in general be the same as the SPSS
numerical values, since the numerical codes in R are always
1,2,3,....
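A small illustration with made-up values and labels: the factor's integer codes differ from the original SPSS values, which can be recovered only by going back through the labels.

```r
# Hypothetical SPSS numeric values and their value labels.
spss_values <- c(10, 20, 20, 30)
labels <- c(Low = 10, Medium = 20, High = 30)

# Build the factor much as a labelled variable would be represented in R.
f <- factor(spss_values, levels = labels, labels = names(labels))

# The internal codes are 1, 2, 3, ..., not 10, 20, 30.
as.integer(f)   # 1 2 2 3

# Recover the original SPSS values via the label-to-value mapping.
recovered <- unname(labels[as.character(f)])
recovered       # 10 20 20 30
```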
You may see warnings about the file encoding for SPSS 'save'
files: it is possible such files contain non-ASCII character data
which need re-encoding. The most common occurrence is Windows
codepage 1252, a superset of Latin-1. The encoding is recorded
(as an integer) in attribute '"codepage"' of the result if it
looks like a Windows codepage. Automatic re-encoding is done only
in UTF-8 locales: see argument 'reencode'.
_A_u_t_h_o_r(_s):
Saikat DebRoy and the R Core team
_E_x_a_m_p_l_e_s:
## Not run:
read.spss("datafile")
## don't convert value labels to factor levels
read.spss("datafile", use.value.labels = FALSE)
## convert value labels to factors for variables with at most
## ten distinct values.
read.spss("datafile", max.value.labels = 10)
## End(Not run)