read.DIF                package:utils                R Documentation

_D_a_t_a _I_n_p_u_t _f_r_o_m _S_p_r_e_a_d_s_h_e_e_t

_D_e_s_c_r_i_p_t_i_o_n:

     Reads a file in Data Interchange Format (DIF) and creates a data
     frame from it.  DIF is a format for data matrices such as single
     spreadsheets.
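
     As an illustration of the format, the raw lines of the example
     file used in the Examples section can be inspected directly.  This
     is only a sketch: the object names are illustrative, and it
     assumes the example file is installed in the 'misc' directory of
     package 'utils'.

```r
## Locate the example DIF file shipped with package 'utils'
## (assumed to be installed, as in the Examples section).
dif_file <- file.path(system.file("misc", package = "utils"), "exDIF.dif")

## A DIF file is plain text: header topics come first, then the cell data.
dif_lines <- readLines(dif_file)
head(dif_lines, 12)

## Position of the line that introduces the cell values.
grep("^DATA", dif_lines)
```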

_U_s_a_g_e:

     read.DIF(file, header = FALSE,
                dec = ".", row.names, col.names,
                as.is = !stringsAsFactors,
                na.strings = "NA", colClasses = NA, nrows = -1,
                skip = 0, check.names = TRUE,
                blank.lines.skip = TRUE,
                stringsAsFactors = default.stringsAsFactors(),
                transpose = FALSE)

_A_r_g_u_m_e_n_t_s:

    file: the name of the file which the data are to be read from, or a
          connection, or a complete URL. 

  header: a logical value indicating whether the spreadsheet contains
          the names of the variables as its first line.  If missing,
          the value is determined from the file format: 'header' is set
          to 'TRUE' if and only if the first row contains only
          character values and the top left cell is empty.

     dec: the character used in the file for decimal points.

row.names: a vector of row names.  This can be a vector giving the
          actual row names, or a single number giving the column of the
          table which contains the row names, or a character string
          giving the name of the table column containing the row names.

          If there is a header and the first row contains one fewer
          field than the number of columns, the first column in the
          input is used for the row names.  Otherwise if 'row.names' is
          missing, the rows are numbered.

          Using 'row.names = NULL' forces row numbering. 

col.names: a vector of optional names for the variables. The default is
          to use '"V"' followed by the column number.

   as.is: the default behavior of 'read.DIF' is to convert character
          variables (which are not converted to logical, numeric or
          complex) to factors.  The argument 'as.is' controls the
          conversion of columns not otherwise specified by
          'colClasses'. Its value is either a vector of logicals
          (values are recycled if necessary), or a vector of numeric or
          character indices which specify which columns should not be
          converted to factors.

          Note: to suppress all conversions including those of numeric
          columns, set 'colClasses = "character"'.

          Note that 'as.is' is specified per column (not per variable)
          and so includes the column of row names (if any) and any
          columns to be skipped. 
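
          The distinction that 'as.is' controls can be seen by calling
          'type.convert' directly on a character vector.  This is a
          sketch of the conversion idea only, not of the internals of
          'read.DIF'; the vector 'x' is made up for illustration.

```r
## Hypothetical character column, as it might be read from a file.
x <- c("low", "high", "low")

kept <- type.convert(x, as.is = TRUE)   # stays a character vector
conv <- type.convert(x, as.is = FALSE)  # becomes a factor

class(kept)
class(conv)
levels(conv)
```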

na.strings: a character vector of strings which are to be interpreted
          as 'NA' values.  Blank fields are also considered to be
          missing values in logical, integer, numeric and complex
          fields.

colClasses: character.  A vector of classes to be assumed for the
          columns.  Recycled as necessary, or if the character vector
          is named, unspecified values are taken to be 'NA'.

          Possible values are 'NA' (when 'type.convert' is used),
          '"NULL"' (when the column is skipped), one of the atomic
          vector classes (logical, integer, numeric, complex,
          character, raw), or '"factor"', '"Date"' or '"POSIXct"'. 
          Otherwise there needs to be an 'as' method (from package
          'methods') for conversion from '"character"' to the specified
          formal class.

          Note that 'colClasses' is specified per column (not per
          variable) and so includes the column of row names (if any). 
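
          For instance, all conversion can be suppressed for the
          example file from the Examples section by requesting
          character columns.  A minimal sketch, assuming the example
          file is installed with package 'utils'; the object names are
          illustrative.

```r
dif_file <- file.path(system.file("misc", package = "utils"), "exDIF.dif")

## Default: column types are chosen by converting the cell values.
dd <- read.DIF(dif_file, header = TRUE, transpose = TRUE)

## Every column kept as character; 'colClasses' is recycled across columns.
dd_chr <- read.DIF(dif_file, header = TRUE, transpose = TRUE,
                   colClasses = "character")

sapply(dd, class)
sapply(dd_chr, class)
```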

   nrows: the maximum number of rows to read in.  Negative values are
          ignored.

    skip: the number of lines of the data file to skip before beginning
          to read data.

check.names: logical.  If 'TRUE' then the names of the variables in the
          data frame are checked to ensure that they are syntactically
          valid variable names.  If necessary they are adjusted (by
          'make.names') so that they are, and also to ensure that there
          are no duplicates.
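
          The adjustment can be previewed by calling 'make.names'
          directly.  The names below are hypothetical and chosen only
          to show an invalid character, a leading digit and a
          duplicate.

```r
## Hypothetical column names: one with a space, one starting with a
## digit, and one duplicate.
raw_names <- c("unit price", "2nd", "unit price")

## unique = TRUE also removes duplicates by appending a suffix.
fixed <- make.names(raw_names, unique = TRUE)
fixed
```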

blank.lines.skip: logical: if 'TRUE' blank lines in the input are
          ignored.

stringsAsFactors: logical: should character vectors be converted to
          factors?

transpose: logical, indicating if the row and column interpretation
          should be transposed.  Microsoft's Excel has been known to
          produce (non-standard conforming) DIF files which would need
          'transpose = TRUE' to be read correctly.
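
          One way to judge whether 'transpose = TRUE' is needed is to
          compare the counts declared in the file header with the
          expected shape of the table.  As the DIF format is commonly
          described, 'VECTORS' gives the number of columns and 'TUPLES'
          the number of rows.  The sketch below reads those counts from
          the example file; the helper 'declared_count' is
          hypothetical and not part of R.

```r
dif_file <- file.path(system.file("misc", package = "utils"), "exDIF.dif")
dif_lines <- readLines(dif_file)

## Hypothetical helper: the count is on the line after the topic name,
## following a comma (for example "0,5").
declared_count <- function(topic, lines) {
  i <- match(topic, lines)
  as.numeric(sub("^.*,", "", lines[i + 1L]))
}

counts <- c(VECTORS = declared_count("VECTORS", dif_lines),
            TUPLES  = declared_count("TUPLES", dif_lines))
counts

## Compare with the shape obtained when reading with transpose = TRUE
## (the header row is not counted in the resulting data frame).
dim(read.DIF(dif_file, header = TRUE, transpose = TRUE))
```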

_V_a_l_u_e:

     A data frame ('data.frame') containing a representation of the
     data in the file.  Empty input is an error unless 'col.names' is
     specified, in which case a 0-row data frame is returned.
     Similarly, input consisting of just a header line, read with
     'header = TRUE', results in a 0-row data frame.

_N_o_t_e:

     The columns referred to in 'as.is' and 'colClasses' include the
     column of row names (if any).

     Less memory will be used if 'colClasses' is specified as one of
     the six atomic vector classes.

_A_u_t_h_o_r(_s):

     R Core; 'transpose' option by Christoph Buser, ETH Zurich

_R_e_f_e_r_e_n_c_e_s:

     The DIF format specification can be found by searching on <URL:
     http://www.wotsit.org/>; the optional header fields are ignored.
     See also <URL:
     http://en.wikipedia.org/wiki/Data_Interchange_Format>.

     The term is likely to lead to confusion: Windows will have a
     'Windows Data Interchange Format (DIF) data format' as part of its
     WinFX system, which may or may not be compatible.

_S_e_e _A_l_s_o:

     The _R Data Import/Export_ manual.

     'scan', 'type.convert', 'read.fwf' for reading _f_ixed _w_idth
     _f_ormatted input; 'read.table'; 'data.frame'.

_E_x_a_m_p_l_e_s:

     ## read.DIF() needs transpose = TRUE for files exported from Excel
     udir <- system.file("misc", package = "utils")
     dd <- read.DIF(file.path(udir, "exDIF.dif"), header = TRUE, transpose = TRUE)
     dc <- read.csv(file.path(udir, "exDIF.csv"), header = TRUE)
     stopifnot(identical(dd, dc), dim(dd) == c(4, 2))