model.frame              package:stats              R Documentation

_E_x_t_r_a_c_t_i_n_g _t_h_e "_E_n_v_i_r_o_n_m_e_n_t" _o_f _a _M_o_d_e_l _F_o_r_m_u_l_a

_D_e_s_c_r_i_p_t_i_o_n:

     'model.frame' (a generic function) and its methods return a
     'data.frame' with the variables needed to use 'formula' and any
     '...' arguments.

_U_s_a_g_e:

     model.frame(formula, ...)

     ## Default S3 method:
     model.frame(formula, data = NULL,
                subset = NULL, na.action = na.fail,
                drop.unused.levels = FALSE, xlev = NULL, ...)

     ## S3 method for class 'aovlist':
     model.frame(formula, data = NULL, ...)

     ## S3 method for class 'glm':
     model.frame(formula, ...)

     ## S3 method for class 'lm':
     model.frame(formula, ...)

     get_all_vars(formula, data, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a model 'formula' or 'terms' object or an R object.

    data: a data.frame, list or environment (or object coercible by
          'as.data.frame' to a data.frame), containing the variables in
          'formula'.  Neither a matrix nor an array will be accepted.

  subset: a specification of the rows to be used: defaults to all rows.
          This can be any valid indexing vector (see '[.data.frame')
          for the rows of 'data' or, if that is not supplied, of a data
          frame made up of the variables used in 'formula'.

na.action: how 'NA's are treated.  The default is taken, in order, from
          any 'na.action' attribute of 'data', then from the 'na.action'
          setting of 'options', and finally is 'na.fail' if that is
          unset.  The 'factory-fresh' default is 'na.omit'.  Another
          possible value is 'NULL'.

drop.unused.levels: should factors have unused levels dropped? Defaults
          to 'FALSE'.

    xlev: a named list of character vectors giving the full set of
          levels to be assumed for each factor.

     ...: further arguments such as 'data', 'na.action', 'subset'.  Any
          additional arguments such as 'offset' and 'weights' which
          reach the default method are used to create further columns
          in the model frame, with parenthesised names such as
          '"(offset)"'.
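
     As a minimal sketch of the extra columns described for '...', the
     following uses the built-in 'cars' data; the choice of 'speed' as
     weights and 'log(speed)' as offset is purely illustrative.

```r
# Extra arguments that reach the default method become parenthesised columns.
mf <- model.frame(dist ~ speed, data = cars,
                  weights = speed, offset = log(speed))
extra_cols <- setdiff(names(mf), c("dist", "speed"))
print(extra_cols)

# The stats helpers model.weights() and model.offset() read them back.
w <- model.weights(mf)
o <- model.offset(mf)
```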

_D_e_t_a_i_l_s:

     Exactly what happens depends on the class and attributes of the
     object 'formula'.  If this is an object of fitted-model class such
     as '"lm"', the method will either return the saved model frame
     used when fitting the model (if any, often selected by argument
     'model = TRUE') or pass the call used when fitting on to the
     default method.  The default method itself can cope with rather
     standard model objects such as those of class '"lqs"' from package
     'MASS' if no other arguments are supplied.
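
     The two routes taken by a fitted-model method can be sketched
     with 'lm', which saves the model frame unless 'model = FALSE' is
     given.  The object names below are illustrative only.

```r
# Saved model frame: model.frame() simply returns the stored component.
fit_saved <- lm(dist ~ speed, data = cars)            # model = TRUE by default
mf_saved <- model.frame(fit_saved)
same_as_component <- identical(mf_saved, fit_saved$model)

# No saved frame: the method re-evaluates the original call via the default method.
fit_bare <- lm(dist ~ speed, data = cars, model = FALSE)
mf_rebuilt <- model.frame(fit_bare)
```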

     The rest of this section applies only to the default method.

     If either 'formula' or 'data' is already a model frame (a data
     frame with a '"terms"' attribute) and the other is missing, the
     model frame is returned.  Unless 'formula' is a terms object,
     'as.formula' and then 'terms' are called on it.  (If you wish to
     use the 'keep.order' argument of 'terms.formula', pass a terms
     object rather than a formula.)
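
     A small sketch of the 'keep.order' remark, using a made-up data
     frame 'd': passing a terms object preserves the term order as
     written, whereas a plain formula has its terms reordered by
     degree.

```r
d <- data.frame(y = c(1, 2, 3, 4), a = c(1, 2, 3, 4), b = c(4, 3, 2, 1))

# Build the terms object ourselves so that keep.order is honoured.
tt <- terms(y ~ a:b + a + b, keep.order = TRUE)
mf_terms <- model.frame(tt, data = d)
labels_kept <- attr(attr(mf_terms, "terms"), "term.labels")

# With a plain formula, terms() is called without keep.order.
mf_formula <- model.frame(y ~ a:b + a + b, data = d)
labels_default <- attr(attr(mf_formula, "terms"), "term.labels")
```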

     Row names for the model frame are taken from the 'data' argument
     if present, then from the names of the response in the formula (or
     rownames if it is a matrix), if there is one.
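
     To illustrate where row names come from, the sketch below uses
     'mtcars' for the 'data' case and a small named response vector
     (hypothetical values) for the case where 'data' is not supplied.

```r
# Row names taken from 'data'.
mf_data <- model.frame(mpg ~ wt, data = mtcars)
head(rownames(mf_data), 3)

# No 'data': row names fall back to the names of the response.
resp <- c(a = 1.5, b = 2.5, c = 3.5)
pred <- c(10, 20, 30)
mf_resp <- model.frame(resp ~ pred)
rownames(mf_resp)
```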

     All the variables in 'formula', 'subset' and in '...' are looked
     for first in 'data' and then in the environment of 'formula' (see
     the help for 'formula()' for further details) and collected into a
     data frame.  Then the 'subset' expression is evaluated, and it is
     used as a row index to the data frame.  Then the 'na.action'
     function is applied to the data frame (and may well add
     attributes).  The levels of any factors in the data frame are
     adjusted according to the 'drop.unused.levels' and 'xlev'
     arguments.
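
     The sequence described above (collect variables, apply 'subset',
     apply 'na.action', adjust factor levels) can be sketched on a toy
     data frame; 'd' and 'newd' are invented for illustration.

```r
d <- data.frame(y = c(1, 2, NA, 4, 5),
                g = factor(c("a", "b", "b", "c", "c")))

# subset keeps rows 1-3, na.omit then drops row 3, and the now-unused
# level "c" is removed because drop.unused.levels = TRUE.
mf <- model.frame(y ~ g, data = d, subset = g != "c",
                  na.action = na.omit, drop.unused.levels = TRUE)
nrow(mf)
levels(mf$g)

# xlev imposes a full set of levels, e.g. on new data with fewer levels.
newd <- data.frame(g = factor("a"))
mf_x <- model.frame(~ g, data = newd, xlev = list(g = c("a", "b", "c")))
levels(mf_x$g)
```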

     Unless 'na.action = NULL', time-series attributes will be removed
     from the variables found (since they will be wrong if 'NA's are
     removed).
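
     A brief sketch of the time-series behaviour, assuming two short
     made-up series: with an 'na.action' function the variables come
     back as plain vectors, while 'na.action = NULL' leaves them as
     time series.

```r
ys <- ts(c(1, 2, 3, 4, 5))
xs <- ts(c(2, 4, 6, 8, 10))

# An na.action is applied, so the "ts" attributes are stripped.
mf_omit <- model.frame(ys ~ xs, na.action = na.omit)
stripped <- !is.ts(mf_omit$ys)

# No na.action at all: the series are kept as they are.
mf_keep <- model.frame(ys ~ xs, na.action = NULL)
kept <- is.ts(mf_keep$ys)
```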

     Note that _all_ the variables in the formula are included in the
     data frame, even those preceded by '-'.
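
     For instance, in the following sketch 'hp' is removed from the
     model terms by '-' yet still appears as a column of the model
     frame.

```r
mf <- model.frame(mpg ~ wt + hp - hp, data = mtcars)
frame_cols <- names(mf)                                 # columns of the frame
model_terms <- attr(attr(mf, "terms"), "term.labels")   # terms actually in the model
```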

     Only variables whose type is raw, logical, integer, double (real),
     complex or character can be included in a model frame: this
     includes classed variables such as factors (whose underlying type
     is integer), but excludes lists.
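
     The type restriction can be checked with a list column, which the
     default method rejects.  The data frame 'd' is invented, and the
     error is caught here only so that the sketch runs to completion.

```r
d <- data.frame(y = 1:3)
d$z <- list(1, "a", TRUE)            # a list column

res <- tryCatch(model.frame(y ~ z, data = d),
                error = function(e) e)
rejected <- inherits(res, "error")
if (rejected) print(conditionMessage(res))

# A factor is fine: its underlying type is integer.
mf_ok <- model.frame(y ~ f, data = data.frame(y = 1:3, f = factor(c("u", "v", "u"))))
```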

     'get_all_vars' returns a 'data.frame' containing the variables
     used in 'formula' plus those specified in '...'.  Unlike
     'model.frame.default', it returns the input variables and not
     those resulting from function calls in 'formula'.
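
     The difference is easiest to see with a formula containing
     function calls; this sketch uses 'cars' and an arbitrary
     transformation of each variable.

```r
f <- log(dist) ~ poly(speed, 2)

# model.frame() stores the evaluated expressions as columns.
mf_names <- names(model.frame(f, data = cars))

# get_all_vars() returns the untransformed input variables.
gv_names <- names(get_all_vars(f, data = cars))
```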

_V_a_l_u_e:

     A 'data.frame' containing the variables used in 'formula' plus
     those specified in '...'.  It will have additional attributes,
     including '"terms"' for an object of class '"terms"' derived from
     'formula', and possibly '"na.action"' giving information on the
     handling of 'NA's (which will not be present if no special
     handling was done, e.g. by 'na.pass').
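
     A short sketch of inspecting these attributes, using a toy data
     frame with one missing response value:

```r
d <- data.frame(y = c(1, NA, 3), x = c(1, 2, 3))

mf_omit <- model.frame(y ~ x, data = d, na.action = na.omit)
has_terms <- inherits(attr(mf_omit, "terms"), "terms")
omitted <- attr(mf_omit, "na.action")      # records which rows were dropped

# na.pass does no special handling, so no "na.action" attribute is set.
mf_pass <- model.frame(y ~ x, data = d, na.action = na.pass)
no_attr <- is.null(attr(mf_pass, "na.action"))
```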

_R_e_f_e_r_e_n_c_e_s:

     Chambers, J. M. (1992) _Data for models._ Chapter 3 of
     _Statistical Models in S_ eds J. M. Chambers and T. J. Hastie,
     Wadsworth & Brooks/Cole.

_S_e_e _A_l_s_o:

     'model.matrix' for the 'design matrix', 'formula' for formulas
     and 'expand.model.frame' for model.frame manipulation.

_E_x_a_m_p_l_e_s:

     data.class(model.frame(dist ~ speed, data = cars))