formula                 package:stats                 R Documentation

Model Formulae

Description:

     The generic function 'formula' and its specific methods provide a
     way of extracting formulae which have been included in other
     objects.

     'as.formula' is almost identical, additionally preserving
     attributes when 'object' already inherits from '"formula"'. The
     default value of the 'env' argument is used only when the formula
     would otherwise lack an environment.

Usage:

     formula(x, ...)
     as.formula(object, env = parent.frame())

Arguments:

x, object: R object.

     ...: further arguments passed to or from other methods.

     env: the environment to associate with the result.

Details:

     The models fit by, e.g., the 'lm' and 'glm' functions are
     specified in a compact symbolic form. The '~' operator is basic
     in the formation of such models. An expression of the form
     'y ~ model' is interpreted as a specification that the response
     'y' is modelled by a linear predictor specified symbolically by
     'model'. Such a model consists of a series of terms separated by
     '+' operators. The terms themselves consist of variable and
     factor names separated by ':' operators. Such a term is
     interpreted as the interaction of all the variables and factors
     appearing in the term.

     In addition to '+' and ':', a number of other operators are
     useful in model formulae. The '*' operator denotes factor
     crossing: 'a*b' is interpreted as 'a+b+a:b'. The '^' operator
     indicates crossing to the specified degree. For example
     '(a+b+c)^2' is identical to '(a+b+c)*(a+b+c)', which in turn
     expands to a formula containing the main effects for 'a', 'b' and
     'c' together with their second-order interactions. The '%in%'
     operator indicates that the terms on its left are nested within
     those on the right. For example 'a + b %in% a' expands to the
     formula 'a + a:b'. The '-' operator removes the specified terms,
     so that '(a+b+c)^2 - a:b' is identical to 'a + b + c + b:c + a:c'.
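     The expansions described so far can be checked with a minimal
     sketch like the one below. It assumes that the '"term.labels"'
     attribute returned by 'terms' is an adequate view of the expanded
     formula; 'expand' is a hypothetical helper name introduced only
     for this illustration and is not part of package 'stats'.

```r
# 'expand' is a hypothetical helper (not in 'stats'): it returns the
# labels of the terms that a formula expands to.
expand <- function(f) attr(terms(f), "term.labels")

expand(~ a * b)                 # "a" "b" "a:b"
expand(~ (a + b + c)^2)         # "a" "b" "c" "a:b" "a:c" "b:c"

# Crossing to degree 2 gives the same terms as the explicit product.
identical(expand(~ (a + b + c)^2),
          expand(~ (a + b + c) * (a + b + c)))

expand(~ a + b %in% a)          # nesting of 'b' within 'a'
expand(~ (a + b + c)^2 - a:b)   # a:b is dropped; a:c and b:c remain
```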
     The '-' operator can also be used to remove the intercept term:
     'y ~ x - 1' is a line through the origin. A model with no
     intercept can also be specified as 'y ~ x + 0' or 'y ~ 0 + x'.

     While formulae usually involve just variable and factor names,
     they can also involve arithmetic expressions. The formula
     'log(y) ~ a + log(x)' is quite legal. When such arithmetic
     expressions involve operators which are also used symbolically in
     model formulae, there can be confusion between arithmetic and
     symbolic operator use.

     To avoid this confusion, the function 'I()' can be used to
     bracket those portions of a model formula where the operators are
     used in their arithmetic sense. For example, in the formula
     'y ~ a + I(b+c)', the term 'b+c' is to be interpreted as the sum
     of 'b' and 'c'.

     Variable names can be quoted by backticks '`like this`' in
     formulae, although there is no guarantee that all code using
     formulae will accept such non-syntactic names.

     Most model-fitting functions accept formulae with a
     right-hand side that includes the function 'offset' to indicate
     terms with a fixed coefficient of one. Some functions accept
     other 'specials' such as 'strata' or 'cluster' (see the
     'specials' argument of 'terms.formula').

     There are two special interpretations of '.' in a formula. The
     usual one is in the context of a 'data' argument of model-fitting
     functions and means 'all columns not otherwise in the formula':
     see 'terms.formula'. In the context of 'update.formula', *only*,
     it means 'what was previously in this part of the formula'.

     When 'formula' is called on a fitted model object, either a
     specific method is used (such as that for class '"nls"') or the
     default method. The default first looks for a '"formula"'
     component of the object (and evaluates it), then a '"terms"'
     component, then a 'formula' parameter of the call (and evaluates
     its value) and finally a '"formula"' attribute.

     There is a method for data frames. If there is only one column
     this forms the RHS with an empty LHS.
     For more columns, the first column is the LHS of the formula and
     the remaining columns separated by '+' form the RHS.

Value:

     All the functions above produce an object of class '"formula"'
     which contains a symbolic model formula.

Environments:

     A formula object has an associated environment, and this
     environment (rather than the parent environment) is used by
     'model.frame' to evaluate variables that are not found in the
     supplied 'data' argument.

     Formulas created with the '~' operator use the environment in
     which they were created. Formulas created with 'as.formula' will
     use the 'env' argument for their environment. Pre-existing
     formulas extracted with 'as.formula' will only have their
     environment changed if 'env' is given explicitly.

References:

     Chambers, J. M. and Hastie, T. J. (1992) _Statistical models._
     Chapter 2 of _Statistical Models in S_ eds J. M. Chambers and
     T. J. Hastie, Wadsworth & Brooks/Cole.

See Also:

     'I', 'offset'.

     For formula manipulation: 'terms', and 'all.vars'; for typical
     use: 'lm', 'glm', and 'coplot'.

Examples:

     class(fo <- y ~ x1*x2) # "formula"
     fo
     typeof(fo)  # R internal : "language"
     terms(fo)

     environment(fo)
     environment(as.formula("y ~ x"))
     environment(as.formula("y ~ x", env = new.env()))

     ## Create a formula for a model with a large number of variables:
     xnam <- paste("x", 1:25, sep = "")
     (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse = "+"))))
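     A few further behaviours described under Details and Environments
     can be illustrated with the sketch below. The object names ('e',
     'g', 'd', 'fd', 'f1', 'upd') and the toy data frame are
     assumptions made up for this illustration; only functions from
     base R and package 'stats' are called.

```r
## Removing the intercept: both spellings set the "intercept"
## attribute of the terms object to 0.
attr(terms(y ~ x - 1), "intercept")
attr(terms(y ~ 0 + x), "intercept")

## A formula built from a character string takes its environment
## from the 'env' argument of as.formula().
e <- new.env()
g <- as.formula("y ~ x", env = e)
identical(environment(g), e)    # TRUE

## The data frame method: the first column becomes the LHS and the
## remaining columns, joined by '+', become the RHS.
d <- data.frame(y = 1:3, a = 4:6, b = 7:9)   # toy data, made up here
fd <- formula(d)
fd                              # y ~ a + b

## With a single column the result is one-sided (empty LHS).
f1 <- formula(d["a"])
length(f1)                      # 2, i.e. '~' plus the RHS only

## In update(), '.' stands for the corresponding part of the old
## formula.
upd <- update(y ~ a + b, . ~ . + c)
upd                             # y ~ a + b + c
```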