ppr                  package:stats                  R Documentation

_P_r_o_j_e_c_t_i_o_n _P_u_r_s_u_i_t _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a projection pursuit regression model.

_U_s_a_g_e:

     ppr(x, ...)

     ## S3 method for class 'formula':
     ppr(formula, data, weights, subset, na.action,
         contrasts = NULL, ..., model = FALSE)

     ## Default S3 method:
     ppr(x, y, weights = rep(1,n),
         ww = rep(1,q), nterms, max.terms = nterms, optlevel = 2,
         sm.method = c("supsmu", "spline", "gcvspline"),
         bass = 0, span = 0, df = 5, gcvpen = 1, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula specifying one or more numeric response variables
          and the explanatory variables. 

       x: numeric matrix of explanatory variables.  Rows represent
          observations, and columns represent variables.  Missing
          values are not accepted. 

       y: numeric matrix of response variables.  Rows represent
          observations, and columns represent variables.  Missing
          values are not accepted. 

  nterms: number of terms to include in the final model.

    data: a data frame (or similar: see 'model.frame') from which
          variables specified in 'formula' are preferentially to be
          taken. 

 weights: a vector of weights 'w_i' for each _case_.

      ww: a vector of weights for each _response_, so the fit criterion
          is the sum over case 'i' and responses 'j' of 'w_i ww_j (y_ij
          - fit_ij)^2' divided by the sum of 'w_i'. 

  subset: an index vector specifying the cases to be used in the
          training sample.  (NOTE: If given, this argument must be
          named.) 

na.action: a function to specify the action to be taken if 'NA's are
          found. The default action is given by
          'getOption("na.action")'. (NOTE: If given, this argument must
          be named.) 

contrasts: the contrasts to be used when any factor explanatory
          variables are coded. 

max.terms: maximum number of terms to choose from when building the
          model. 

optlevel: integer from 0 to 3 which determines the thoroughness of an
          optimization routine in the SMART program. See the 'Details'
          section. 

sm.method: the method used for smoothing the ridge functions.  The
          default is to use Friedman's super smoother 'supsmu'.  The
          alternatives are to use the smoothing spline code underlying
          'smooth.spline', either with a specified (equivalent) degrees
          of freedom for each ridge functions, or to allow the
          smoothness to be chosen by GCV. 

    bass: super smoother bass tone control used with automatic span
          selection (see 'supsmu'); the range of values is 0 to 10,
          with larger values resulting in increased smoothing. 

    span: super smoother span control (see 'supsmu').  The default,
          '0', results in automatic span selection by local cross
          validation. 'span' can also take a value in '(0, 1]'. 

      df: if 'sm.method' is '"spline"' specifies the smoothness of each
          ridge term via the requested equivalent degrees of freedom. 

  gcvpen: if 'sm.method' is '"gcvspline"' this is the penalty used in
          the GCV selection for each degree of freedom used. 

     ...: arguments to be passed to or from other methods.

   model: logical.  If true, the model frame is returned.

_D_e_t_a_i_l_s:

     The basic method is given by Friedman (1984), and is essentially
     the same code used by S-PLUS's 'ppreg'.  This code is extremely
     sensitive to the compiler used.

     The algorithm first adds up to 'max.terms' ridge terms one at a
     time; it will use less if it is unable to find a term to add that
     makes sufficient difference.  It then removes the least important
     term at each step until 'nterms' terms are left.

     The levels of optimization (argument 'optlevel') differ in how
     thoroughly the models are refitted during this process. At level 0
     the existing ridge terms are not refitted.  At level 1 the
     projection directions are not refitted, but the ridge functions
     and the regression coefficients are. Levels 2 and 3 refit all the
     terms and are equivalent for one response; level 3 is more careful
     to re-balance the contributions from each regressor at each step
     and so is a little less likely to converge to a saddle point of
     the sum of squares criterion.

_V_a_l_u_e:

     A list with the following components, many of which are for use by
     the method functions.

    call: the matched call

       p: the number of explanatory variables (after any coding)

       q: the number of response variables

      mu: the argument 'nterms'

      ml: the argument 'max.terms'

     gof: the overall residual (weighted) sum of squares for the
          selected model

    gofn: the overall residual (weighted) sum of squares against the
          number of terms, up to 'max.terms'.  Will be invalid (and
          zero) for less than 'nterms'.

      df: the argument 'df'

     edf: if 'sm.method' is '"spline"' or '"gcvspline"' the equivalent
          number of degrees of freedom for each ridge term used.

  xnames: the names of the explanatory variables

  ynames: the names of the response variables

   alpha: a matrix of the projection directions, with a column for each
          ridge term

    beta: a matrix of the coefficients applied for each response to the
          ridge terms: the rows are the responses and the columns the
          ridge terms

      yb: the weighted means of each response

      ys: the overall scale factor used: internally the responses are
          divided by 'ys' to have unit total weighted sum of squares.

fitted.values: the fitted values, as a matrix if 'q > 1'.

residuals: the residuals, as a matrix if 'q > 1'.

    smod: internal work array, which includes the ridge functions
          evaluated at the training set points.

   model: (only if 'model=TRUE') the model frame.

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit
     regression. _Journal of the American Statistical Association_,
     *76*, 817-823.

     Friedman, J. H. (1984) SMART User's Guide. Laboratory for
     Computational Statistics, Stanford University Technical Report No.
     1.

     Venables, W. N. & Ripley, B. D. (2002) _Modern Applied Statistics
     with S._  Springer.

_S_e_e _A_l_s_o:

     'plot.ppr', 'supsmu', 'smooth.spline'

_E_x_a_m_p_l_e_s:

     require(graphics)

     # Note: your numerical values may differ
     attach(rock)
     area1 <- area/10000; peri1 <- peri/10000
     rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                     data = rock, nterms = 2, max.terms = 5)
     rock.ppr
     # Call:
     # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
     #     nterms = 2, max.terms = 5)
     #
     # Goodness of fit:
     #  2 terms  3 terms  4 terms  5 terms
     # 8.737806 5.289517 4.745799 4.490378

     summary(rock.ppr)
     # .....  (same as above)
     # .....
     #
     # Projection direction vectors:
     #       term 1      term 2
     # area1  0.34357179  0.37071027
     # peri1 -0.93781471 -0.61923542
     # shape  0.04961846  0.69218595
     #
     # Coefficients of ridge terms:
     #    term 1    term 2
     # 1.6079271 0.5460971

     par(mfrow=c(3,2))# maybe: , pty="s")
     plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
     plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
     plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
          main = "update(..., sm.method=\"gcv\", gcvpen=2)")
     cbind(perm=rock$perm, prediction=round(exp(predict(rock.ppr)), 1))
     detach()