predict.gam               package:mgcv               R Documentation

_P_r_e_d_i_c_t_i_o_n _f_r_o_m _f_i_t_t_e_d _G_A_M _m_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Takes a fitted 'gam' object produced by 'gam()'  and produces
     predictions given a new set of values for the model covariates  or
     the original values used for the model fit. Predictions can be
     accompanied by standard errors, based on the posterior
     distribution of the model coefficients. The routine can optionally
     return the matrix by which the model coefficients must be
     pre-multiplied in order to yield the values of the linear
     predictor at the supplied covariate values: this is useful for
     obtaining credible regions for quantities derived from the model,
     and for lookup table prediction outside 'R' (see example code
     below).

_U_s_a_g_e:

     ## S3 method for class 'gam':
     predict(object,newdata,type="link",se.fit=FALSE,terms=NULL,
             block.size=1000,newdata.guaranteed=FALSE,na.action=na.pass,...)

_A_r_g_u_m_e_n_t_s:

  object: a fitted 'gam' object as produced by 'gam()'. 

 newdata: A data frame or list containing the values of the model
          covariates at which predictions are required. If this is not
          provided then predictions corresponding to the original data
          are returned. If 'newdata' is provided then it should contain
          all the variables needed for prediction: a warning is
          generated if not. 

    type: When this has the value '"link"' (default) the linear
          predictor (possibly with associated standard errors) is
          returned. When 'type="terms"' each component of the  linear
          predictor is returned seperately (possibly with standard
          errors): this includes  parametric model components, followed
          by each smooth component, but excludes any offset and any
          intercept. 'type="iterms"' is the same, except that any
          standard errors  returned for smooth components will include
          the uncertainty about the intercept/overall mean.  When 
          'type="response"' predictions  on the scale of the response
          are returned (possibly with approximate standard errors).
          When 'type="lpmatrix"' then a matrix is returned which yields
          the values of the linear predictor (minus any offset) when
          postmultiplied by the parameter vector (in this case 'se.fit'
          is ignored). The latter option is most useful for getting
          variance estimates for quantities derived from the model: for
          example integrated quantities, or derivatives of smooths. A
          linear predictor matrix can also be used to implement
          approximate prediction outside 'R' (see example code, below). 

  se.fit: when this is TRUE (not default) standard error estimates are
          returned for each prediction.

   terms: if 'type=="terms"' then only results for the terms given in
          this array will be returned.

block.size: maximum number of predictions to process per call to
          underlying code: larger is quicker, but more memory
          intensive. Set to < 1 to use total number of predictions as
          this.

newdata.guaranteed: Set to 'TRUE' to turn off all checking of 'newdata'
          except for sanity of factor levels: this can speed things up
          for large prediction tasks, but 'newdata' must be complete,
          with no 'NA' values for predictors required in the model. 

na.action: what to do about 'NA' values in 'newdata'. With the default
          'na.pass', any row of 'newdata' containing 'NA' values for
          required predictors, gives rise to 'NA' predictions (even if
          the term concerned has no 'NA' predictors). 'na.exclude' or
          'na.omit' result in the dropping of 'newdata' rows, if they
          contain any 'NA' values for required predictors. If 'newdata'
          is missing then 'NA' handling is  determined from
          'object$na.action'.

     ...: other arguments.

_D_e_t_a_i_l_s:

     The standard errors produced by 'predict.gam' are based on the
     Bayesian posterior covariance matrix of the parameters 'Vp' in the
     fitted gam object. 

     To facilitate plotting with 'termplot', if 'object' possesses an
     attribute '"para.only"' and 'type=="terms"' then only parametric
     terms of order 1 are returned (i.e. those that 'termplot' can
     handle).

     Note that, in common with other prediction functions, any offset
     supplied to 'gam' as an argument is always ignored when
     predicting, unlike offsets specified in the gam model formula.

     See the examples for how to use the 'lpmatrix' for obtaining
     credible regions for quantities derived from the model.

_V_a_l_u_e:

     If 'type=="lpmatrix"' then a matrix is returned which will give a
     vector of linear predictor values (minus any offest) at the
     supplied covariate values, when applied to the model coefficient
     vector.  Otherwise, if 'se.fit' is 'TRUE' then a 2 item list is
     returned with items (both arrays) 'fit' and 'se.fit' containing
     predictions and associated standard error estimates, otherwise an 
     array of predictions is returned. The dimensions of the returned
     arrays depends on whether  'type' is '"terms"' or not: if it is
     then the array is 2 dimensional with each  term in the linear
     predictor separate, otherwise the array is 1 dimensional and
     contains the  linear predictor/predicted values (or corresponding
     s.e.s). The linear predictor returned termwise will  not include
     the offset or the intercept.

     'newdata' can be a data frame, list or model.frame: if it's a
     model frame then all variables must be supplied.

_W_A_R_N_I_N_G:

     Note that the behaviour of this function is not identical to 
     'predict.gam()' in Splus.

     'type=="terms"' does not exactly match what 'predict.lm' does for
     parametric model components.

_A_u_t_h_o_r(_s):

     Simon N. Wood simon.wood@r-project.org 

     The design is inspired by the S function of the same name
     described in Chambers and Hastie (1993) (but is not a clone).

_R_e_f_e_r_e_n_c_e_s:

     Chambers and Hastie (1993) Statistical Models in S. Chapman &
     Hall.

     Wood S.N. (2006b) Generalized Additive Models: An Introduction
     with R. Chapman and Hall/CRC Press.

_S_e_e _A_l_s_o:

     'gam', 'gamm', 'plot.gam'

_E_x_a_m_p_l_e_s:

     library(mgcv)
     n<-200
     sig <- 2
     dat <- gamSim(1,n=n,scale=sig)

     b<-gam(y~s(x0)+s(I(x1^2))+s(x2)+offset(x3),data=dat)

     newd <- data.frame(x0=(0:30)/30,x1=(0:30)/30,x2=(0:30)/30,x3=(0:30)/30)
     pred <- predict.gam(b,newd)

     ## difference between "terms" and "iterms"
     nd2 <- data.frame(x0=c(.25,.5),x1=c(.25,.5),x2=c(.25,.5),x3=c(.25,.5))
     predict(b,nd2,type="terms",se=TRUE)
     predict(b,nd2,type="iterms",se=TRUE)

     ## now get variance of sum of predictions using lpmatrix

     Xp <- predict(b,newd,type="lpmatrix") 

     ## Xp %*% coef(b) yields vector of predictions

     a <- rep(1,31)
     Xs <- t(a) %*% Xp ## Xs %*% coef(b) gives sum of predictions
     var.sum <- Xs %*% b$Vp %*% t(Xs)

     ## Now get the variance of non-linear function of predictions
     ## by simulation from posterior distribution of the params

     library(MASS)
     br<-mvrnorm(1000,coef(b),b$Vp) ## 1000 replicate param. vectors
     res <- rep(0,1000)
     for (i in 1:1000)
     { pr <- Xp %*% br[i,] ## replicate predictions
       res[i] <- sum(log(abs(pr))) ## example non-linear function
     }
     mean(res);var(res)

     ## loop is replace-able by following .... 

     res <- colSums(log(abs(Xp %*% t(br))))

     ## The following shows how to use use an "lpmatrix" as a lookup 
     ## table for approximate prediction. The idea is to create 
     ## approximate prediction matrix rows by appropriate linear 
     ## interpolation of an existing prediction matrix. The additivity 
     ## of a GAM makes this possible. 
     ## There is no reason to ever do this in R, but the following 
     ## code provides a useful template for predicting from a fitted 
     ## gam *outside* R: all that is needed is the coefficient vector 
     ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
     ## higher order interpolation for higher accuracy.  

     xn <- c(.341,.122,.476,.981) ## want prediction at these values
     x0 <- 1         ## intercept column
     dx <- 1/30      ## covariate spacing in `newd'
     for (j in 0:2) { ## loop through smooth terms
       cols <- 1+j*9 +1:9      ## relevant cols of Xp
       i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
       w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
       ## find approx. predict matrix row portion, by interpolation
       x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
     }
     dim(x0)<-c(1,28) 
     fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
     se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
     ## compare to normal prediction
     predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
             x2=xn[3],x3=xn[4]),se=TRUE)