predict.gam package:mgcv R Documentation _P_r_e_d_i_c_t_i_o_n _f_r_o_m _f_i_t_t_e_d _G_A_M _m_o_d_e_l _D_e_s_c_r_i_p_t_i_o_n: Takes a fitted 'gam' object produced by 'gam()' and produces predictions given a new set of values for the model covariates or the original values used for the model fit. Predictions can be accompanied by standard errors, based on the posterior distribution of the model coefficients. The routine can optionally return the matrix by which the model coefficients must be pre-multiplied in order to yield the values of the linear predictor at the supplied covariate values: this is useful for obtaining credible regions for quantities derived from the model, and for lookup table prediction outside 'R' (see example code below). _U_s_a_g_e: ## S3 method for class 'gam': predict(object,newdata,type="link",se.fit=FALSE,terms=NULL, block.size=1000,newdata.guaranteed=FALSE,na.action=na.pass,...) _A_r_g_u_m_e_n_t_s: object: a fitted 'gam' object as produced by 'gam()'. newdata: A data frame or list containing the values of the model covariates at which predictions are required. If this is not provided then predictions corresponding to the original data are returned. If 'newdata' is provided then it should contain all the variables needed for prediction: a warning is generated if not. type: When this has the value '"link"' (default) the linear predictor (possibly with associated standard errors) is returned. When 'type="terms"' each component of the linear predictor is returned seperately (possibly with standard errors): this includes parametric model components, followed by each smooth component, but excludes any offset and any intercept. 'type="iterms"' is the same, except that any standard errors returned for smooth components will include the uncertainty about the intercept/overall mean. When 'type="response"' predictions on the scale of the response are returned (possibly with approximate standard errors). When 'type="lpmatrix"' then a matrix is returned which yields the values of the linear predictor (minus any offset) when postmultiplied by the parameter vector (in this case 'se.fit' is ignored). The latter option is most useful for getting variance estimates for quantities derived from the model: for example integrated quantities, or derivatives of smooths. A linear predictor matrix can also be used to implement approximate prediction outside 'R' (see example code, below). se.fit: when this is TRUE (not default) standard error estimates are returned for each prediction. terms: if 'type=="terms"' then only results for the terms given in this array will be returned. block.size: maximum number of predictions to process per call to underlying code: larger is quicker, but more memory intensive. Set to < 1 to use total number of predictions as this. newdata.guaranteed: Set to 'TRUE' to turn off all checking of 'newdata' except for sanity of factor levels: this can speed things up for large prediction tasks, but 'newdata' must be complete, with no 'NA' values for predictors required in the model. na.action: what to do about 'NA' values in 'newdata'. With the default 'na.pass', any row of 'newdata' containing 'NA' values for required predictors, gives rise to 'NA' predictions (even if the term concerned has no 'NA' predictors). 'na.exclude' or 'na.omit' result in the dropping of 'newdata' rows, if they contain any 'NA' values for required predictors. If 'newdata' is missing then 'NA' handling is determined from 'object$na.action'. ...: other arguments. _D_e_t_a_i_l_s: The standard errors produced by 'predict.gam' are based on the Bayesian posterior covariance matrix of the parameters 'Vp' in the fitted gam object. To facilitate plotting with 'termplot', if 'object' possesses an attribute '"para.only"' and 'type=="terms"' then only parametric terms of order 1 are returned (i.e. those that 'termplot' can handle). Note that, in common with other prediction functions, any offset supplied to 'gam' as an argument is always ignored when predicting, unlike offsets specified in the gam model formula. See the examples for how to use the 'lpmatrix' for obtaining credible regions for quantities derived from the model. _V_a_l_u_e: If 'type=="lpmatrix"' then a matrix is returned which will give a vector of linear predictor values (minus any offest) at the supplied covariate values, when applied to the model coefficient vector. Otherwise, if 'se.fit' is 'TRUE' then a 2 item list is returned with items (both arrays) 'fit' and 'se.fit' containing predictions and associated standard error estimates, otherwise an array of predictions is returned. The dimensions of the returned arrays depends on whether 'type' is '"terms"' or not: if it is then the array is 2 dimensional with each term in the linear predictor separate, otherwise the array is 1 dimensional and contains the linear predictor/predicted values (or corresponding s.e.s). The linear predictor returned termwise will not include the offset or the intercept. 'newdata' can be a data frame, list or model.frame: if it's a model frame then all variables must be supplied. _W_A_R_N_I_N_G: Note that the behaviour of this function is not identical to 'predict.gam()' in Splus. 'type=="terms"' does not exactly match what 'predict.lm' does for parametric model components. _A_u_t_h_o_r(_s): Simon N. Wood simon.wood@r-project.org The design is inspired by the S function of the same name described in Chambers and Hastie (1993) (but is not a clone). _R_e_f_e_r_e_n_c_e_s: Chambers and Hastie (1993) Statistical Models in S. Chapman & Hall. Wood S.N. (2006b) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press. _S_e_e _A_l_s_o: 'gam', 'gamm', 'plot.gam' _E_x_a_m_p_l_e_s: library(mgcv) n<-200 sig <- 2 dat <- gamSim(1,n=n,scale=sig) b<-gam(y~s(x0)+s(I(x1^2))+s(x2)+offset(x3),data=dat) newd <- data.frame(x0=(0:30)/30,x1=(0:30)/30,x2=(0:30)/30,x3=(0:30)/30) pred <- predict.gam(b,newd) ## difference between "terms" and "iterms" nd2 <- data.frame(x0=c(.25,.5),x1=c(.25,.5),x2=c(.25,.5),x3=c(.25,.5)) predict(b,nd2,type="terms",se=TRUE) predict(b,nd2,type="iterms",se=TRUE) ## now get variance of sum of predictions using lpmatrix Xp <- predict(b,newd,type="lpmatrix") ## Xp %*% coef(b) yields vector of predictions a <- rep(1,31) Xs <- t(a) %*% Xp ## Xs %*% coef(b) gives sum of predictions var.sum <- Xs %*% b$Vp %*% t(Xs) ## Now get the variance of non-linear function of predictions ## by simulation from posterior distribution of the params library(MASS) br<-mvrnorm(1000,coef(b),b$Vp) ## 1000 replicate param. vectors res <- rep(0,1000) for (i in 1:1000) { pr <- Xp %*% br[i,] ## replicate predictions res[i] <- sum(log(abs(pr))) ## example non-linear function } mean(res);var(res) ## loop is replace-able by following .... res <- colSums(log(abs(Xp %*% t(br)))) ## The following shows how to use use an "lpmatrix" as a lookup ## table for approximate prediction. The idea is to create ## approximate prediction matrix rows by appropriate linear ## interpolation of an existing prediction matrix. The additivity ## of a GAM makes this possible. ## There is no reason to ever do this in R, but the following ## code provides a useful template for predicting from a fitted ## gam *outside* R: all that is needed is the coefficient vector ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or ## higher order interpolation for higher accuracy. xn <- c(.341,.122,.476,.981) ## want prediction at these values x0 <- 1 ## intercept column dx <- 1/30 ## covariate spacing in `newd' for (j in 0:2) { ## loop through smooth terms cols <- 1+j*9 +1:9 ## relevant cols of Xp i <- floor(xn[j+1]*30) ## find relevant rows of Xp w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights ## find approx. predict matrix row portion, by interpolation x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1)) } dim(x0)<-c(1,28) fv <- x0%*%coef(b) + xn[4];fv ## evaluate and add offset se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error ## compare to normal prediction predict(b,newdata=data.frame(x0=xn[1],x1=xn[2], x2=xn[3],x3=xn[4]),se=TRUE)