gam package:mgcv R Documentation _G_e_n_e_r_a_l_i_z_e_d _a_d_d_i_t_i_v_e _m_o_d_e_l_s _w_i_t_h _i_n_t_e_g_r_a_t_e_d _s_m_o_o_t_h_n_e_s_s _e_s_t_i_m_a_t_i_o_n _D_e_s_c_r_i_p_t_i_o_n: Fits a generalized additive model (GAM) to data, the term `GAM' being taken to include any quadratically penalized GLM. The degree of smoothness of model terms is estimated as part of fitting. 'gam' can also fit any GLM subject to multiple quadratic penalties (including estimation of degree of penalization). Isotropic or scale invariant smooths of any number of variables are available as model terms, as are linear functionals of such smooths; confidence/credible intervals are readily available for any quantity predicted using a fitted model; 'gam' is extendable: users can add smooths. Smooth terms are represented using penalized regression splines (or similar smoothers) with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with fixed degrees of freedom (mixtures of the two are permitted). Multi-dimensional smooths are available using penalized thin plate regression splines (isotropic) or tensor product splines (when an isotropic smooth is inappropriate). For an overview of the smooths available see 'smooth.terms'. For more on specifying models see 'gam.models' and 'linear.functional.terms'. For more on model selection see 'gam.selection'. 'gam()' is not a clone of what S-PLUS provides: the major differences are (i) that by default estimation of the degree of smoothness of model terms is part of model fitting, (ii) a Bayesian approach to variance estimation is employed that makes for easier confidence interval calculation (with good coverage probabilites), (iii) that the model can depend on any (bounded) linear functional of smooth terms, and, (iv) the parametric part of the model can be penalized, and (v) the facilities for incorporating smooths of more than one variable are different: specifically there are no 'lo' smooths, but instead (a) 's' terms can have more than one argument, implying an isotropic smooth and (b) 'te' smooths are provided as an effective means for modelling smooth interactions of any number of variables via scale invariant tensor product smooths. If you want a clone of what S-PLUS provides use gam from package 'gam'. _U_s_a_g_e: gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL, na.action,offset=NULL,method="GCV.Cp", optimizer=c("outer","newton"),control=gam.control(),scale=0, select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1, fit=TRUE,paraPen=NULL,G=NULL,in.out,...) _A_r_g_u_m_e_n_t_s: formula: A GAM formula (see 'formula.gam' and also 'gam.models'). This is exactly like the formula for a GLM except that smooth terms, 's' and 'te' can be added to the right hand side to specify that the linear predictor depends on smooth functions of predictors (or linear functionals of these). family: This is a family object specifying the distribution and link to use in fitting etc. See 'glm' and 'family' for more details. A negative binomial family is provided: see 'negbin'. data: A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from 'environment(formula)': typically the environment from which 'gam' is called. weights: prior weights on the data. subset: an optional vector specifying a subset of observations to be used in the fitting process. na.action: a function which indicates what should happen when the data contain `NA's. The default is set by the `na.action' setting of `options', and is `na.fail' if that is unset. The ``factory-fresh'' default is `na.omit'. offset: Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in 'formula': this conforms to the behaviour of 'lm' and 'glm'. control: A list of fit control parameters returned by 'gam.control'. method: The smoothing parameter estimation method. '"GCV.Cp"' to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale. '"GACV.Cp"' is equivalent, but using GACV in place of GCV. '"REML"' for REML estimation, including of unknown scale, '"P-REML"' for REML estimation, but using a Pearson estimate of the scale. '"ML"' and '"P-ML"' are similar, but using maximum likelihood in place of REML. optimizer: An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by 'method'). '"perf"' for performance iteration. '"outer"' for the more stable direct approach. '"outer"' can use several alternative optimizers, specified in the second element of 'optimizer': '"newton"' (default), '"bfgs"', '"optim"', '"nlm"' and '"nlm.fd"' (the latter is based entirely on finite differenced derivatives and is very slow). scale: If this is positive then it is taken as the known scale parameter. Negative signals that the scale paraemter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases. select: If this is 'TRUE' then 'gam' can add an extra penalty to each term so that it can be penalized to zero. This means that the smoothing parameter estimation that is part of fitting can completely remove terms from the model. If the corresponding smoothing parameter is estimated as zero then the extra penalty has no effect. knots: this is an optional list containing user specified knot values to be used for basis construction. For most bases the user simply supplies the knots to be used, which must match up with the 'k' value supplied (note that the number of knots is not always just 'k'). See 'tprs' for what happens in the '"tp"/"ts"' case. Different terms can use different numbers of knots, unless they share a covariate. sp: A vector of smoothing parameters can be provided here. Smoothing parameters must be supplied in the order that the smooth terms appear in the model formula. Negative elements indicate that the parameter should be estimated, and hence a mixture of fixed and estimated parameters is possible. If smooths share smoothing parameters then 'length(sp)' must correspond to the number of underlying smoothing parameters. min.sp: Lower bounds can be supplied for the smoothing parameters. Note that if this option is used then the smoothing parameters 'full.sp', in the returned object, will need to be added to what is supplied here to get the smoothing parameters actually multiplying the penalties. 'length(min.sp)' should always be the same as the total number of penalties (so it may be longer than 'sp', if smooths share smoothing parameters). H: A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to un-identifiable on the scale of the linear predictor, but perfectly well defined on the response scale. gamma: It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied. fit: If this argument is 'TRUE' then 'gam' sets up the model and fits it, but if it is 'FALSE' then the model is set up and an object 'G' containing what would be required to fit is returned is returned. See argument 'G'. paraPen: optional list specifying any penalties to be applied to parametric model terms. 'gam.models' explains more. G: Usually 'NULL', but may contain the object returned by a previous call to 'gam' with 'fit=FALSE', in which case all other arguments are ignored except for 'gamma', 'in.out', 'control', 'method' and 'fit'. in.out: optional list for initializing outer iteration. If supplied then this must contain two elements: 'sp' should be an array of initialization values for all smoothing parameters (there must be a value for all smoothing parameters, whether fixed or to be estimated, but those for fixed s.p.s are not used); 'scale' is the typical scale of the GCV/UBRE function, for passing to the outer optimizer. ...: further arguments for passing on e.g. to 'gam.fit' (such as 'mustart'). _D_e_t_a_i_l_s: A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is: log(E(y_i))=f_1(x_1i)+f_2(x_2i) where the (independent) response variables y_i~Poi, and f_1 and f_2 are smooth functions of covariates x_1 and x_2. The log is an example of a link function. If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of f_1 and f_2. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its `wiggliness'. To control the tradeoff between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions are the main statistical questions introduced by moving from GLMs to GAMs. The 'mgcv' implementation of 'gam' represents the smooth functions using penalized regression splines, and by default uses basis functions for these splines that are designed to be optimal, given the number basis functions used. The smooth terms can be functions of any number of covariates and the user has some control over how smoothness of the functions is measured. 'gam' in 'mgcv' solves the smoothing parameter estimation problem by using the Generalized Cross Validation (GCV) criterion n D/(n - DoF)^2 or an Un-Biased Risk Estimator (UBRE )criterion D/n + 2 s DoF / n -s where D is the deviance, n the number of data, s the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when s is known. Alternatives are GACV, or a Laplace approximation to REML, there is some evidence that the latter may actually be the most effective choise. Smoothing parameters are chosen to minimize the GCV, UBRE/AIC, GACV or REML scores for the model, and the main computational challenge solved by the 'mgcv' package is to do this efficiently and reliably. Various alternative numerical methods are provided which can be set by argument 'optimizer'. Broadly 'gam' works by first constructing basis functions and one or more quadratic penalty coefficient matrices for each smooth term in the model formula, obtaining a model matrix for the strictly parametric part of the model formula, and combining these to obtain a complete model matrix (/design matrix) and a set of penalty matrices for the smooth terms. Some linear identifiability constraints are also obtained at this point. The model is fit using 'gam.fit', a modification of 'glm.fit'. The GAM penalized likelihood maximization problem is solved by Penalized Iteratively Reweighted Least Squares (P-IRLS) (see e.g. Wood 2000). Smoothing parameter selection is integrated in one of two ways. (i) `Performance iteration' uses the fact that at each P-IRLS iteration a penalized weighted least squares problem is solved, and the smoothing parameters of that problem can estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing parameter estimates converge. (ii) Alternatively the P-IRLS scheme is iterated to convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores are only evaluated on convergence - optimization is then `outer' to the P-IRLS loop: in this case the P-IRLS iteration has to be differentiated, to facilitate optimization, and 'gam.fit3' is used in place of 'gam.fit'. The default is the second method, outer iteration. Several alternative basis-penalty types are built in for representing model smooths, but alternatives can easily be added (see 'smooth.terms' for an overview and 'smooth.construct' for how to add smooth classes). In practice the default basis is usually the best choise, but the choise of the basis dimension ('k' in the 's' and 'te' terms) is something that should be considered carefully (the exact value is not critical, but it is important not to make it restrictively small, nor very large and computationally costly). The basis should be chosen to be larger than is believed to be necessary to approximate the smooth function concerned. The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on the term, and (usually) selected automatically (withy an upper limit set by 'k-1' or occasionally 'k'). Of course the 'k' should not be made too large, or computation will be slow (or in extreme cases there will be more coefficients to estimate than there are data). Note that 'gam' assumes a very inclusive definition of what counts as a GAM: basically any penalized GLM can be used: to this end 'gam' allows the non smooth model components to be penalized via argument 'paraPen' and allows the linear predictor to depend on general linear functionals of smooths, via the summation convention mechanism described in 'linear.functional.terms'. Details of the default underlying fitting methods are given in Wood (2004 and 2008). Some alternative methods are discussed in Wood (2000 and 2006). _V_a_l_u_e: If 'fit=FALSE' the function returns a list 'G' of items needed to fit a GAM, but doesn't actually fit it. Otherwise the function returns an object of class '"gam"' as described in 'gamObject'. _W_A_R_N_I_N_G_S: You must have more unique combinations of covariates than the model has total parameters. (Total parameters is sum of basis dimensions plus sum of non-spline terms less the number of spline terms). Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data. For data with many zeroes clustered together in the covariate space it is quite easy to set up GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with e.g. log or logit links, mean value zero corresponds to an infinite range on the linear predictor scale. _A_u_t_h_o_r(_s): Simon N. Wood simon.wood@r-project.org Front end design inspired by the S function of the same name based on the work of Hastie and Tibshirani (1990). Underlying methods owe much to the work of Wahba (e.g. 1990) and Gu (e.g. 2002). _R_e_f_e_r_e_n_c_e_s: Key References on this implementation: Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673-686. [Default method for additive case (but no longer for generalized)] Wood, S.N. (2008) Fast stable direct fitting and smoothness selection for generalized additive models. J.R.Statist.Soc.B 70(3):495-518. [Generalized additive model case] Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95-114 Wood, S.N. (2006a) Low rank scale invariant tensor product smooths for generalized additive mixed models. Biometrics 62(4):1025-1036 Wood S.N. (2006b) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press. Wood, S.N. (2006c) On confidence intervals for generalized additive models based on penalized regression splines. Australian and New Zealand Journal of Statistics. 48(4): 445-464. Key Reference on GAMs and related models: Hastie (1993) in Chambers and Hastie (1993) Statistical Models in S. Chapman and Hall. Hastie and Tibshirani (1990) Generalized Additive Models. Chapman and Hall. Wahba (1990) Spline Models of Observational Data. SIAM Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413-428 [The original mgcv paper, but no longer the default methods.] Background References: Green and Silverman (1994) Nonparametric Regression and Generalized Linear Models. Chapman and Hall. Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383-398 Gu (2002) Smoothing Spline ANOVA Models, Springer. O'Sullivan, Yandall and Raynor (1986) Automatic smoothing of regression functions in generalized linear models. J. Am. Statist.Ass. 81:96-103 Wood (2001) mgcv:GAMs and Generalized Ridge Regression for R. R News 1(2):20-25 Wood and Augustin (2002) GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157:157-177 _S_e_e _A_l_s_o: 'mgcv-package', 'gamObject', 'gam.models', 'smooth.terms', 'linear.functional.terms', 's', 'te' 'predict.gam', 'plot.gam', 'summary.gam', 'gam.side', 'gam.selection','mgcv', 'gam.control' 'gam.check', 'linear.functional.terms' 'negbin', 'magic','vis.gam' _E_x_a_m_p_l_e_s: library(mgcv) set.seed(0) ## simulate some data... dat <- gamSim(1,n=400,dist="normal",scale=2) b<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat) summary(b) plot(b,pages=1,residuals=TRUE) ## show partial residuals plot(b,pages=1,seWithMean=TRUE) ## `with intercept' CIs ## same fit in two parts ..... G<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat) b<-gam(G=G) print(b) ## change the smoothness selection method to REML b0<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="REML") plot(b,pages=1) ## ... and do automatic terms selection as well b1<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat, method="REML",select=TRUE) plot(b1,pages=1) ## set the smoothing parameter for the first term, estimate rest ... bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,-1,-1,-1),data=dat) plot(bp,pages=1) ## alternatively... bp <- gam(y~s(x0,sp=.01)+s(x1)+s(x2)+s(x3),data=dat) # set lower bounds on smoothing parameters .... bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3), min.sp=c(0.001,0.01,0,10),data=dat) print(b);print(bp) # same with REML bp<-gam(y~s(x0)+s(x1)+s(x2)+s(x3), min.sp=c(0.1,0.1,0,10),data=dat,method="REML") print(b0);print(bp) ## now a GAM with 3df regression spline term & 2 penalized terms b0<-gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15),data=dat) plot(b0,pages=1) ## now fit a 2-d term to x0,x1 b1<-gam(y~s(x0,x1)+s(x2)+s(x3),data=dat) par(mfrow=c(2,2)) plot(b1) par(mfrow=c(1,1)) ## now simulate poisson data... dat <- gamSim(1,n=400,dist="poisson",scale=.25) b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,data=dat) plot(b2,pages=1) ## repeat fit using performance iteration b3<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson, data=dat,optimizer="perf") plot(b3,pages=1) ## repeat using GACV as in Wood 2008... b4<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson, data=dat,method="GACV.Cp",scale=-1) plot(b4,pages=1) ## repeat using REML as in Wood 2008... b5<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson, data=dat,method="REML") plot(b5,pages=1) ## a binary example dat <- gamSim(1,n=400,dist="binary",scale=.33) lr.fit <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial,data=dat) ## plot model components with truth overlaid in red op <- par(mfrow=c(2,2)) fn <- c("f0","f1","f2","f3");xn <- c("x0","x1","x2","x3") for (k in 1:4) { plot(lr.fit,residuals=TRUE,select=k) ff <- dat[[fn[k]]];xx <- dat[[xn[k]]] ind <- sort.int(xx,index.return=TRUE)$ix lines(xx[ind],(ff-mean(ff))[ind]*.33,col=2) } par(op) anova(lr.fit) lr.fit1 <- gam(y~s(x0)+s(x1)+s(x2),family=binomial,data=dat) lr.fit2 <- gam(y~s(x1)+s(x2),family=binomial,data=dat) AIC(lr.fit,lr.fit1,lr.fit2) ## now a 2D smoothing example... eg <- gamSim(2,n=500,scale=.1) attach(eg) op <- par(mfrow=c(2,2),mar=c(4,4,1,1)) contour(truth$x,truth$z,truth$f) ## contour truth b4 <- gam(y~s(x,z),data=data) ## fit model fit1 <- matrix(predict.gam(b4,pr,se=FALSE),40,40) contour(truth$x,truth$z,fit1) ## contour fit persp(truth$x,truth$z,truth$f) ## persp truth vis.gam(b4) ## persp fit detach(eg) par(op) ## very large dataset example with user defined knots par(mfrow=c(1,1)) eg <- gamSim(2,n=10000,scale=.5) attach(eg) ind<-sample(1:10000,1000,replace=FALSE) b5<-gam(y~s(x,z,k=50),data=data, knots=list(x=data$x[ind],z=data$z[ind])) vis.gam(b5) ## and a pure "knot based" spline of the same data b6<-gam(y~s(x,z,k=100),data=data,knots=list(x= rep((1:10-0.5)/10,10), z=rep((1:10-0.5)/10,rep(10,10)))) vis.gam(b6,color="heat") ## varying the default large dataset behaviour via `xt' b7 <- gam(y~s(x,z,k=50,xt=list(max.knots=1000,seed=2)),data=data) vis.gam(b7) detach(eg)