magic                  package:mgcv                  R Documentation

Stable Multiple Smoothing Parameter Estimation by GCV or UBRE, with
optional fixed penalty

Description:

     Function to efficiently estimate smoothing parameters in
     generalized ridge regression problems with multiple (quadratic)
     penalties, by GCV or UBRE. The function uses Newton's method in
     multiple dimensions, backed up by steepest descent, to
     iteratively adjust the smoothing parameters for each penalty (one
     penalty may have a smoothing parameter fixed at 1). For maximal
     numerical stability the method is based on orthogonal
     decomposition methods, and attempts to deal with numerical rank
     deficiency gracefully using a truncated singular value
     decomposition approach.

Usage:

     magic(y,X,sp,S,off,L=NULL,lsp0=NULL,rank=NULL,H=NULL,C=NULL,w=NULL,
           gamma=1,scale=1,gcv=TRUE,ridge.parameter=NULL,
           control=list(maxit=50,tol=1e-6,step.half=25,
                        rank.tol=.Machine$double.eps^0.5),
           extra.rss=0,n.score=length(y))

Arguments:

       y: the response data vector.

       X: the model matrix.

      sp: the array of smoothing parameters. The vector
          'L%*%log(sp) + lsp0' contains the logs of the smoothing
          parameters that actually multiply the penalty matrices
          stored in 'S' ('L' is taken as the identity if 'NULL'). Any
          'sp' values that are negative are autoinitialized; otherwise
          they are taken as supplying starting values. A supplied
          starting value will be reset to a default starting value if
          the gradient of the GCV/UBRE score is too small at the
          supplied value.

       S: a list of penalty matrices. 'S[[i]]' is the ith penalty
          matrix, but note that it is not stored as a full matrix:
          rather it is the smallest square submatrix including all the
          non-zero elements of the penalty matrix. Element 1,1 of
          'S[[i]]' occupies element 'off[i]', 'off[i]' of the ith full
          penalty matrix (a short sketch of this layout is given in
          'Details'). Each 'S[[i]]' must be positive semi-definite.
          Set to 'list()' if there are no smoothing parameters to be
          estimated.

     off: an array indicating the first parameter in the parameter
          vector that is penalized by the penalty involving 'S[[i]]'.

       L: a matrix mapping 'log(sp)' to the log smoothing parameters
          that actually multiply the penalties defined by the elements
          of 'S'. Taken as the identity if 'NULL'. See above under
          'sp'.

    lsp0: if 'L' is not 'NULL', a vector of constants in the linear
          transformation from 'log(sp)' to the actual log smoothing
          parameters, so that the logs of the smoothing parameters
          multiplying the 'S[[i]]' are given by 'L%*%log(sp) + lsp0'.
          Taken as 0 if 'NULL'.

    rank: an array specifying the ranks of the penalties. This is
          useful, but not essential, for forming square roots of the
          penalty matrices.

       H: the optional offset penalty - i.e. a penalty with a
          smoothing parameter fixed at 1. This is useful for allowing
          regularization of the estimation process, fixed smoothing
          penalties, etc.

       C: the optional matrix specifying any linear equality
          constraints on the fitting problem. If b is the parameter
          vector then the parameters are forced to satisfy Cb = 0.

       w: the regression weights. If this is a matrix then it is taken
          as being the square root of the inverse of the covariance
          matrix of 'y', specifically V_y^{-1} = w'w. If 'w' is an
          array then it is taken as the diagonal of this matrix, or
          simply the weight for each element of 'y'. See below for an
          example using this.

   gamma: an inflation factor for the model degrees of freedom in the
          GCV or UBRE score.

   scale: the scale parameter for use with UBRE.
     gcv: should be set to 'TRUE' if GCV is to be used, 'FALSE' for
          UBRE.

ridge.parameter: it is sometimes useful to apply a ridge penalty to
          the fitting problem, penalizing the parameters in the
          constrained space directly. Setting this parameter to a
          value greater than zero will cause such a penalty to be
          used, with the magnitude given by the parameter value.

 control: a list of iteration control constants with the following
          elements:

          maxit: the maximum number of iterations of the magic
          algorithm to allow.

          tol: the tolerance to use in judging convergence.

          step.half: if a trial step fails then the method tries
          halving it, up to a maximum of 'step.half' times.

          rank.tol: a constant used to test for numerical rank
          deficiency of the problem. Basically, any singular value
          less than 'rank.tol' multiplied by the largest singular
          value of the problem is set to zero.

extra.rss: a constant to be added to the residual sum of squares
          (squared norm) term in the calculation of the GCV, UBRE and
          scale parameter estimate. In conjunction with 'n.score',
          this is useful for certain methods for dealing with very
          large data sets.

 n.score: the number to use as the number of data in the GCV/UBRE
          score calculation: usually the actual number of data, but
          there are methods for dealing with very large datasets that
          change this.

Details:

     The method is a computationally efficient means of applying GCV
     or UBRE (often approximately AIC) to the problem of smoothing
     parameter selection in generalized ridge regression problems of
     the form:

     min ||W(Xb - y)||^2 + b'Hb + theta_1 b'S_1 b + theta_2 b'S_2 b + ...

     possibly subject to constraints Cb = 0. X is a design matrix, b a
     parameter vector, y a data vector, W a weight matrix, S_i a
     positive semi-definite matrix of coefficients defining the ith
     penalty with associated smoothing parameter theta_i, H a positive
     semi-definite offset penalty matrix, and C a matrix of
     coefficients defining any linear equality constraints on the
     problem. X need not be of full column rank.

     The theta_i are chosen to minimize either the GCV score:

     V_g = n ||W(y - Ay)||^2 / [tr(I - g A)]^2

     or the UBRE score:

     V_u = ||W(y - Ay)||^2 / n - 2 s tr(I - g A)/n + s

     where g is 'gamma', the inflation factor for degrees of freedom
     (usually set to 1), and s is 'scale', the scale parameter. A is
     the hat matrix (influence matrix) for the fitting problem, i.e.
     the matrix mapping data to fitted values. Dependence of the
     scores on the smoothing parameters is through A. (A small
     numerical sketch of the GCV score is given at the end of this
     section.)

     The method operates by Newton or steepest descent updates of the
     logs of the theta_i. A key aspect of the method is stable and
     economical calculation of the first and second derivatives of the
     scores w.r.t. the log smoothing parameters. Because the GCV/UBRE
     scores are flat w.r.t. very large or very small theta_i, it is
     important to get good starting parameters, and to be careful not
     to step into a flat region of the smoothing parameter space. For
     this reason the algorithm rescales any Newton step that would
     result in a log(theta_i) change of more than 5. Newton steps are
     only used if the Hessian of the GCV/UBRE score is positive
     definite; otherwise steepest descent is used. Similarly, steepest
     descent is used if the Newton step has to be contracted too far
     (indicating that the quadratic model underlying Newton is poor).
     All initial steepest descent steps are scaled so that their
     largest component is 1.
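     As a purely illustrative aside, the following minimal sketch (not
     part of the original help file; the object names 'S1', 'off1',
     'Sfull' and 'Vg' are invented, and it assumes a single penalty
     with W = I and gamma = 1) embeds a penalty block in the style of
     the 'S' and 'off' arguments and evaluates the GCV score above on
     a grid of log smoothing parameters:

     ## one-penalty illustration of the GCV score (assumed names)
     set.seed(2)
     n <- 100; x <- seq(0, 1, length=n)
     X <- cbind(1, poly(x, 8))                 # simple model matrix
     y <- sin(6*x) + rnorm(n)*0.2              # simulated data
     S1 <- crossprod(diff(diag(8), differences=2)) # +ve semi-definite block
     off1 <- 2                                 # first penalized parameter
     Sfull <- matrix(0, ncol(X), ncol(X))      # element 1,1 of S1 goes at
     ix <- off1:(off1 + nrow(S1) - 1)          # (off1, off1) of the full
     Sfull[ix, ix] <- S1                       # penalty matrix
     Vg <- function(lsp) {                     # GCV score, as defined above
       A <- X %*% solve(crossprod(X) + exp(lsp)*Sfull, t(X)) # influence matrix
       n * sum((y - A %*% y)^2) / (n - sum(diag(A)))^2
     }
     lsp <- seq(-10, 10, length=50)
     plot(lsp, sapply(lsp, Vg), type="l", xlab="log(sp)", ylab="GCV score")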
     However a step is arrived at, it is never expanded if it is
     successful (to avoid flat portions of the objective), but steps
     are successively halved if they do not decrease the GCV/UBRE
     score, until they do, or the direction is deemed to have failed.
     (Given the smoothing parameters, the optimal b parameters are
     easily found.)

     The method is coded in 'C' with matrix factorizations performed
     using LINPACK and LAPACK routines.

Value:

     The function returns a list with the following items:

       b: the best fit parameters given the estimated smoothing
          parameters.

   scale: the estimated (GCV) or supplied (UBRE) scale parameter.

   score: the minimized GCV or UBRE score.

      sp: an array of the estimated smoothing parameters.

 sp.full: an array of the smoothing parameters that actually multiply
          the elements of 'S' (same as 'sp' if 'L' was 'NULL'). This
          is 'exp(L%*%log(sp))'.

      rV: a factored form of the parameter covariance matrix. The
          (Bayesian) covariance matrix of the parameters 'b' is given
          by 'rV%*%t(rV)*scale'.

gcv.info: a list of information about the performance of the method,
          with the following elements:

          full.rank: the apparent rank of the problem: number of
          parameters less number of equality constraints.

          rank: the estimated actual rank of the problem (at the
          final iteration of the method).

          fully.converged: 'TRUE' if the method converged by
          satisfying the convergence criteria, 'FALSE' if it
          converged by failing to decrease the score along the search
          direction.

          hess.pos.def: 'TRUE' if the Hessian of the UBRE or GCV
          score was positive definite at convergence.

          iter: the number of Newton/steepest descent iterations
          taken.

          score.calls: the number of times that the GCV/UBRE score
          had to be evaluated.

          rms.grad: the root mean square of the gradient of the
          UBRE/GCV score w.r.t. the smoothing parameters.

     Note that some further useful quantities can be obtained using
     'magic.post.proc'.

Author(s):

     Simon N. Wood <simon.wood@r-project.org>

References:

     Wood, S.N. (2004) Stable and efficient multiple smoothing
     parameter estimation for generalized additive models. J. Amer.
     Statist. Ass. 99:673-686

See Also:

     'magic.post.proc', 'mgcv', 'gam'

Examples:

     ## Use `magic' for a standard additive model fit ...
     library(mgcv)
     set.seed(1);n <- 400;sig <- 2
     dat <- gamSim(1,n=n,scale=sig)
     ## set up additive model
     G <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
     ## fit using magic
     mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C)
     ## and fit using gam as consistency check
     b <- gam(G=G)
     mgfit$sp;b$sp # compare smoothing parameter estimates
     edf <- magic.post.proc(G$X,mgfit,G$w)$edf # get e.d.f. per param
     range(edf-b$edf) # compare

     ## constrain first two smooths to have identical smoothing parameters
     L <- diag(3);L <- rbind(L[1,],L)
     mgfit <- magic(G$y,G$X,rep(-1,3),G$S,G$off,L=L,rank=G$rank,C=G$C)

     ## Now a correlated data example ...
     library(nlme)
     ## simulate truth
     set.seed(1);n<-400;sig<-2
     x <- 0:(n-1)/(n-1)
     f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
     ## produce scaled covariance matrix for AR1 errors...
     V <- corMatrix(Initialize(corAR1(.6),data.frame(x=x)))
     Cv <- chol(V)  # t(Cv)%*%Cv=V
     ## Simulate AR1 errors ...
     e <- t(Cv)%*%rnorm(n,0,sig) # so cov(e) = V * sig^2
     ## Observe truth + AR1 errors
     y <- f + e
     ## GAM ignoring correlation
     par(mfrow=c(1,2))
     b <- gam(y~s(x,k=20))
     plot(b);lines(x,f-mean(f),col=2);title("Ignoring correlation")
     ## Fit smooth, taking account of *known* correlation...
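     ## (Added note, not in the original example: 'chol' returns the
     ## upper triangular factor, so t(Cv)%*%Cv = V; hence with
     ## w = solve(t(Cv)) below, t(w)%*%w = solve(V), matching the
     ## documented weight-matrix convention V_y^{-1} = w'w, here up to
     ## the scale parameter sig^2.)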
     w <- solve(t(Cv)) # V^{-1} = w'w
     ## Use `gam' to set up model for fitting...
     G <- gam(y~s(x,k=20),fit=FALSE)
     ## fit using magic, with weight *matrix*
     mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
     ## Modify previous gam object using new fit, for plotting...
     mg.stuff <- magic.post.proc(G$X,mgfit,w)
     b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
     b$coefficients <- mgfit$b
     plot(b);lines(x,f-mean(f),col=2);title("Known correlation")
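     ## A hedged follow-up sketch, not in the original example: per the
     ## 'Value' section above, 'rV%*%t(rV)*scale' gives the Bayesian
     ## covariance matrix of the coefficients, from which pointwise
     ## standard errors of the fitted values follow ('Vb' and 'se.fit'
     ## are invented names)...
     Vb <- mgfit$rV %*% t(mgfit$rV) * mgfit$scale # Bayesian cov of mgfit$b
     se.fit <- sqrt(rowSums((G$X %*% Vb) * G$X))  # pointwise s.e. = diag(X Vb X')
     range(se.fit) # sanity check on the magnitude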