negbin                 package:mgcv                 R Documentation

_G_A_M _n_e_g_a_t_i_v_e _b_i_n_o_m_i_a_l _f_a_m_i_l_y

_D_e_s_c_r_i_p_t_i_o_n:

     The 'gam' modelling function is designed to be able to use  the
     'negbin' family (a modification of MASS library
     'negative.binomial' family  by Venables and Ripley), with or
     without a known theta parameter.  Two approaches to estimating the
     'theta' parameter are available:


        *  If `performance iteration' is used for smoothing parameter
           estimation  (see 'gam'), then smoothing parameters are
           chosen by GCV and  'theta' is chosen in order to ensure that
           the Pearson estimate of the scale  parameter is as close as
           possible to 1, the value that the scale parameter should
           have.

        *  If `outer iteration' is used for smoothing parameter
           selection, and smoothing parameters  are chosen by UBRE/AIC
           (with scale parameter set to 1) then a value of 'theta' is  
           searched for which minimizes the AIC of the model.
           Alternatively If (RE)ML is used for smoothing  parameter
           estimation then a value of 'theta' is searched for which
           maximizes the (restricted)  likelihood. 

     The second option is much slower than the first, but the first can
     sometimes fail to converge.  To use the first option, set the
     'optimizer' argument of 'gam' to '"perf"'.

_U_s_a_g_e:

     negbin(theta = stop("'theta' must be specified"), link = "log")

_A_r_g_u_m_e_n_t_s:

   theta: Either i) a single value known value of theta, ii) two values
          of theta specifying the  endpoints of an interval over which
          to search for theta or iii) an array of values of theta,
          specifying the set of theta values to search. (iii) is only
          available with AIC based theta estimation. 

    link: The link function: one of '"log"', '"identity"' or '"sqrt"'

_D_e_t_a_i_l_s:

     If a single value of 'theta' is supplied then it is always taken
     as the known fixed value,  and estimation of smoothing paramaters
     is then by UBRE/AIC. If 'theta' is two numbers
     ('theta[2]>theta[1]')  then they are taken as specifying the range
     of values over which to search for  the optimal theta. If 'theta'
     is any other array of numbers then they are taken as the discrete
     set of values of theta over which to search for 'theta'. The
     latter  option only works with AIC based outer iteration, if
     performance iteration is used then an array will only be used to
     define a search range.

     If performance iteration is used (see 'gam' argument 'optimizer')
     then the method  of estimation is to choose theta  so that the GCV
     (Pearson) estimate  of the scale parameter is one (since the scale
     parameter  is one for the negative binomial). In this case theta
     estimation is nested within the IRLS loop  used for GAM fitting.
     After each call to fit an iteratively weighted additive model to
     the IRLS pseudodata,  the theta estimate is updated. This is done
     by conditioning on all components of the current GCV/Pearson 
     estimator of the scale parameter except theta and then searching
     for the  theta which equates this conditional  estimator to one.
     The search is  a simple bisection search after an initial crude
     line search to bracket one. The search will  terminate at the
     upper boundary of the search region is a Poisson fit would have
     yielded an estimated  scale parameter <1.

     If outer iteration is used then theta is estimated by searching
     for the value yielding the lowest AIC. The search is either over
     the supplied array of values, or is a grid search over the
     supplied range, followed by a  golden section search. A full fit
     is required for each trial theta, so the process is slow, but
     speed is enhanced by making the changes in theta as small as
     possible, from one step to the next, and  using the previous
     smothing parameter and fitted values to start the new fit. 

     In a simulation test based on 800 replicates of the first example
     data, given below, the GCV based (performance iteration) method
     yielded models with,  on avergage 6% better MSE performance than
     the AIC based (outer iteration) method.  'theta' had a 0.86
     correlation coefficient between the two methods. 'theta' estimates
     averaged 3.36 with a standard deviation of 0.44 for the AIC based
     method and 3.22 with a standard deviation of 0.43 for the GCV
     based method. However the GCV based method is  less
     computationally reliable, failing in around 4% of replicates.

_V_a_l_u_e:

     An object inheriting from class 'family', with additional elements 

    dvar: the function giving the first derivative of the variance
          function w.r.t. 'mu'.

   d2var: the function giving the second derivative of the variance
          function w.r.t. 'mu'.

getTheta: A function for retrieving the value(s) of theta. This also
          useful for retriving the  estimate of 'theta' after fitting
          (see example).

_W_A_R_N_I_N_G_S:

     'gamm' does not support 'theta' estimation

     The negative binomial functions from the MASS library are no
     longer supported.

_A_u_t_h_o_r(_s):

     Simon N. Wood simon.wood@r-project.org modified from Venables and
     Ripley's 'negative.binomial' family.

_R_e_f_e_r_e_n_c_e_s:

     Venables, B. and B.R. Ripley (2002) Modern Applied Statistics in
     S, Springer.

_E_x_a_m_p_l_e_s:

     library(mgcv)
     set.seed(3)
     n<-400
     dat <- gamSim(1,n=n)
     g <- exp(dat$f/5)

     # negative binomial data  
     dat$y <- rnbinom(g,size=3,mu=g)
     # known theta ...
     b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(3),data=dat)
     plot(b,pages=1)
     print(b)

     ## unknown theta via performance iteration...
     b1 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(c(1,10)),
               optimizer="perf",data=dat)
     plot(b1,pages=1)
     print(b1)

     ## unknown theta via outer iteration and AIC search...
     b2<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(c(1,10)),
             data=dat)
     plot(b2,pages=1)
     print(b2)

     ## Same again all by  REML...
     b2a <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(c(1,10)),
             data=dat,method="REML")
     plot(b2a,pages=1)
     print(b2a)

     ## how to retrieve Theta...
     b2a$family$getTheta()

     ## unknown theta via outer iteration and AIC search
     ## over a discrete set of values...
     b3<-gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(2:10/2),
             data=dat)
     plot(b3,pages=1)
     print(b3)

     ## another example...
     set.seed(1)
     f <- dat$f
     f <- f - min(f);g <- f^2
     dat$y <- rnbinom(g,size=3,mu=g)
     b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negbin(1:10,link="sqrt"),
              data=dat) 
     plot(b,pages=1)
     print(b)
     rm(dat)