survfit               package:survival               R Documentation

_C_o_m_p_u_t_e _a _S_u_r_v_i_v_a_l _C_u_r_v_e _f_o_r _C_e_n_s_o_r_e_d _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     Computes an estimate of a survival curve for censored data using
     either the Kaplan-Meier or the Fleming-Harrington method, or
     computes the predicted survivor function. For competing risks data
     it computes the cumulative incidence curve. See 'survfit.coxph'
     for survival curves from a fitted Cox model.

_U_s_a_g_e:

     survfit(formula,...)
     ## S3 method for class 'formula':
     survfit(formula, data, weights, subset, na.action,  
             etype, id, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula object, which must have a 'Surv' object as the
          response on the left of the '~' operator and, if desired,
          terms separated by '+' operators on the right.  One of the
          terms may be a 'strata' object. For a single survival curve
          the right hand side should be '~ 1'.

    data: a data frame in which to interpret the variables named in the
          formula,  'subset' and 'weights' arguments.  

 weights: case weights.  They must be nonnegative, and it is strongly
          recommended that they be strictly positive, since zero
          weights are ambiguous compared to use of the 'subset'
          argument.

  subset: expression saying that only a subset of the rows of the data 
          should be used in the fit.  

na.action: a missing-data filter function, applied to the model frame,
          after any  'subset' argument has been used.  Default is
          'options()$na.action'.  

   etype: a variable giving the type of event. Presence of this
          variable signals the program to compute the cumulative
          incidence estimate.  For each event ('status==1'), the
          'etype' variable indicates the type of event.  For a censored
          observation the value of 'etype' is ignored, but do not set
          it to NA, since that will cause 'na.action' to delete the
          observation.

      id: identifies individual subjects, when a given person can have
          multiple lines of data. When used with the 'etype' variable,
          this allows the computation of a cumulative prevalence
          estimate, i.e., the incidence over time.

     ...: The following additional arguments are passed to internal
          functions called by 'survfit'.

          _t_y_p_e a character string specifying the type of survival
               curve.  Possible values are '"kaplan-meier"', 
               '"fleming-harrington"' or '"fh2"'  if a formula is given
                and '"aalen"' or '"kaplan-meier"'  if the first
               argument is a 'coxph' object,  (only the first two
               characters are necessary).  The default is '"aalen"'
               when  a 'coxph' object is given,  and it is
               '"kaplan-meier"' otherwise.   Earlier versions of
               'survfit'  used 'type="tsiatis"' to get the '"aalen"'
               estimator.  For backward  compatibility, this is still
               allowed. 

          _e_r_r_o_r a character string specifying the error.  Possible
               values are  '"greenwood"' for the Greenwood formula or 
               '"tsiatis"' for the Tsiatis formula,  (only the first
               character is  necessary).   The default is '"tsiatis"'
               when  a 'coxph' object is given, and it is '"greenwood"'
               otherwise. 

          _c_o_n_f._t_y_p_e One of '"none"', '"plain"', '"log"' (the default),
               or '"log-log"'.  Only enough of the string to uniquely
               identify it is necessary. The first option causes
               confidence intervals not to be generated.  The second
               gives the standard intervals 'curve +- k * se(curve)',
               where k is determined from 'conf.int'.  The log option
               calculates intervals based on the cumulative hazard or
               log(survival). The last option bases intervals on the
               log hazard or log(-log(survival)). 

          _c_o_n_f._l_o_w_e_r a character string to specify modified lower
               limits to the curve, the  upper limit remains unchanged.
                 Possible values are '"usual"' (unmodified),  '"peto"',
                and '"modified"'.  T he modified lower limit  is based
               on an "effective n" argument.  The confidence  bands
               will agree with the usual calculation at each death
               time, but unlike  the usual bands the confidence
               interval becomes wider at each censored  observation. 
               The extra width is obtained by multiplying the usual 
               variance by a factor m/n, where n is the number
               currently at risk and  m is the number at risk at the
               last death time.  (The bands thus agree  with the
               un-modified bands at each death time.)  This is
               especially useful for survival curves with a long flat
               tail. 

               The Peto lower limit is based on the same "effective n"
               argument as the  modified limit, but also replaces the
               usual Greenwood variance term with  a simple
               approximation.  It is known to be conservative. 

          _s_t_a_r_t._t_i_m_e numeric value specifying a time to start
               calculating survival information. The resulting curve is
               the survival conditional on surviving to 'start.time'.

          _c_o_n_f._i_n_t the level for a two-sided confidence interval on the
               survival curve(s).  Default is 0.95. 

          _s_e._f_i_t a logical value indicating whether standard errors
               should be  computed.  Default is 'TRUE'. 
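
     As a rough illustration of the 'conf.type' options above, the
     sketch below computes the three interval types for a single point
     on a curve.  The values 'surv' and 'se.logS' are made up, and the
     delta-method standard errors are an assumption of this sketch, not
     necessarily the exact internal computation of 'survfit'.

```r
# Hypothetical inputs: one survival estimate and the standard error of
# log(survival), e.g. the square root of a Greenwood sum.
surv     <- 0.80
se.logS  <- 0.10
conf.int <- 0.95
k <- qnorm(1 - (1 - conf.int) / 2)    # two-sided normal quantile

# "plain": curve +- k * se(curve); se(curve) is approximated by
# surv * se.logS (delta method)
plain <- surv + c(-1, 1) * k * surv * se.logS

# "log": symmetric on the log(survival) scale, then back-transformed
log.ci <- exp(log(surv) + c(-1, 1) * k * se.logS)

# "log-log": symmetric on the log(-log(survival)) scale, where the
# delta method gives a standard error of se.logS / |log(surv)|
se.loglog <- se.logS / abs(log(surv))
loglog <- exp(-exp(log(-log(surv)) + c(1, -1) * k * se.loglog))

# lower and upper limits for each interval type
round(rbind(plain = plain, log = log.ci, "log-log" = loglog), 3)
```

     Unlike the plain interval, the log and log-log intervals are not
     symmetric about the curve, and the log-log interval always stays
     inside (0, 1).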


_D_e_t_a_i_l_s:

     The estimates used are the Kalbfleisch-Prentice  (Kalbfleisch and
     Prentice, 1980, p.86) and the Tsiatis/Link/Breslow,  which reduce
     to the Kaplan-Meier and Fleming-Harrington estimates, 
     respectively, when the weights are unity.  

     The Greenwood formula for the variance is a sum of terms 
     d/(n*(n-m)), where d is the number of deaths at a given time
     point, n  is the sum of weights for all individuals still at risk
     at that time, and  m is the sum of weights for the deaths at that
     time.  The  justification is based on a binomial argument when
     weights are all  equal to one; extension to the weighted case is
     ad hoc.  Tsiatis  (1981) proposes a sum of terms d/(n*n), based on
     a counting process  argument which includes the weighted case. 
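
     The two variance sums above can be compared on a small example.
     This is only a sketch: the vectors 'd' and 'n' are made-up
     unweighted data, and the conversion to a standard error of the
     curve uses a delta-method approximation assumed here.

```r
# Hypothetical unweighted data at each distinct death time
d <- c(1, 2, 1, 1)      # number of deaths
n <- c(10, 9, 6, 3)     # number at risk just before each death time
m <- d                  # with unit weights, m equals the death count

greenwood <- cumsum(d / (n * (n - m)))   # Greenwood: terms d/(n*(n-m))
tsiatis   <- cumsum(d / (n * n))         # Tsiatis:   terms d/(n*n)

# Kaplan-Meier curve at the same time points
km <- cumprod(1 - d / n)

# approximate standard errors of the curve: S * sqrt(variance sum)
cbind(n, d, km,
      se.greenwood = km * sqrt(greenwood),
      se.tsiatis   = km * sqrt(tsiatis))
```

     Since n - m is smaller than n, each Greenwood term is at least as
     large as the corresponding Tsiatis term.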

     The two variants of the F-H estimate have to do with how ties are
     handled. If there were 3 deaths out of 10 at risk, then the first
     increments the hazard by 3/10 and the second by 1/10 + 1/9 + 1/8.
     For the first method S(t) = exp(-H), where H is the Nelson-Aalen
     cumulative hazard estimate, whereas the 'fh2' method will give
     results closer to the Kaplan-Meier.
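
     The tied example above (3 deaths out of 10 at risk) can be worked
     through directly.  This is a minimal sketch of the arithmetic for
     a single time point, not a call into 'survfit'; the variable
     names are chosen only for illustration.

```r
n <- 10   # number at risk
d <- 3    # tied deaths at this time point

# first F-H variant: one hazard increment of d/n
inc.fh <- d / n

# second variant ('fh2'): the deaths are treated as occurring one
# after another, giving 1/10 + 1/9 + 1/8
inc.fh2 <- sum(1 / (n - seq_len(d) + 1))

s.fh  <- exp(-inc.fh)    # survival after this time point, first variant
s.fh2 <- exp(-inc.fh2)   # survival under 'fh2'
s.km  <- 1 - d / n       # Kaplan-Meier for comparison

c(fh = s.fh, fh2 = s.fh2, km = s.km)
```

     In this example the 'fh2' value lies between the first F-H value
     and the Kaplan-Meier value.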

     When the data set includes left censored or interval censored data
     (or both), then the EM approach of Turnbull is used to compute the
     overall curve. When the baseline method is the Kaplan-Meier, this
     is known to converge to the maximum likelihood estimate.

     The cumulative incidence curve is an alternative to the
     Kaplan-Meier for competing risks data. For instance, in patients
     with MGUS, conversion to an overt plasma cell malignancy occurs at
     about 1% per year.   A Kaplan-Meier estimate, treating death due
     to other causes as censored, gives a 20 year cumulative rate of 33%
     for the 241 early patients of Kyle.   This estimates the incidence
     of conversion, if other causes of death were removed.

     The cumulative incidence (CI) estimate, on the other hand,
     estimates the proportion of patients in whom conversion will
     actually occur.  Because the population is older and many patients
     die of other causes first, this is much smaller than the KM, 22%
     at 20 years for Kyle's data. If there were no censoring, then
     CI(t) would simply be the total number of patients with
     progression by time t, divided by the sample size n.
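
     The no-censoring case can be sketched with a few lines of base R.
     The data below are invented for illustration (they are not Kyle's
     data), and the Kaplan-Meier comparison is computed by hand,
     treating deaths as censored.

```r
# Hypothetical data with no censoring: every subject has an event
time  <- c(2, 3, 3, 5, 7, 8, 9, 12)
etype <- c("progression", "death", "progression", "death",
           "death", "progression", "death", "death")
n <- length(time)

# CI(t): number of patients with progression by time t, divided by n
ci.prog <- function(t) sum(time <= t & etype == "progression") / n

# Kaplan-Meier style estimate of progression, deaths treated as censored
prog.times <- sort(unique(time[etype == "progression"]))
at.risk <- sapply(prog.times, function(t) sum(time >= t))
events  <- sapply(prog.times,
                  function(t) sum(time == t & etype == "progression"))
km.event <- 1 - cumprod(1 - events / at.risk)

cbind(time = prog.times,
      cum.inc = sapply(prog.times, ci.prog),
      km.event = km.event)
```

     In this toy data set the Kaplan-Meier based value is never below
     the cumulative incidence, in line with the MGUS comparison above.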

_V_a_l_u_e:

     an object of class '"survfit"'.   See 'survfit.object' for 
     details. Methods defined for survfit objects are   'print',
     'plot',  'lines', and 'points'.

_R_e_f_e_r_e_n_c_e_s:

     Dorey, F. J. and Korn, E. L. (1987).  Effective sample sizes for
     confidence  intervals for survival probabilities.  _Statistics in
     Medicine_  *6*, 679-87. 

     Fleming, T. H. and Harrington, D. P. (1984).  Nonparametric
     estimation of the  survival distribution in censored data.  _Comm.
     in Statistics_   *13*, 2469-86. 

     Kalbfleisch, J. D. and Prentice, R. L. (1980).  _The Statistical
     Analysis of Failure Time Data._ New York: Wiley.

     Kyle, R. A. (1997). Monoclonal gammopathy of undetermined
     significance and solitary plasmacytoma. Implications for
     progression to overt multiple myeloma, _Hematology/Oncology
     Clinics N. Amer._ *11*, 71-87.

     Link, C. L. (1984). Confidence intervals for the survival 
     function using Cox's proportional hazards model with   covariates.
      _Biometrics_   *40*, 601-610.

     Turnbull, B. W. (1974).  Nonparametric estimation of a
     survivorship function with doubly censored data. _J Am Stat
     Assoc_, *69*, 169-173.

_S_e_e _A_l_s_o:

     'survfit.coxph' for survival curves from Cox models.

     'print',   'plot',   'lines',    'coxph',   'Surv',   'strata'.

_E_x_a_m_p_l_e_s:

     #fit a Kaplan-Meier and plot it 
     fit <- survfit(Surv(time, status) ~ x, data = aml) 
     plot(fit, lty = 2:3) 
     legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3) 

     #fit a Cox proportional hazards model and plot the  
     #predicted survival for a 60 year old 
     fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) 
     plot(survfit(fit, newdata=data.frame(age=60)),
          xscale=365.25, xlab = "Years", ylab="Survival") 

     # Here is the data set from Turnbull
     #  There are no interval censored subjects, only left-censored (status 2),
     #  right-censored (status 0) and observed events (status 1)
     #
     #                             Time
     #                         1    2   3   4
     # Type of observation
     #           death        12    6   2   3
     #          losses         3    2   0   3
     #      late entry         2    4   2   5
     #
     tdata <- data.frame(time  =c(1,1,1,2,2,2,3,3,3,4,4,4),
                         status=rep(c(1,0,2),4),
                         n     =c(12,3,2,6,2,4,2,0,2,3,3,5))
     fit  <- survfit(Surv(time, time, status, type='interval') ~1, 
                   data=tdata, weights=n)

     #
     # Time to progression/death for patients with monoclonal gammopathy
     #  Competing risk curves (cumulative incidence)
     fit1 <- survfit(Surv(stop, event=='progression') ~1, data=mgus1,
                         subset=(start==0))
     fit2 <- survfit(Surv(stop, status) ~1, data=mgus1,
                         subset=(start==0), etype=event) #competing risks
     # CI curves are always plotted from 0 upwards, rather than 1 down
     plot(fit2, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
                 col=2:3, xlab="Years post diagnosis of MGUS")
     lines(fit1, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
                 conf.int=FALSE)
     text(10, .4, "Competing Risk: death", col=3)
     text(16, .15,"Competing Risk: progression", col=2)
     text(15, .30,"KM:prog")