survfit {survival} | R Documentation |
Computes an estimate of a survival curve for censored data
using either the Kaplan-Meier or the Fleming-Harrington method
or computes the predicted survivor function.
For competing risks data it computes the
cumulative incidence curve. See survfit.coxph
for survival curves
from a fitted Cox model.
survfit(formula, ...)

## S3 method for class 'formula':
survfit(formula, data, weights, subset, na.action, etype, id, ...)
formula |
a formula object, which must have a
Surv object as the
response on the left of the ~ operator and, if desired, terms
separated by + operators on the right.
One of the terms may be a strata object.
For a single survival curve the right hand side should be ~ 1.
|
data |
a data frame in which to interpret the variables named in the formula,
subset and weights arguments.
|
weights |
The weights must be nonnegative and it is strongly recommended that
they be strictly positive, since zero weights are ambiguous, compared
to use of the subset argument.
|
subset |
expression saying that only a subset of the rows of the data should be used in the fit. |
na.action |
a missing-data filter function, applied to the model frame, after any
subset argument has been used.
Default is options()$na.action.
|
etype |
a variable giving the type of event.
Presence of this variable signals the program to compute the cumulative
incidence estimate. For each event (status==1), the etype
variable indicates the type of event. For a censored observation the
value of etype is ignored, but do not set it to NA, since that
will cause na.action to delete the observation.
|
id |
identifies individual subjects, when a given person can have multiple
lines of data. When used with the etype variable, this allows
the computation of
a cumulative prevalence estimate, i.e., the incidence over time.
|
... |
The following additional arguments are passed to internal functions
called by survfit.
|
The estimates used are the Kalbfleisch-Prentice (Kalbfleisch and Prentice, 1980, p.86) and the Tsiatis/Link/Breslow, which reduce to the Kaplan-Meier and Fleming-Harrington estimates, respectively, when the weights are unity.
The Greenwood formula for the variance is a sum of terms d/(n*(n-m)), where d is the number of deaths at a given time point, n is the sum of weights for all individuals still at risk at that time, and m is the sum of weights for the deaths at that time. The justification is based on a binomial argument when weights are all equal to one; extension to the weighted case is ad hoc. Tsiatis (1981) proposes a sum of terms d/(n*n), based on a counting process argument which includes the weighted case.
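The two sums of terms described above can be compared on a small example. The following is a minimal base R sketch, not the code used by survfit; the counts in n and d are made-up toy values, and unit weights are assumed so that m equals d. As far as the standard delta-method argument goes, each sum is a variance on the cumulative hazard (log survival) scale, so it would be multiplied by S(t)^2 to obtain a variance for S(t) itself.

```r
# Toy data (hypothetical): three distinct death times, all weights equal to 1
n <- c(10, 8, 5)   # sum of weights for subjects still at risk at each time
d <- c(2, 1, 2)    # number of deaths at each time
m <- d             # sum of weights for the deaths; equals d with unit weights

# Kaplan-Meier estimate: running product of (1 - m/n)
km <- cumprod(1 - m / n)

# Greenwood: running sum of d / (n * (n - m))
greenwood <- cumsum(d / (n * (n - m)))

# Tsiatis: running sum of d / (n * n)
tsiatis <- cumsum(d / (n * n))

print(data.frame(n, d, km, greenwood, tsiatis))
```

Because n - m is smaller than n whenever there is a death, each Tsiatis term is smaller than the corresponding Greenwood term.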
The two variants of the F-H estimate differ in how ties are handled.
If there were 3 deaths out of 10 at risk, then the first
increments the hazard by 3/10 and the second
by 1/10 + 1/9 + 1/8.
For the first method S(t) = exp(-H), where H is
the Nelson-Aalen cumulative hazard estimate,
whereas the fh2
method will
give results S(t) closer to the Kaplan-Meier.
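The tie-handling arithmetic above can be checked directly. This is a minimal base R sketch for the single time point in the example (3 deaths out of 10 at risk); the variable names are illustrative only and are not part of the survival package.

```r
n_risk <- 10   # number at risk just before the tied deaths
deaths <- 3    # number of tied deaths

# First variant: one hazard increment of deaths / n_risk (3/10)
inc_fh <- deaths / n_risk

# Second variant (fh2): deaths treated as occurring one after another,
# giving 1/10 + 1/9 + 1/8
inc_fh2 <- sum(1 / (n_risk - seq_len(deaths) + 1))

# Survival after this time point under each estimate
s_km  <- 1 - deaths / n_risk   # Kaplan-Meier
s_fh  <- exp(-inc_fh)          # exp(-H) with the first increment
s_fh2 <- exp(-inc_fh2)         # exp(-H) with the fh2 increment

print(c(km = s_km, fh = s_fh, fh2 = s_fh2))
```

The fh2 increment is the larger of the two, which pulls exp(-H) down toward the Kaplan-Meier value of 0.7.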
When the data set includes left censored or interval censored data (or both), then the EM approach of Turnbull is used to compute the overall curve. When the baseline method is the Kaplan-Meier, this is known to converge to the maximum likelihood estimate.
The cumulative incidence curve is an alternative to the Kaplan-Meier for competing risks data. For instance, in patients with MGUS, conversion to an overt plasma cell malignancy occurs at about 1% per year. A Kaplan-Meier estimate, treating death due to other causes as censored, gives a 20 year cumulative rate of 33% for the 241 early patients of Kyle. This estimates the incidence of conversion, if other causes of death were removed.
The CI estimate, on the other hand, estimates the total number of conversions that will actually occur. Because the population is older, this is much smaller than the KM, 22% at 20 years for Kyle's data. If there were no censoring, then CI(t) would simply be the total number of patients with progression by time t, divided by the sample size n.
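The uncensored case described above can be illustrated with a few lines of base R. This is only a sketch on made-up data (the time and event vectors are hypothetical, not Kyle's data), and it does not use survfit. It computes CI(t) as a simple proportion, and for contrast computes one minus a hand-rolled Kaplan-Meier that treats deaths as censored.

```r
# Hypothetical data with no censoring: every subject has an event time and type
time  <- c(2, 3, 3, 5, 7, 8, 11, 12)
event <- c("progression", "death", "progression", "death",
           "death", "progression", "death", "death")
n <- length(time)

# With no censoring, CI(t) = (number with progression by time t) / n
ci_progression <- function(t) sum(time <= t & event == "progression") / n

# Kaplan-Meier for progression, treating deaths as censored observations
km <- 1
for (t in sort(unique(time[event == "progression"]))) {
  at_risk <- sum(time >= t)                        # still under observation at t
  d       <- sum(time == t & event == "progression")
  km      <- km * (1 - d / at_risk)
}
one_minus_km <- 1 - km

print(c(CI = ci_progression(max(time)), one_minus_KM = one_minus_km))
```

On this toy data the Kaplan-Meier based figure exceeds the cumulative incidence, which is the same direction as the 33% versus 22% contrast reported for the MGUS patients.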
an object of class "survfit". See survfit.object for details.
Methods defined for survfit objects are print, plot, lines, and points.
Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence intervals for survival probabilities. Statistics in Medicine 6, 679-87.
Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
Kyle, R. A. (1997). Monoclonal gammopathy of undetermined significance and solitary plasmacytoma. Implications for progression to overt multiple myeloma. Hematology/Oncology Clinics N. Amer. 11, 71-87.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J Am Stat Assoc, 69, 169-173.
survfit.coxph for survival curves from Cox models.
print, plot, lines, coxph, Surv, strata.
library(survival)

#fit a Kaplan-Meier and plot it
fit <- survfit(Surv(time, status) ~ x, data = aml)
plot(fit, lty = 2:3)
legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3)

#fit a Cox proportional hazards model and plot the
#predicted survival for a 60 year old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
plot(survfit(fit, newdata=data.frame(age=60)),
     xscale=365.25, xlab = "Years", ylab="Survival")

# Here is the data set from Turnbull
#  There are no interval censored subjects, only left-censored (status=2),
#  right-censored (status 0) and observed events (status 1)
#
#                             Time
#                         1    2   3   4
# Type of observation
#           death        12    6   2   3
#          losses         3    2   0   3
#      late entry         2    4   2   5
#
tdata <- data.frame(time  =c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status=rep(c(1,0,2),4),
                    n     =c(12,3,2,6,2,4,2,0,2,3,3,5))
fit  <- survfit(Surv(time, time, status, type='interval') ~1,
                data=tdata, weights=n)

#
# Time to progression/death for patients with monoclonal gammopathy
#  Competing risk curves (cumulative incidence)
fit1 <- survfit(Surv(stop, event=='progression') ~1, data=mgus1,
                subset=(start==0))
fit2 <- survfit(Surv(stop, status) ~1, data=mgus1,
                subset=(start==0), etype=event) #competing risks

# CI curves are always plotted from 0 upwards, rather than 1 down
plot(fit2, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
     col=2:3, xlab="Years post diagnosis of MGUS")
lines(fit1, fun='event', xscale=365.25, xmax=7300, mark.time=FALSE,
      conf.int=FALSE)
text(10, .4, "Competing Risk: death", col=3)
text(16, .15,"Competing Risk: progression", col=2)
text(15, .30,"KM:prog")