Survival Analysis Lecture.ppt

Outline
 What is survival analysis?
 Censored and truncated data
 Life table
 Kaplan-Meier estimator
 Log-rank test
 Cox regression model

Survival Analysis
 We estimate and compare means and proportions by
confidence intervals and hypothesis testing
 We also make predictions by regression models
 But these methods cannot usually be used for ‘survival’ data
 This is because survival data differ from the types of data we
have studied so far in two important aspects:
Some observations do not experience the event of interest
when the study period is completed (censored)
Survival times are hardly ever normally distributed (skewed)

What is survival analysis?
 It is a branch of statistics that focuses on time-to-event
data and their analysis.
 Survival data deal with time until occurrence of any well-
defined event.
 The outcome variable examined is the survival time (the
time until the occurrence of the event).
 It is special because it can incorporate information
about censored observations into the analysis.

Objectives of survival analysis
 Estimate the probability that an individual's time-to-event
exceeds a given time.
E.g. the probability of surviving longer than two months
until a second heart attack for a group of MI patients.
 Compare time-to-event between two or more groups.
E.g. Treatment vs placebo patients for a randomized
controlled trial.
 Assess the relationship of covariates to time-to-event.
E.g. Does weight, BP, sugar, height influence the survival
time for a group of patients?

Survival analysis
In order to define a failure time random variable, we need:
 An unambiguous time origin (e.g. date of randomization
to a clinical trial, time of exposure)
 A time scale (e.g. days, years)
 A clear definition of the event (e.g. death)

Survival analysis
You can use survival analysis when you wish to analyze survival
times or “time-to-event” intervals like:
From diagnosis to death
Time until response to a treatment
From exposure to development of symptom of disease
From treatment of infertility to conception
From the start of treatment to its failure
Time until resumption of smoking by someone who had quit
Time until certain percentage of weight loss
The statistical treatment of survival times (survival data) is
known as survival analysis.

Truncation and Censoring
 Truncation is about entering the study
Right: only subjects whose event has already occurred are included (e.g. cancer registry)
Left: subjects enter after the time origin (delayed or “staggered” entry)
 Censoring, by contrast, is about leaving the study. Survival data
can be one of two types:
 Complete data: the time to the event is observed for the
sample unit.
 Censored data: the time to the event of interest is not
observed, or the exact time is not known.

Truncation and Censoring cont…
 Censored data can occur when:
 The event of interest is death, but the patient is still alive at
the time of analysis.
 The individual was lost to follow-up without having the event
of interest.
 The event of interest is death by cancer but the patient died
of an unrelated cause, such as a car accident.
 The patient is dropped from the study without having
experienced the event of interest due to a protocol violation.
 Even if an observation is censored we will still include it in our
analysis.

Type of Censoring
 The most common type of censoring occurs when the event in
question has not yet occurred as of the time of last observation.
 This type of censoring is called right censoring.
 A follow-up time is left censored if we know that the event of
interest took place at an unknown time before the actual
observed time.
 Example: In a study modeling the age at which “regular”
smoking starts, a 12-year-old subject may report that he is a
regular smoker but does not remember when he started
smoking regularly.

Type of Censoring cont..
A ………..
B ._______________________________.
C ._______________________________...............
D ._____________________________.......
(Horizontal axis: recruitment interval, then additional follow-up interval)
In clinical and some public health studies, participants are typically recruited
over a recruitment interval (this is called staggered entry) and then followed for
an additional period of time.

Types of Censoring
 A is left censored.
 B is fully observed.
 C is right censored because the subject is lost to follow-up.
This type of right censoring causes no problems as long as the
censoring is random.
 D is right censored because the observation period ends
before the event has occurred. This type of censoring causes
no problems for the analysis.

Assumptions
 Standard methods of survival analysis handle right
censoring and left truncation, and make the following assumptions:
 Those at risk at time t are a random sample from the population of
interest at risk at time t. (This is called the non-informative or
independent censoring assumption.)
 That is, among those with the same values of X (group),
censored subjects must be at similar risk of subsequent events as
subjects with continued follow-up.
 These methods are inappropriate if the censoring mechanism is in any
way related to the probability of the event of interest. We call such
mechanisms ‘informative censoring’.

Life table
 The set of probabilities used in estimating the probability of the
occurrence of an event or survival at each year and the cumulative
probability of survival to each year is called a life table.
 To carry out the calculation, we first set out for each year (X) :
 the number alive at the start = nx
 the number withdrawn during the year=wx
 the number at risk = rx
 the number dying = dx

Life table calculation for parathyroid cancer survival:
the survival times are given in years after diagnosis

Year  Number at  Withdrawn    At risk  Deaths  Prob. of  Prob. of        Cumulative prob. of
(x)   start      during year  (rx)     (dx)    death     surviving year  surviving x years
      (nx)       (wx)                          (qx)      x (px)          (Px)
1     20         2            19       1       0.0526    0.9474          0.9474
2     17         2            16       0       0         1               0.9474
3     15         0            15       1       0.0667    0.9333          0.8842
4     14         0            14       0       0         1               0.8842
5     14         1            13.5     0       0         1               0.8842
6     13         1            12.5     0       0         1               0.8842
7     12         1            11.5     2       0.1739    0.8261          0.7304
8     9          0            9        1       0.1111    0.8889          0.6493
9     8          1            7.5      0       0         1               0.6493
10    7          0            7        2       0.2857    0.7143          0.4638
11    5          2            4        0       0         1               0.4638
12    3          0            3        1       0.3333    0.6667          0.3092
13    2          0            2        0       0         1               0.3092
14    2          0            2        0       0         1               0.3092
15    2          0            2        1       0.5000    0.5000          0.1546
16    1          0            1        0       0         1               0.1546
17    1          0            1        0       0         1               0.1546
18    1          1            0.5      0       0         1               0.1546

rx = nx − ½wx,  qx = dx/rx,  px = 1 − qx,  Px = px × P(x−1)
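
The four formulas can be checked with a short script. This is a minimal sketch in Python, not part of the original slides: the function name `life_table` and the dictionary keys are illustrative choices, the yearly withdrawals and deaths are copied from the table, and the cumulative probability is assumed to start at 1 before year 1.

```python
# Withdrawals (w_x) and deaths (d_x) per year, copied from the life table above.
withdrawn = [2, 2, 0, 0, 1, 1, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 1]
deaths = [1, 0, 1, 0, 0, 0, 2, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0, 0]


def life_table(n_start, withdrawn, deaths):
    """Return one dict per year with n_x, r_x, q_x, p_x and cumulative P_x."""
    rows = []
    n_x = n_start      # number alive at the start of the year
    cumulative = 1.0   # assumed starting value before any follow-up
    for year, (w_x, d_x) in enumerate(zip(withdrawn, deaths), start=1):
        r_x = n_x - 0.5 * w_x   # withdrawals count as at risk for half the year
        q_x = d_x / r_x         # probability of dying during the year
        p_x = 1.0 - q_x         # probability of surviving the year
        cumulative *= p_x       # P_x = p_x * P_(x-1)
        rows.append({"year": year, "n": n_x, "r": r_x, "q": q_x, "p": p_x, "P": cumulative})
        n_x -= w_x + d_x        # subjects still under observation next year
    return rows


table = life_table(20, withdrawn, deaths)
for row in table:
    print(f'{row["year"]:>2}  n={row["n"]:>2}  r={row["r"]:>4}  q={row["q"]:.4f}  P={row["P"]:.4f}')
```

Starting from 20 subjects, the script should reproduce the "number at start" column and the cumulative survival probabilities to four decimal places.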

Function describing survival times
Let T be a random variable that represents survival time
 The distribution of survival time can be described by
the survival function, S(t)
 S(t) = P(T > t)
 S(t) is the probability that a subject selected randomly
survives longer than time t
Properties
 S(t=0) = 1
 S(t) is bounded by 0 and 1, since it is a probability
 S(t) is a non-increasing function

Function ….. cont’d
 The median survival time (call it τ) is the time by which 50%
of the subjects have experienced the event.
 That is, the median survival time is the time at which S(τ) = 0.5
 In practice, however, the estimated survival function is a step
function and rarely equals 0.5 at exactly one of the failure times. In this case, the estimated
median survival is the smallest time τ such that Ŝ(τ) ≤ 0.5
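
As an illustration, the rule can be applied to the cumulative survival column of the parathyroid life table. This is a sketch only: `median_survival` is a hypothetical helper name, and it looks for the first time at which the estimated survival has dropped to 0.5 or below.

```python
def median_survival(times, survival):
    """Smallest time whose estimated survival probability is at or below 0.5."""
    for t, s in zip(times, survival):
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5, so the median is not estimable


# Years 1..18 and the cumulative survival column (Px) of the parathyroid life table.
years = list(range(1, 19))
cumulative_survival = [0.9474, 0.9474, 0.8842, 0.8842, 0.8842, 0.8842,
                       0.7304, 0.6493, 0.6493, 0.4638, 0.4638, 0.3092,
                       0.3092, 0.3092, 0.1546, 0.1546, 0.1546, 0.1546]

median_year = median_survival(years, cumulative_survival)
print(median_year)
```

For these data the survival estimate first falls below 0.5 in year 10 (0.4638), so the estimated median survival is 10 years.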

Survival function
 If there is no censoring, a good estimator of S(t) at
time t is:
Ŝ(t) = (number of patients surviving longer than time t) / (total number of patients on trial)
i.e. simply the observed proportion
 But usually there is censoring. In that case, we
estimate S(t) using the Kaplan-Meier estimator or the life
table

Kaplan-Meier (KM) estimator
 The KM estimator helps us to estimate S(t) when there are censored
data.
 To find the KM estimator, we break up the survival probability into a
sequence of conditional probabilities
 The probability of surviving t (t ≥ 2) or more years from the
beginning of the study is the product of the observed survival
rates, i.e. S(t) = p1 p2 p3 … pt
 Note that if all the data are uncensored, the numerator of pi
cancels with the denominator of pi+1 to give (nt − dt)/n0, where n0 is the
number of subjects at the start. This
is simply the proportion surviving (see the previous and the next slide)
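
The cancellation can be written out explicitly. The derivation below is a sketch under assumed notation that the slides do not spell out: n_i is the number at risk in interval i, d_i the number of deaths in interval i, p_i = (n_i − d_i)/n_i, and n_1 = n_0 is the number of subjects at the start.

```latex
% With no censoring, everyone who survives interval i is at risk in interval i+1,
% so n_{i+1} = n_i - d_i and the product telescopes.
\[
S(t) = \prod_{i=1}^{t} p_i
     = \frac{n_1 - d_1}{n_1}\cdot\frac{n_2 - d_2}{n_2}\cdots\frac{n_t - d_t}{n_t}
     = \frac{n_2}{n_1}\cdot\frac{n_3}{n_2}\cdots\frac{n_t - d_t}{n_t}
     = \frac{n_t - d_t}{n_0}
\]
```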

Kaplan-Meier estimator
 Mathematically, the KM estimator can be written as:
$\hat{S}(t) = \prod_{j:\, t_j \le t} p_j = \prod_{j:\, t_j \le t} \frac{n_j - d_j}{n_j}$
 pj = estimated by the proportion of people living through tj out
of those who have survived beyond tj-1
 nj = Number at risk at time tj
 dj = Number who died at time tj
 nj – dj = Number who survived beyond tj
 By convention, unlike the life table, any subjects censored at time
tj are considered to have survived for longer than tj,
and adjustments of the form (nj − ½wj) are not applied.

How to calculate the KM estimator
(Parathyroid cancer data)
Example 1: There are nine events (deaths); ‘+’ marks a censored time
1+, 1+, 1, 2+, 2+, 3, 5+, 6+, 7+, 7, 7, 8, 9+, 10, 10, 11+, 11+, 12, 15, 18+
Recall that:
$\hat{S}(t) = \prod_{j:\, t_j \le t} p_j$, where the product is taken over all
time intervals in which a death occurred, up to and including t
Ŝ(1) = Ŝ(0)p1 = (1)(19/20) = 0.9500
Ŝ(3) = Ŝ(1)p3 = (0.9500)(14/15) = 0.8867
Ŝ(7) = Ŝ(3)p7 = (0.8867)(10/12) = 0.7389
Ŝ(8) = Ŝ(7)p8 = (0.7389)(8/9) = 0.6568
Ŝ(10) = Ŝ(8)p10 = (0.6568)(5/7) = 0.4691
Ŝ(12) = Ŝ(10)p12 = (0.4691)(2/3) = 0.3128
Ŝ(15) = Ŝ(12)p15 = (0.3128)(1/2) = 0.1564
Note that the KM survival estimates differ from the life table
estimates. This is due to the different conventions for treating
censored data when calculating the number at risk.
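
The hand calculation can be reproduced programmatically. The following is a minimal sketch rather than a reference implementation: `kaplan_meier` is a hypothetical helper, and it follows the convention that subjects censored at a death time are still counted as at risk at that time.

```python
def kaplan_meier(observations):
    """observations: list of (time, died) pairs; died is False for censored times.

    Returns one (time, number_at_risk, deaths, survival) tuple per death time.
    """
    n_at_risk = len(observations)
    survival = 1.0
    steps = []
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, died in observations if time == t and died)
        leaving = sum(1 for time, _ in observations if time == t)
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            steps.append((t, n_at_risk, deaths, survival))
        n_at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return steps


# Parathyroid cancer data as listed above; a trailing "+" marks a censored time.
raw = "1+, 1+, 1, 2+, 2+, 3, 5+, 6+, 7+, 7, 7, 8, 9+, 10, 10, 11+, 11+, 12, 15, 18+"
data = [(int(tok.rstrip("+")), not tok.endswith("+")) for tok in raw.split(", ")]

km_steps = kaplan_meier(data)
for t, n, d, s in km_steps:
    print(f"t={t:>2}  at risk={n:>2}  deaths={d}  S(t)={s:.4f}")
```

The survival estimate changes only at the seven distinct death times, and the printed values should agree with the hand calculation above.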

Kaplan Meier curve cont…
Example 2: Motion sickness data. In an experiment comparing two
drugs for delaying vomiting, 49 passengers were studied. There were 21
passengers in the first experiment (first drug), of whom five
had definite events (vomiting) at 30, 50, 51, 82, and
92 minutes.
In the second experiment, there were 28 passengers, of
whom 14 had events: one each at 5, 13, 24, 63, 65, 79, 102, and
115 minutes, and two each at 11, 69, and 82 minutes.

Kaplan Meier curve cont…
Data set (* marks a censored time)

Experiment 1               Experiment 2
Subject  Survival time     Subject  Survival time     Subject  Survival time
number   (min)             number   (min)             number   (min)
1        30                1        5                 14       102
2        50                2        6*                15       115
3        50*               3        11                16       120*
4        51                4        11                17       120*
5        66*               5        13                .        .
6        82                6        24                .        .
7        92                7        63                .        .
8        120*              8        65                28       120*
9        120*              9        69
.        .                 10       69
.        .                 11       79
.        .                 12       82
21       120*              13       82

Kaplan Meier curve
Variable definition for motion sickness data (study):
 Time is the time in minutes from the point of randomization to
either vomiting or censoring
 Status has a value of 1 if a passenger vomited and a value of 0 if
censored, i.e. if the passenger had not vomited by the end of
follow-up
 Drug takes the value 1 or 2, corresponding to treatment
1 and treatment 2 respectively
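
To make the coding concrete, the sketch below turns the survival times printed in the data table into Time / Status / Drug records. It rests on two assumptions: a trailing `*` marks a censored time, and only the rows actually printed are encoded (the rows elided with ". ." are left out rather than guessed). The names `encode`, `records`, and `events_by_drug` are illustrative.

```python
# Times exactly as printed in the data table; rows elided with ". ." are NOT filled in.
# Assumption: a trailing "*" marks a censored time.
experiment_1_listed = ["30", "50", "50*", "51", "66*", "82", "92", "120*", "120*", "120*"]
experiment_2_listed = ["5", "6*", "11", "11", "13", "24", "63", "65", "69", "69",
                       "79", "82", "82", "102", "115", "120*", "120*", "120*"]


def encode(entries, drug):
    """Turn printed survival times into Time / Status / Drug records."""
    records = []
    for entry in entries:
        censored = entry.endswith("*")
        records.append({
            "time": int(entry.rstrip("*")),   # minutes from randomization
            "status": 0 if censored else 1,   # 1 = vomited, 0 = censored
            "drug": drug,                     # 1 or 2
        })
    return records


records = encode(experiment_1_listed, 1) + encode(experiment_2_listed, 2)
events_by_drug = {drug: sum(r["status"] for r in records if r["drug"] == drug)
                  for drug in (1, 2)}
print(events_by_drug)
```

The event counts per drug should match the 5 and 14 events described in the text, since every event time appears among the printed rows.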

How to use SPSS
 Analyze > Survival > Kaplan Meier
 Time: Time
 Status: status(1)
Here define 1 since it is the value indicating event has
occurred (i.e. vomiting)
 Options: Check the survival plot

Kaplan Meier curve …
If the last observation is uncensored, the K-M estimate at
that time equals zero
Each time an observation is censored, the number at risk (the denominator)
changes, but the K-M curve stays level. Censored observations are
marked by hash-marks on the curve.

Limitations of Kaplan-Meier
 Mainly descriptive
 Doesn’t control for covariates
 Requires categorical predictors
 Can’t accommodate time-dependent variables

Cox regression Models
 Cox regression is a regression method introduced by Sir David Cox
in 1972
 It is a model that relates the time that passes before some
event occurs to one or more covariates that may be
associated with that amount of time.
 It is also known as proportional hazards regression analysis.

Cox regression model
 This model produces a survival function that predicts the
probability that the event has not yet occurred by a given time t, for
given predictor variables (covariates).
 Unlike linear regression, survival analysis has a dichotomous
(binary) outcome
 Unlike logistic regression, it analyzes the time to an event, and
has the following added values
Able to account for censoring
Can compare survival between two or more groups
Assess relationship between covariates and survival time

Cox Regression model…
 It is a semi-parametric model
 Cox regression models the effect of predictors and covariates
on the hazard rate but leaves the baseline hazard rate
unspecified.
 It does NOT require knowledge of the absolute risk.
 Rather, it estimates relative risk (hazard ratios), not absolute risk.

Cox regression model
$h(t \mid X_i) = h_0(t)\, e^{\beta X_i}$
t is the time
Xi is the covariate for the ith individual
h0(t) is the baseline hazard function.
 i.e. h0(t) is the hazard function when all the covariates are equal to
zero (there is no separate intercept term; it is absorbed into h0(t))

Hazard Function
$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
In words: the instantaneous rate of the event at time t; loosely, the
probability per unit time that, having survived to t, you will
succumb to the event in the next instant.

Interpretation of the betas
 First we find the hazard ratio for a one-unit increase
in the covariate, with the other covariates held fixed:
$\frac{h(t, x_1 + 1)}{h(t, x_1)} = \frac{h_0(t)\, e^{\beta (x_1 + 1)}}{h_0(t)\, e^{\beta x_1}} = e^{\beta}$
 β is the increase in the log hazard (the log hazard ratio) for a unit increase in
covariate X
 Note that censored cases contribute no event term of their own to the
computation of the regression coefficients, although they count as at risk
until they are censored; they are also used to compute the baseline
hazard.
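
A quick numerical check shows why the ratio does not depend on the baseline hazard or on the starting value of the covariate. This is only a sketch: the coefficient, baseline hazard values, and covariate values are made up for illustration, and `cox_hazard` is a hypothetical helper.

```python
import math


def cox_hazard(baseline_hazard, beta, x):
    """Hazard under the Cox model: h(t) = h0(t) * exp(beta * x)."""
    return baseline_hazard * math.exp(beta * x)


beta = 0.5  # hypothetical coefficient, chosen only for illustration

# Hypothetical baseline hazards h0(t) and covariate values x.
ratios = [
    cox_hazard(h0, beta, x + 1) / cox_hazard(h0, beta, x)
    for h0 in (0.01, 0.2, 3.0)
    for x in (0.0, 1.5, 10.0)
]

print(ratios)
print(math.exp(beta))
```

Every ratio equals exp(β) up to floating-point error, which is the sense in which the hazards are proportional.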

The Cox regression model cont…
 The time variable should be quantitative.
 The status variable can be categorical or continuous.
 The independent variables (covariates) can be continuous or
categorical;
 If categorical, they should be dummy or indicator coded (there
is an option in the procedure to recode categorical variables
automatically).

 If there is one or no categorical covariate, the Kaplan-Meier
or Life Table procedure can be used
 If there are no censored data in the sample, the linear
regression procedure can be used to model the relationship
between predictors and time to event.

SPSS output for motion sickness data
Interpretation:
 The hazard of vomiting for patients receiving treatment 1 is
25.4% of that for patients receiving treatment 2.
Variables in the Equation
            B       SE    Wald   df  Sig.  Exp(B)  95.0% CI for Exp(B)
                                                   Lower   Upper
treatment   -1.372  .547  6.286  1   .012  .254    .087    .741
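
The Exp(B) column and its confidence limits can be recovered from B and SE. The sketch below assumes a Wald-type interval, exp(B ± 1.96 × SE); whether SPSS uses exactly this multiplier is an assumption here, and `hazard_ratio_with_ci` is a hypothetical helper name.

```python
import math


def hazard_ratio_with_ci(b, se, z=1.96):
    """Hazard ratio exp(b) with an approximate 95% Wald interval exp(b +/- z*se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# B and SE for the treatment variable, taken from the output above.
hr, lower, upper = hazard_ratio_with_ci(-1.372, 0.547)
print(round(hr, 3), round(lower, 3), round(upper, 3))
```

Rounded to three decimals, the result agrees with the Exp(B), Lower, and Upper columns of the output.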

Example 2: Recurrence of gallstones (data can be accessed from SPSS
template or Martin B.)

Variable             Coefficient  Standard error  95% CI for B     Exp(B)  P-value
Multiple gall stone  0.838        0.401           0.046 to 1.631   2.313   0.036
Maximum diameter     -0.023       0.036           -0.094 to 0.049  0.978   0.531
Months to dissolve   0.044        0.017           0.011 to 0.078   1.045   0.008

The chi-squared statistic tests the relationship between the
time to recurrence and the three variables together
A positive b coefficient shows an increased risk of the event, in
this case recurrence.

Cox regression cont…
 The maximum diameter has no significant relationship to time
to recurrence.
 The coefficient for multiple gall stones is 0.838. If we antilog
this, we get exp(0.838) = 2.31
 This is interpreted as: a patient with multiple gall stones is 2.31
times as likely to have a recurrence at any given time as a patient with
a single stone.
 The 95% CI for this estimate (relative hazard) is 1.05 to 5.11

Example 3
In a cancer drug trial, 37 patients were randomized to the treatment
group and 32 patients to the control group. Their survival times
(until death) are measured in months and some observations are
censored. (Variables: group, sex, and age.)
Result of Cox regression for the cancer trial example

Explanatory variable  Hazard Ratio  95% CI       P-value
Group
  Control             1.0
  Treatment           0.1052        0.086-0.262  <0.0001
Sex
  Male                1.0
  Female              0.9127        0.732-1.366  0.4342
Age                   1.127         1.103-1.152  0.002

Interpretation
 The death hazard in the treatment group is 0.1052 times (95 per
cent CI: 0.086–0.262) that in the control group, reducing the risk
by almost 90 per cent at any given time (p < 0.0001).
 The death hazard for females does not differ significantly from that
for males (the CI includes 1.0 and the p-value is large).
 Each 1-year increase in age multiplies the death hazard
by a factor of 1.127 (p = 0.002).
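
The arithmetic behind these statements can be sketched as follows. The hazard ratios are taken from the table; the 10-year age difference is an illustrative choice that is not in the slides, and it relies on the model's assumption that the effect of age is multiplicative per year.

```python
hr_treatment = 0.1052     # treatment vs control, from the table above
hr_age_per_year = 1.127   # per 1-year increase in age, from the table above

# A hazard ratio of 0.1052 corresponds to a (1 - 0.1052) * 100 percent lower hazard.
percent_reduction = (1.0 - hr_treatment) * 100.0


def age_hazard_ratio(years):
    """Hazard ratio for an age difference of `years`, assuming a multiplicative effect."""
    return hr_age_per_year ** years


print(f"Reduction in hazard with treatment: {percent_reduction:.2f}%")
print(f"Hazard ratio for a 10-year age difference: {age_hazard_ratio(10):.2f}")
```

The first figure is what "almost 90 per cent" refers to; the second shows how a modest per-year ratio compounds over a larger age gap.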
