4. Clinical Epidemiology: The Essentials. 4th edition. Fletcher & Fletcher. Lippincott Williams & Wilkins.
Thomas Newman: Lecture notes series, UCSF.
Paul Rheeder: Lecture notes, UP.
5. Examples of studies of diagnostic accuracy
◦ Bisson GP, Gross R, Rollins C, Bellamy S, Weinstein R, Friedman H, et al. Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. AIDS. 2006 Aug 1;20(12):1613-1619.
◦ Mahajan AP, Hogan JW, Snyder B, Kumarasamy N, Mehta K. Changes in total lymphocyte count as a surrogate for changes in CD4 count following initiation of HAART: implications for monitoring in resource-limited settings. J Acquir Immune Defic Syndr. 2004 May 1;36(1):567-575.
◦ Mayaud P, ka-Gina G, Cornelissen J, Todd J, Kaatano G, et al. Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. [reference?]
6. The Evolution of Diagnostic Reasoning
The patient either has the disease or not: D+ or D-
Test results are treated as dichotomous
◦ In reality, most tests have more than two possible results
Disease states are treated as dichotomous
◦ Many diseases occur on a spectrum
◦ There are many kinds of "nondisease"!
Evaluating diagnostic tests:
◦ Reliability
◦ Accuracy
◦ Usefulness
7. Sensitivity and specificity
Logistic regression
PPV and NPV
ROC curves
Likelihood ratio
Diagram from P. Rheeder: EBM notes
8. AIM: to use clinical and non-clinical factors to cross the test and treatment thresholds (illustrated below, and in the sketch that follows)
[Threshold diagram: the likelihood of the target disorder runs from 0% to 100%. Below the test threshold: do not test, do not treat. Between the test and treatment thresholds: test, and treat on the basis of the test result. Above the treatment threshold: do not test, get on with treatment.]
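As a rough illustration of how these thresholds drive decisions, here is a minimal Python sketch; the 15% and 70% thresholds are arbitrary assumptions chosen only for illustration, not recommendations.

```python
# Minimal sketch of the test/treatment threshold logic.
# The 15% and 70% thresholds are illustrative assumptions only.

def management_strategy(pretest_prob, test_threshold=0.15, treatment_threshold=0.70):
    """Return a management strategy for a given pretest probability of disease."""
    if pretest_prob < test_threshold:
        return "do not test, do not treat"
    if pretest_prob < treatment_threshold:
        return "test, and treat on the basis of the test result"
    return "do not test, get on with treatment"

for p in (0.05, 0.40, 0.90):
    print(f"pretest probability {p:.0%}: {management_strategy(p)}")
```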
9. Figure 1: the relationship between a diagnostic test result and the occurrence of disease

            Disease present (+)    Disease absent (-)
Test (+)    True positives         False positives
Test (-)    False negatives        True negatives
10. Figure 1 shows the relationship between a diagnostic test result and the occurrence of disease.
The goal of all studies aimed at describing the value of diagnostic tests should be to obtain data for all four cells shown in Figure 1.
A test's accuracy is considered in relation to some reference standard or "gold standard".
Some issues with studies of diagnostic accuracy:
1. Lack of information on negative tests
2. Lack of information on test results in the non-diseased
3. Lack of an objective standard for disease
All three issues lead to the concern that no new test can perform better than an established gold standard unless special strategies are used.
Sensitivity (Sn) and specificity (Sp) describe how often the test is correct in the diseased and non-diseased groups, respectively.
A sensitive test has a high true positive ratio (TPR) and is good at detecting patients with the target disease.
A specific test has a high true negative ratio (TNR) and is good at detecting patients without the target disease.
12. Sn = TPR = p(T+|D+) = a/(a+c)
The proportion of patients with disease who test positive.
SnNOut: with a highly Sensitive test, a Negative result rules OUT disease (high true positive ratio).

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
13. Sp = TNR = p(T-|D-) = d/(b+d)
The proportion of patients without disease who test negative.
SpPIn: with a highly Specific test, a Positive result rules IN disease (high true negative ratio).

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
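A minimal Python sketch of these two definitions, using the cell labels a, b, c, d from the 2x2 table; the counts are illustrative assumptions, not data from any study.

```python
# Sensitivity and specificity from the 2x2 table cells:
# a = true positives, b = false positives, c = false negatives, d = true negatives.

def sensitivity(a, c):
    """Sn = TPR = a / (a + c): proportion of diseased patients who test positive."""
    return a / (a + c)

def specificity(b, d):
    """Sp = TNR = d / (b + d): proportion of non-diseased patients who test negative."""
    return d / (b + d)

a, b, c, d = 90, 20, 10, 180   # illustrative counts
print(f"Sn = {sensitivity(a, c):.2f}, Sp = {specificity(b, d):.2f}")   # Sn = 0.90, Sp = 0.90
```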
14. It is obviously desirable to have a test that is both highly sensitive and highly specific.
Unfortunately, this is usually not possible; instead there is a trade-off between Sp and Sn.
This is especially true when the test data take on a range of values; in that case, the location of the cut-off point is an arbitrary decision.
As a result, for any given test on a continuous scale, one characteristic (e.g. Sn) can only be increased at the expense of the other (e.g. Sp).
16. The ROC curve expresses the relationship between Sn and Sp.
It is a popular summary measure of the discriminatory ability of a clinical marker that can be used when there is a gold standard.
The ROC curve plots Sn against 1-Sp (true positive rate vs false positive rate) for all thresholds that could have been used to define "test positive".
Discrimination is assessed by measuring the area under the curve (AUC).
The AUC ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability).
Two diagnostic tests can be compared by calculating the difference between the areas under their two ROC curves.
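To make the construction concrete, here is a small self-contained Python sketch that traces the ROC points of a toy continuous marker over all possible cut-offs and computes the AUC with the trapezoidal rule; the marker values and disease labels are invented for illustration.

```python
# Sketch: ROC points (1 - Sp, Sn) over all possible cut-offs of a continuous marker,
# plus the AUC by the trapezoidal rule. Data are illustrative assumptions.

marker  = [1.2, 2.3, 2.9, 3.1, 3.8, 4.4, 5.0, 5.6]   # test values
disease = [0,   0,   0,   1,   0,   1,   1,   1]      # 1 = disease present (gold standard)

def roc_points(values, labels):
    """For every possible cut-off, return (FPR, TPR) = (1 - Sp, Sn)."""
    pts = []
    for cut in sorted(set(values)):
        tp = sum(v >= cut and y == 1 for v, y in zip(values, labels))
        fp = sum(v >= cut and y == 0 for v, y in zip(values, labels))
        fn = sum(v < cut and y == 1 for v, y in zip(values, labels))
        tn = sum(v < cut and y == 0 for v, y in zip(values, labels))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(pts)

points = [(0.0, 0.0)] + roc_points(marker, disease) + [(1.0, 1.0)]
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.2f}")   # 0.94 for this toy data set
```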
19. Characteristics of the ROC curve
1. It shows how severe the trade-off between Sp and Sn is for a test.
2. The best cut-off point is at or near the "shoulder" of the curve.
3. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
4. The closer the curve follows the 45-degree diagonal of the ROC space, the less accurate the test.
5. The slope of the tangent line at a cut-off point gives the likelihood ratio (LR) for that value of the test.
20. Comparing diagnostic test performance
Accuracy is measured by the AUC
◦ 0.90 to 1 = excellent
◦ 0.80 to 0.90 = good
◦ 0.70 to 0.80 = fair
◦ 0.60 to 0.70 = poor
◦ 0.50 to 0.60 = fail
21. Clinicians are more concerned with the following question (than with Sp and Sn):
Does the patient have the disease, given the result of a test?
22. The predictive value (PV) is the probability of disease, given the result of a test.
It is the only absolute measure of diagnostic accuracy.
It is also known as the posterior (post-test) probability: the probability of disease after the test result is known.
The positive predictive value (PPV) is the probability of disease in a patient with a positive (abnormal) test result.
The negative predictive value (NPV) is the probability of not having the disease when the test result is negative (normal).
23. PPV = a/(a+b)
NPV = d/(c+d)
P = (a+c)/(a+b+c+d)

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
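A minimal sketch of these three formulas on the same assumed 2x2 counts used earlier; note how PPV and NPV, unlike Sn and Sp, depend on how common the disease is in the tested group.

```python
# Predictive values and prevalence straight from the 2x2 cells (illustrative counts).
a, b, c, d = 90, 20, 10, 180   # TP, FP, FN, TN

ppv = a / (a + b)                          # probability of disease given a positive test
npv = d / (c + d)                          # probability of no disease given a negative test
prevalence = (a + c) / (a + b + c + d)     # pretest probability in this group

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, prevalence = {prevalence:.2f}")
# PPV = 0.82, NPV = 0.95, prevalence = 0.33
```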
24. Prevalence (P) is the proportion of persons in a defined population, at a given point in time, who have the condition in question.
Prevalence is also known as the pretest (prior) probability.
25. Determinants of predictive value (PV)
The formula relating these concepts is derived from Bayes' theorem of conditional probabilities:
PPV = (Sn × P) / [(Sn × P) + (1 − Sp) × (1 − P)]
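The formula translates directly into a one-line Python function; the Sn, Sp and prevalence values below are assumptions chosen only to show the arithmetic.

```python
# Sketch of the Bayes formula above: PPV from Sn, Sp and prevalence (pretest probability).

def ppv_from_bayes(sn, sp, prevalence):
    """PPV = Sn*P / (Sn*P + (1 - Sp)*(1 - P))."""
    return (sn * prevalence) / (sn * prevalence + (1 - sp) * (1 - prevalence))

print(f"{ppv_from_bayes(0.90, 0.90, 0.30):.2f}")   # 0.79 for an assumed prevalence of 30%
```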
26. Prevalence is an important determinant of the interpretation of the result of a diagnostic test.
When the prevalence of disease in the population tested is relatively high, the test performs well.
At lower prevalences, the PPV drops to nearly zero and the test becomes virtually useless (illustrated in the sketch below).
As Sn and Sp fall, the influence of prevalence on the PV becomes even more pronounced!
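A short sketch illustrating this point numerically: the same hypothetical test (Sn = Sp = 0.90) is applied at progressively lower prevalences, and the PPV collapses.

```python
# Sketch: effect of prevalence on PPV for a fixed test with Sn = Sp = 0.90 (assumed values).

def ppv(sn, sp, p):
    return sn * p / (sn * p + (1 - sp) * (1 - p))

for p in (0.50, 0.10, 0.01, 0.001):
    print(f"prevalence {p:.1%}: PPV = {ppv(0.90, 0.90, p):.3f}")
# prevalence 50.0%: PPV = 0.900
# prevalence 10.0%: PPV = 0.500
# prevalence 1.0%: PPV = 0.083
# prevalence 0.1%: PPV = 0.009
```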
27. Pitfalls in the literature
Data in publications are often gathered in university teaching hospitals, where the prevalence of serious disease is relatively high; as a result, statements about the PPV of a test are then applied in less highly selected settings.
Occasionally, authors compare the performance of a test in a number of diseased patients against an equal number of non-diseased patients. This is efficient for estimating Sn and Sp, but means little for the PPV, because the investigators have artificially set the prevalence of disease at 50%.
28. Revisiting Bayes' theorem
The post-test probability (PPV) of disease is related to the pretest probability (prevalence) and the test characteristics.
Bayes' formula makes use of two concepts:
1. Odds
2. Likelihood ratio
The likelihood ratio (LR) is the probability of having a positive test result when you have the disease divided by the probability of having the same result when you do not have the disease (it is an odds ratio).
pretest odds × LR = posttest odds
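A minimal sketch of this odds form of Bayes' theorem; the Sn, Sp and pretest probability are the same assumed values as in the earlier sketches, so the answer matches the PPV obtained from the probability form of the formula.

```python
# Sketch: Bayesian updating with odds and the positive likelihood ratio.
# LR+ = Sn / (1 - Sp); pretest odds x LR = posttest odds.

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

sn, sp, pretest_prob = 0.90, 0.90, 0.30      # illustrative assumptions
lr_positive = sn / (1 - sp)                  # = 9.0
posttest_odds = prob_to_odds(pretest_prob) * lr_positive
print(f"posttest probability = {odds_to_prob(posttest_odds):.2f}")   # 0.79, as with Bayes' formula
```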
29. Advantages of the LR
◦ It is more stable (it depends on Sn and Sp, not on prevalence)
◦ It can use different cut-off values, i.e. it is not dependent on one cut-off value only
◦ It is used in Bayesian reasoning
◦ Likelihood ratios can deal with tests with more than two possible results (not just normal/abnormal), as in the sketch below
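To illustrate the last point, here is a sketch computing a likelihood ratio for each level of a three-level test result; the counts per category are invented for illustration.

```python
# Sketch: level-specific likelihood ratios for a test with three possible results.
# Each entry maps a result category to (number of D+ patients, number of D- patients); counts are assumed.

categories = {"high": (60, 5), "intermediate": (25, 45), "low": (15, 150)}

total_dpos = sum(dp for dp, dn in categories.values())   # 100 diseased
total_dneg = sum(dn for dp, dn in categories.values())   # 200 non-diseased

for result, (dp, dn) in categories.items():
    lr = (dp / total_dpos) / (dn / total_dneg)   # P(result | D+) / P(result | D-)
    print(f"LR for a '{result}' result = {lr:.2f}")
# high = 24.00, intermediate = 1.11, low = 0.20
```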
31. ROC curves can be compared statistically to see whether added information is of any benefit.
Regression coefficients can also be turned into scores.
Risk scores can be used to predict the outcome (diagnosis).
32. Multiple tests
◦ Usually there is a need for multiple tests:
1. Parallel testing
2. Serial testing
Clinical prediction rules
◦ These are rules used to "predict" the diagnostic outcome.
◦ They are a modification of parallel testing in which a combination of multiple tests is used, some with positive and some with negative results.
◦ They usually include the history, physical examination and certain laboratory tests.
The independence assumption (see the sketch after this slide)
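A minimal sketch of how parallel and serial combination change net sensitivity and specificity for two tests, under the independence assumption mentioned above; the Sn/Sp values of the two tests are assumptions.

```python
# Sketch: net Sn and Sp of two conditionally independent tests combined
# in parallel (positive if either test is positive) or in series (positive only if both are positive).

sn1, sp1 = 0.80, 0.90   # test A (assumed)
sn2, sp2 = 0.90, 0.80   # test B (assumed)

# Parallel testing: disease is missed only if both tests miss it -> Sn rises, Sp falls
sn_parallel = 1 - (1 - sn1) * (1 - sn2)
sp_parallel = sp1 * sp2

# Serial testing: disease is called only if both tests are positive -> Sp rises, Sn falls
sn_serial = sn1 * sn2
sp_serial = 1 - (1 - sp1) * (1 - sp2)

print(f"parallel: Sn = {sn_parallel:.2f}, Sp = {sp_parallel:.2f}")   # 0.98, 0.72
print(f"serial:   Sn = {sn_serial:.2f}, Sp = {sp_serial:.2f}")       # 0.72, 0.98
```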
33. Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Gravis E, Vanden H, et al. JAIDS 2000;24:30-36.
Background: Disseminated infections with Histoplasma capsulatum and Mycobacterium avium complex (MAC) in patients with AIDS are frequently difficult to distinguish clinically.
Methods: We retrospectively compared demographic information, other opportunistic
infections, medications, symptoms, physical examination findings and laboratory
parameters at the time of hospital presentation for 32 patients with culture-documented
disseminated histoplasmosis and 58 patients with disseminated MAC infection.
Results: Positive predictors of histoplasma infection by univariate analysis included
lactate dehydrogenase level, white blood cell (WBC) count, platelet count, alkaline
phosphatase level, and CD4 cell count. By multivariate logistic regression analysis,
those characteristics that remained significant included a lactate dehydrogenase value
500 U/L (risk ratio [RR], 42; 95% confidence interval [CI], 18.53–97.5; p < .001),
alkaline phosphatase 300 U/L (RR, 9.35; 95% CI, 2.61–33.48; p .008), WBC
4.5 × 106/L (RR, 21.29; 95% CI, 6.79–66.75; p .008), and CD4 cell count (RR,
0.958; 95% CI, 0.946–0.971; p .001).
Conclusions: A predictive model for distinguishing disseminated histoplasmosis
from MAC infection was developed using lactate dehydrogenase and alkaline phosphatase
levels as well as WBC count. This model had a sensitivity of 83%, a specificity
of 91%, and a misclassification rate of 13%.
35. Overfitting bias – "data-snooped" cut-offs take advantage of chance variation in the derivation set, making the test look falsely good.
Incorporation bias – the index test is part of the gold standard (sensitivity up, specificity up).
Verification/referral bias – a positive index test increases referral for the gold standard (sensitivity up, specificity down).
Double gold standard – a positive index test leads to application of the definitive gold standard, while a negative index test leads to clinical follow-up (sensitivity up, specificity up).
Spectrum bias
◦ D+ are the sickest of the sick (sensitivity up)
◦ D- are the wellest of the well (specificity up)
Source: Clinicians, probability and EBM – T. Newman, MD.
36. The STARD statement is to studies of diagnostic accuracy what CONSORT is to clinical trials and STROBE is to observational studies.
The objective is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).
The STARD statement consists of a checklist of 25 items and recommends the use of a flow diagram that describes the design of the study and the flow of patients.
Handouts attached
More on www.stard-statement.org
37. Conclusion
These tools are very relevant in our setting, as we have many unanswered questions regarding alternative, cost-effective and optimal strategies for patient monitoring and treatment simplification.
Editor's Notes
Parallel testing – all tests at once, and a positive result on any test is considered evidence for disease. Serial testing – consecutive testing; the decision to order the next test in the series is based on the result of the previous test. Independence assumption – when multiple tests are used, ideally they should be independent: the information contributed by each test must be somewhat independent of the information provided by the preceding tests, such that the next test does not merely duplicate the previous ones.