4. Clinical Epidemiology: The Essentials. 4th edition. Fletcher & Fletcher. Lippincott Williams & Wilkins.
Thomas Newman: Lecture notes series, UCSF.
Paul Rheeder: Lecture notes, UP.
5. Examples of studies of diagnostic accuracy
◦ Bisson GP, Gross R, Rollins C, Bellamy S, Weinstein R, Friedman H, et al. Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. AIDS. 2006 Aug 1;20(12):1613-1619.
◦ Mahajan AP, Hogan JW, Snyder B, Kumarasamy N, Mehta K. Changes in total lymphocyte count as a surrogate for changes in CD4 count following initiation of HAART: implications for monitoring in resource-limited settings. J Acquir Immune Defic Syndr. 2004 May 1;36(1):567-575.
◦ Mayaud P, ka-Gina G, Cornelissen J, Todd J, Kaatano G, et al. Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. [reference?]
6. The Evolution of Diagnostic Reasoning
The patient either has the disease or not: D+ or D-
Test results are treated as dichotomous
◦ In reality, most tests have more than two possible results
Disease states are treated as dichotomous
◦ Many diseases occur on a spectrum
◦ There are many kinds of "nondisease"!
Evaluating diagnostic tests:
◦ Reliability
◦ Accuracy
◦ Usefulness
7. Sensitivity and specificity
Logistic regression
PPV and NPV
ROC curves
Likelihood ratio
Diagram from P. Rheeder: EBM notes
8. AIM: to use clinical and non-clinical factors to cross the test and treatment thresholds (illustrated below, and in the sketch that follows)
[Threshold diagram: the likelihood of the target disorder runs from 0% to 100%. Below the test threshold: do not test, do not treat. Between the test and treatment thresholds: test, and treat on the basis of the test result. Above the treatment threshold: do not test, get on with treatment.]
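As a rough illustration of how these thresholds drive decisions, here is a minimal Python sketch; the 15% and 70% thresholds are arbitrary assumptions chosen only for illustration, not recommendations.

```python
# Minimal sketch of the test/treatment threshold logic.
# The 15% and 70% thresholds are illustrative assumptions only.

def management_strategy(pretest_prob, test_threshold=0.15, treatment_threshold=0.70):
    """Return a management strategy for a given pretest probability of disease."""
    if pretest_prob < test_threshold:
        return "do not test, do not treat"
    if pretest_prob < treatment_threshold:
        return "test, and treat on the basis of the test result"
    return "do not test, get on with treatment"

for p in (0.05, 0.40, 0.90):
    print(f"pretest probability {p:.0%}: {management_strategy(p)}")
```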
9. Figure 1: the relationship between a diagnostic test result and the occurrence of disease

            Disease present (+)    Disease absent (-)
Test (+)    True positives         False positives
Test (-)    False negatives        True negatives
10. Figure 1 shows the relationship between a diagnostic test result and the occurrence of disease.
The goal of all studies aimed at describing the value of diagnostic tests should be to obtain data for all four cells shown in Figure 1.
A test's accuracy is considered in relation to some reference standard or "gold standard".
Some issues with studies of diagnostic accuracy:
1. Lack of information on negative tests
2. Lack of information on test results in the non-diseased
3. Lack of an objective standard for disease
All three issues lead to the concern that no new test can perform better than an established gold standard unless special strategies are used.
Sensitivity (Sn) and specificity (Sp) describe how often the test is correct in the diseased and non-diseased groups, respectively.
A sensitive test has a high true positive ratio (TPR) and is good at detecting patients with the target disease.
A specific test has a high true negative ratio (TNR) and is good at detecting patients without the target disease.
12. Sn = TPR = p(T+|D+) = a/(a+c)
The proportion of patients with disease who test positive.
SnNOut: with a highly Sensitive test, a Negative result rules OUT disease (high true positive ratio).

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
13. Sp = TNR = p(T-|D-) = d/(b+d)
The proportion of patients without disease who test negative.
SpPIn: with a highly Specific test, a Positive result rules IN disease (high true negative ratio).

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
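A minimal Python sketch of these two definitions, using the cell labels a, b, c, d from the 2x2 table; the counts are illustrative assumptions, not data from any study.

```python
# Sensitivity and specificity from the 2x2 table cells:
# a = true positives, b = false positives, c = false negatives, d = true negatives.

def sensitivity(a, c):
    """Sn = TPR = a / (a + c): proportion of diseased patients who test positive."""
    return a / (a + c)

def specificity(b, d):
    """Sp = TNR = d / (b + d): proportion of non-diseased patients who test negative."""
    return d / (b + d)

a, b, c, d = 90, 20, 10, 180   # illustrative counts
print(f"Sn = {sensitivity(a, c):.2f}, Sp = {specificity(b, d):.2f}")   # Sn = 0.90, Sp = 0.90
```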
14. It is obviously desirable to have a test that is both highly sensitive and highly specific.
Unfortunately, this is usually not possible; instead there is a trade-off between Sp and Sn.
This is especially true when the test data take on a range of values; in that case, the location of the cut-off point is an arbitrary decision.
As a result, for any given test on a continuous scale, one characteristic (e.g. Sn) can only be increased at the expense of the other (e.g. Sp).
16. The ROC curve expresses the relationship between Sn and Sp.
It is a popular summary measure of the discriminatory ability of a clinical marker that can be used when there is a gold standard.
The ROC curve plots Sn against 1-Sp (true positive rate vs false positive rate) for all thresholds that could have been used to define "test positive".
Discrimination is assessed by measuring the area under the curve (AUC).
The AUC ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability).
Two diagnostic tests can be compared by calculating the difference between the areas under their two ROC curves.
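To make the construction concrete, here is a small self-contained Python sketch that traces the ROC points of a toy continuous marker over all possible cut-offs and computes the AUC with the trapezoidal rule; the marker values and disease labels are invented for illustration.

```python
# Sketch: ROC points (1 - Sp, Sn) over all possible cut-offs of a continuous marker,
# plus the AUC by the trapezoidal rule. Data are illustrative assumptions.

marker  = [1.2, 2.3, 2.9, 3.1, 3.8, 4.4, 5.0, 5.6]   # test values
disease = [0,   0,   0,   1,   0,   1,   1,   1]      # 1 = disease present (gold standard)

def roc_points(values, labels):
    """For every possible cut-off, return (FPR, TPR) = (1 - Sp, Sn)."""
    pts = []
    for cut in sorted(set(values)):
        tp = sum(v >= cut and y == 1 for v, y in zip(values, labels))
        fp = sum(v >= cut and y == 0 for v, y in zip(values, labels))
        fn = sum(v < cut and y == 1 for v, y in zip(values, labels))
        tn = sum(v < cut and y == 0 for v, y in zip(values, labels))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return sorted(pts)

points = [(0.0, 0.0)] + roc_points(marker, disease) + [(1.0, 1.0)]
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.2f}")   # 0.94 for this toy data set
```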
19. Characteristics of the ROC curve
1. It shows how severe the trade-off between Sp and Sn is for a test.
2. The best cut-off point is at or near the "shoulder" of the curve.
3. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
4. The closer the curve follows the 45-degree diagonal of the ROC space, the less accurate the test.
5. The slope of the tangent line at a cut-off point gives the likelihood ratio (LR) for that value of the test.
20. Comparing diagnostic test performance
Accuracy is measured by the AUC
◦ 0.90 to 1 = excellent
◦ 0.80 to 0.90 = good
◦ 0.70 to 0.80 = fair
◦ 0.60 to 0.70 = poor
◦ 0.50 to 0.60 = fail
21. Clinicians are more concerned with the following question (than with Sp and Sn):
Does the patient have the disease, given the result of a test?
22. The predictive value (PV) is the probability of disease, given the result of a test.
It is the only absolute measure of diagnostic accuracy.
It is also known as the posterior (post-test) probability: the probability of disease after the test result is known.
The positive predictive value (PPV) is the probability of disease in a patient with a positive (abnormal) test result.
The negative predictive value (NPV) is the probability of not having the disease when the test result is negative (normal).
23. PPV = a/(a+b)
NPV = d/(c+d)
P = (a+c)/(a+b+c+d)

            Disease present (+)    Disease absent (-)
Test (+)    True positives (a)     False positives (b)
Test (-)    False negatives (c)    True negatives (d)
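A minimal sketch of these three formulas on the same assumed 2x2 counts used earlier; note how PPV and NPV, unlike Sn and Sp, depend on how common the disease is in the tested group.

```python
# Predictive values and prevalence straight from the 2x2 cells (illustrative counts).
a, b, c, d = 90, 20, 10, 180   # TP, FP, FN, TN

ppv = a / (a + b)                          # probability of disease given a positive test
npv = d / (c + d)                          # probability of no disease given a negative test
prevalence = (a + c) / (a + b + c + d)     # pretest probability in this group

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, prevalence = {prevalence:.2f}")
# PPV = 0.82, NPV = 0.95, prevalence = 0.33
```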
24. Prevalence (P) is the proportion of persons in a defined population, at a given point in time, who have the condition in question.
Prevalence is also known as the pretest (prior) probability.
25. Determinants of predictive value (PV)
The formula relating these concepts is derived from Bayes' theorem of conditional probabilities:
PPV = (Sn × P) / [(Sn × P) + (1 − Sp) × (1 − P)]
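The formula translates directly into a one-line Python function; the Sn, Sp and prevalence values below are assumptions chosen only to show the arithmetic.

```python
# Sketch of the Bayes formula above: PPV from Sn, Sp and prevalence (pretest probability).

def ppv_from_bayes(sn, sp, prevalence):
    """PPV = Sn*P / (Sn*P + (1 - Sp)*(1 - P))."""
    return (sn * prevalence) / (sn * prevalence + (1 - sp) * (1 - prevalence))

print(f"{ppv_from_bayes(0.90, 0.90, 0.30):.2f}")   # 0.79 for an assumed prevalence of 30%
```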
26. Prevalence is an important determinant of the interpretation of the result of a diagnostic test.
When the prevalence of disease in the population tested is relatively high, the test performs well.
At lower prevalences, the PPV drops to nearly zero and the test becomes virtually useless (illustrated in the sketch below).
As Sn and Sp fall, the influence of prevalence on the PV becomes even more pronounced!
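A short sketch illustrating this point numerically: the same hypothetical test (Sn = Sp = 0.90) is applied at progressively lower prevalences, and the PPV collapses.

```python
# Sketch: effect of prevalence on PPV for a fixed test with Sn = Sp = 0.90 (assumed values).

def ppv(sn, sp, p):
    return sn * p / (sn * p + (1 - sp) * (1 - p))

for p in (0.50, 0.10, 0.01, 0.001):
    print(f"prevalence {p:.1%}: PPV = {ppv(0.90, 0.90, p):.3f}")
# prevalence 50.0%: PPV = 0.900
# prevalence 10.0%: PPV = 0.500
# prevalence 1.0%: PPV = 0.083
# prevalence 0.1%: PPV = 0.009
```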
27. Pitfalls in the literature
Data in publications are often gathered in university teaching hospitals, where the prevalence of serious disease is relatively high; as a result, statements about the PPV of a test are then applied in less highly selected settings.
Occasionally, authors compare the performance of a test in a number of diseased patients against an equal number of non-diseased patients. This is efficient for estimating Sn and Sp, but means little for the PPV, because the investigators have artificially set the prevalence of disease at 50%.
28. Revisiting Bayes' theorem
The post-test probability (PPV) of disease is related to the pretest probability (prevalence) and the test characteristics.
Bayes' formula makes use of two concepts:
1. Odds
2. Likelihood ratio
The likelihood ratio (LR) is the probability of having a positive test result when you have the disease divided by the probability of having the same result when you do not have the disease (it is an odds ratio).
pretest odds × LR = posttest odds
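A minimal sketch of this odds form of Bayes' theorem; the Sn, Sp and pretest probability are the same assumed values as in the earlier sketches, so the answer matches the PPV obtained from the probability form of the formula.

```python
# Sketch: Bayesian updating with odds and the positive likelihood ratio.
# LR+ = Sn / (1 - Sp); pretest odds x LR = posttest odds.

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

sn, sp, pretest_prob = 0.90, 0.90, 0.30      # illustrative assumptions
lr_positive = sn / (1 - sp)                  # = 9.0
posttest_odds = prob_to_odds(pretest_prob) * lr_positive
print(f"posttest probability = {odds_to_prob(posttest_odds):.2f}")   # 0.79, as with Bayes' formula
```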
29. Advantages of the LR
◦ It is more stable (it depends on Sn and Sp, not on prevalence)
◦ It can use different cut-off values, i.e. it is not dependent on one cut-off value only
◦ It is used in Bayesian reasoning
◦ Likelihood ratios can deal with tests with more than two possible results (not just normal/abnormal), as in the sketch below
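To illustrate the last point, here is a sketch computing a likelihood ratio for each level of a three-level test result; the counts per category are invented for illustration.

```python
# Sketch: level-specific likelihood ratios for a test with three possible results.
# Each entry maps a result category to (number of D+ patients, number of D- patients); counts are assumed.

categories = {"high": (60, 5), "intermediate": (25, 45), "low": (15, 150)}

total_dpos = sum(dp for dp, dn in categories.values())   # 100 diseased
total_dneg = sum(dn for dp, dn in categories.values())   # 200 non-diseased

for result, (dp, dn) in categories.items():
    lr = (dp / total_dpos) / (dn / total_dneg)   # P(result | D+) / P(result | D-)
    print(f"LR for a '{result}' result = {lr:.2f}")
# high = 24.00, intermediate = 1.11, low = 0.20
```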
31. ROC curves can be compared statistically to see whether added information is of any benefit.
Regression coefficients can also be turned into scores.
Risk scores can be used to predict the outcome (diagnosis).
32. Multiple tests
◦ Usually there is a need for multiple tests:
1. Parallel testing
2. Serial testing
Clinical prediction rules
◦ These are rules used to "predict" the diagnostic outcome.
◦ They are a modification of parallel testing in which a combination of multiple tests is used, some with positive and some with negative results.
◦ They usually include the history, physical examination and certain laboratory tests.
The independence assumption (see the sketch after this slide)
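A minimal sketch of how parallel and serial combination change net sensitivity and specificity for two tests, under the independence assumption mentioned above; the Sn/Sp values of the two tests are assumptions.

```python
# Sketch: net Sn and Sp of two conditionally independent tests combined
# in parallel (positive if either test is positive) or in series (positive only if both are positive).

sn1, sp1 = 0.80, 0.90   # test A (assumed)
sn2, sp2 = 0.90, 0.80   # test B (assumed)

# Parallel testing: disease is missed only if both tests miss it -> Sn rises, Sp falls
sn_parallel = 1 - (1 - sn1) * (1 - sn2)
sp_parallel = sp1 * sp2

# Serial testing: disease is called only if both tests are positive -> Sp rises, Sn falls
sn_serial = sn1 * sn2
sp_serial = 1 - (1 - sp1) * (1 - sp2)

print(f"parallel: Sn = {sn_parallel:.2f}, Sp = {sp_parallel:.2f}")   # 0.98, 0.72
print(f"serial:   Sn = {sn_serial:.2f}, Sp = {sp_serial:.2f}")       # 0.72, 0.98
```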
33. Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Gravis E, Vanden H, et al. JAIDS 2000;24:30-36.
Background: Disseminated infections with Histoplasma capsulatum and Mycobacterium avium complex (MAC) in patients with AIDS are frequently difficult to distinguish clinically.
Methods: We retrospectively compared demographic information, other opportunistic
infections, medications, symptoms, physical examination findings and laboratory
parameters at the time of hospital presentation for 32 patients with culture-documented
disseminated histoplasmosis and 58 patients with disseminated MAC infection.
Results: Positive predictors of histoplasma infection by univariate analysis included
lactate dehydrogenase level, white blood cell (WBC) count, platelet count, alkaline
phosphatase level, and CD4 cell count. By multivariate logistic regression analysis,
those characteristics that remained significant included a lactate dehydrogenase value
500 U/L (risk ratio [RR], 42; 95% confidence interval [CI], 18.53–97.5; p < .001),
alkaline phosphatase 300 U/L (RR, 9.35; 95% CI, 2.61–33.48; p .008), WBC
4.5 × 106/L (RR, 21.29; 95% CI, 6.79–66.75; p .008), and CD4 cell count (RR,
0.958; 95% CI, 0.946–0.971; p .001).
Conclusions: A predictive model for distinguishing disseminated histoplasmosis
from MAC infection was developed using lactate dehydrogenase and alkaline phosphatase
levels as well as WBC count. This model had a sensitivity of 83%, a specificity
of 91%, and a misclassification rate of 13%.
35. Overfitting bias – "data-snooped" cut-offs take advantage of chance variation in the derivation set, making the test look falsely good.
Incorporation bias – the index test is part of the gold standard (sensitivity up, specificity up).
Verification/referral bias – a positive index test increases referral for the gold standard (sensitivity up, specificity down).
Double gold standard – a positive index test leads to application of the definitive gold standard, while a negative index test leads to clinical follow-up (sensitivity up, specificity up).
Spectrum bias
◦ D+ are the sickest of the sick (sensitivity up)
◦ D- are the wellest of the well (specificity up)
Source: Clinicians, probability and EBM – T. Newman, MD.
36. The STARD statement is to studies of diagnostic accuracy what CONSORT is to clinical trials and STROBE is to observational studies.
The objective is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).
The STARD statement consists of a checklist of 25 items and recommends the use of a flow diagram that describes the design of the study and the flow of patients.
Handouts attached
More on www.stard-statement.org
37. Conclusion
These tools are very relevant in our setting, as we have many unanswered questions regarding alternative, cost-effective and optimal strategies for patient monitoring and treatment simplification.
Editor's Notes
Parallel testing – all tests at once, and a positive result on any test is considered evidence for disease. Serial testing – consecutive testing; the decision to order the next test in the series is based on the result of the previous test. Independence assumption – when multiple tests are used, ideally they should be independent: the information contributed by each test must be somewhat independent of the information provided by the preceding tests, such that the next test does not merely duplicate the previous ones.