MRCPsych - How To Analyse Diagnostic Test Studies (May09)

MRCPsych Teaching 2009
MRCPsych 2009

Critical Appraisal of Diagnostic Tests
Studies of Accuracy, Validity, Screening & Case finding

Alex J Mitchell
Consultant in Liaison Psychiatry
University of Leicester

Contents

1. Importance of understanding diagnostic tests
2. Concept of diagnostic tests: traits to diseases
3. Statistics of diagnostic tests
4. Clinical Value of diagnostic tests
5. Worked examples
6. Advanced techniques

1. Importance of understanding diagnostic tests

What Is a Diagnostic Test in Psychiatry?

• CT/MRI
• CSF
• Blood tests eg TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report

Why Is a HADS score not a diagnosis?

1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if items are missing?
6. Imprecise

Defining Diagnostic Testing

• INTENTION
• Screening
– The systematic application of a test or inquiry, to identify individuals at sufficient risk of a
specific disorder to warrant further actions among those who have not sought medical help for
that disorder
• Case-Finding
– The selected application of a test or inquiry, to identify individuals with a suspected disorder
and exclude those without a disorder, usually in those who have sought medical help for that
disorder

• APPLICATION
• Targeted (High Risk)
– The highly selected application of a test or inquiry, to identify individuals at high risk of a
specific disorder by virtue of known risk factors

• Routine Screening
– The systematic application of a test or inquiry, to individuals without a known disorder (or who
have not sought medical help for that disorder)

Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.

Defining Diagnostic Testing

• COMPARATOR
• Accuracy
– The degree of approximation (veracity) to a robust comparator

• Validity
– The degree of approximation (veracity) to a criterion reference

• Precision
– The degree of predictability (low SD) in the measure

Aims of Detection

• Screening:
– Short; easy; some false +ve accepted (lower Sp, PPV), few false –ve (high Se, NPV)

• Diagnosis (case-finding)
– Accurate; few false +ve or –ve

• Rating
– Simple, patient rated, correlates with QoL and other outcomes

UK National Screening Committee Guidelines

• The condition should:
– Be an important health issue
– Have a well-understood history, with a detectable risk factor or disease marker
– Have cost-effective primary preventions implemented.

• The screening tool should:
– Be a valid tool with known cut-off
– Be acceptable to the public
– Have agreed diagnostic procedures.

• The treatment should:
– Be effective, with evidence of benefits of early intervention
– Have adequate resources
– Have appropriate policies as to who should be treated.

• The screening program should:
– Show evidence that benefits of screening outweigh risks
– Be acceptable to public and professionals
– Be cost effective (and have ongoing evaluation)
– Have quality-assurance strategies in place.

Adapted from: UK National Screening Committee Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf

Development of Diagnostic Tests

Each stage is listed with its Type, Purpose and Description.

• Pre-clinical
– Type: Development
– Purpose: Development of the proposed tool or test
– Description: Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.

• Phase I_screen
– Type: Diagnostic validity
– Purpose: Early diagnostic validity testing in a selected sample and refinement of tool
– Description: The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.

• Phase II_screen
– Type: Diagnostic validity
– Purpose: Diagnostic validity in a representative sample
– Description: The aim is to assess the refined tool against a criterion (gold standard) in a real world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.

• Phase III_screen
– Type: Implementation
– Purpose: Screening RCT; clinicians using vs not using a screening tool
– Description: This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.

• Phase IV_screen
– Type: Implementation
– Purpose: Screening implementation studies using real-world outcomes
– Description: In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.

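Theory of Diagnostic Tests

[Figure: overlapping distributions of test results for Non-Depressed and Depressed individuals (x-axis: Test Result; y-axis: # of Individuals). The cut-off value divides them into True -ve, False -ve, False +ve and True +ve regions.]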

Low Prevalence (Se Sp = same)

[Figure: distributions of test results for Non-Depressed vs Mj Depression with the cut-off value marked (x-axis: Test Result; y-axis: # of Individuals). False –ve region: SMALL. False +ve region: LARGE.]

High Prevalence (Se Sp = same)

[Figure: distributions of test results for Non-Depressed vs Mj+Mn Depression with the cut-off value marked (x-axis: Test Result; y-axis: # of Individuals). False –ve region: LARGE. False +ve region: SMALL.]

2. Concepts of Diagnostic Tests:
Trait / Syndrome / Disease

Can This Help establish a syndrome?

Example: A Clear Disease [#1]

[Figure: distributions for No Disorder and Disorder separated by a Point of Partial Rarity (x-axis: Test Result; y-axis: Number of Individuals), with True -ve, False -ve, False +ve and True +ve regions marked.]

Example: A Probable Syndrome [#2]

[Figure: distributions for No Disorder and Disorder (x-axis: MMSE Cognitive Score; y-axis: Number of Individuals), with True -ve, False -ve, False +ve and True +ve regions marked.]

Example: A Normally Distributed Trait [#3]

[Figure: distributions for No Disorder and Disorder (x-axis: MMSE Cognitive Score; y-axis: Number of Individuals), with True -ve, False -ve, False +ve and True +ve regions marked.]

Example: Dementia

Disease?
Syndrome?
Trait?

[Figure: MMSE scores for dementia (n=72) and non-dementia (n=2735). Huppert et al (2005) BMC Geriatrics]

Example: Depression

Disease
Syndrome
Trait

Mitchell, Coyne et al (2008)
[Figure: Scores on the CES-D during Pregnancy, 3 and 12 months Post-partum in 947 Women. Series: Early Pregnancy; 3 months Post-Partum; 12 months Post-Partum. Categories: Healthy; Depressive Symptoms; Mild Depression; Moderate to Severe Depression. y-axis: 0 to 110.]

PHQ9 Linear distribution

[Figure: distribution of PHQ9 scores (x-axis: PHQ9 score, from Zero upwards; y-axis: 0 to 35). Series: PHQ9 (Major Depression); PHQ9 (Minor Depression); PHQ9 (Non-Depressed). Baker-Glen, Mitchell et al (2008)]

[Figure: distribution of scores from Zero to Eighteen (y-axis: 0 to 3000). Thompson et al (2001) n=18,414]

3. Statistics of Diagnostic Tests: 2x2s

Accuracy 2x2 Table

                Reference Standard
Test            Disorder Present    No Disorder
+ve             A                   B                   PPV = A / (A + B)
-ve             C                   D                   NPV = D / (C + D)
                Sn = A / (A + C)    Sp = D / (B + D)

                Depression PRESENT  Depression ABSENT
Test +ve        True +ve            False +ve           PPV
Test -ve        False -ve           True -ve            NPV
                Sensitivity         Specificity         Prevalence

Accuracy 2x2 Table

                Depression PRESENT  Depression ABSENT
Test +ve        TP                  FP                  PPV
Test -ve        FN                  TN                  NPV


Basic Measures of Accuracy

• Sensitivity (Se) = a/(a + c) = TP / (TP + FN)
– A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive

• Specificity (Sp) = d/(b + d) = TN / (TN + FP)
– A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative

• Positive Predictive Value (PPV) = a/(a + b) = TP / (TP + FP)
– A measure of rule-in accuracy defined as the proportion of true positives among those with a positive screening result

• Negative Predictive Value (NPV) = d/(c + d) = TN / (TN + FN)
– A measure of rule-out accuracy defined as the proportion of true negatives among those with a negative screening result
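
The four basic measures can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `TwoByTwo` and its method names are invented here, and the counts are those shown on the "Test vs Major Depression" slide in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 accuracy table (hypothetical helper, not from the slides)."""
    tp: int  # cell a: test +ve, disorder present
    fp: int  # cell b: test +ve, disorder absent
    fn: int  # cell c: test -ve, disorder present
    tn: int  # cell d: test -ve, disorder absent

    def sensitivity(self) -> float:
        # a / (a + c): positives among those with the disorder
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # d / (b + d): negatives among those without the disorder
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # a / (a + b): true positives among all test positives
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # d / (c + d): true negatives among all test negatives
        return self.tn / (self.tn + self.fn)

    def prevalence(self) -> float:
        # (a + c) / total: proportion with the disorder
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


# Counts from the "Test vs Major Depression" slide.
table = TwoByTwo(tp=500, fp=1500, fn=500, tn=4500)
for name in ("sensitivity", "specificity", "ppv", "npv", "prevalence"):
    print(f"{name}: {getattr(table, name)():.2f}")
```

Any cell total of zero would cause a division by zero; a fuller version would need to guard against that.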

Accuracy in words
• Sensitivity
– The chance of testing positive among those with the condition
– The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis
• Specificity
– The chance of testing negative among those without the condition
– The chance of accepting the null hypothesis among those that satisfy the null hypothesis
• Positive Predictive Value
– The chance of having the condition among those that test positive
– The chance of not satisfying the null hypothesis among those that reject the null hypothesis
• Negative Predictive Value
– The chance of not having the condition among those that test negative
– The chance of satisfying the null hypothesis among those that accept the null hypothesis
• Type I Error or α (alpha) or p-Value or false positive rate
– The chance of testing positive among those without the condition
– The chance of rejecting the null hypothesis among those that satisfy the null hypothesis
• Type II Error or β (beta) or false negative rate
– The chance of testing negative among those with the condition
– The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis
• False Discovery Rate or q-Value
– The chance of not having the condition among those that test positive
– The chance of satisfying the null hypothesis among those that reject the null hypothesis
• False Omission Rate
– The chance of having the condition among those that test negative
– The chance of not satisfying the null hypothesis among those that accept the null hypothesis

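Rule-in Accuracy

[Figure: the 2x2 accuracy table (PRESENT / ABSENT by Test +ve / Test -ve) annotated for rule-in accuracy. Surviving labels: (type I error); (discrimination); Test -ve row with False -ve (type II error), True -ve and NPV; (occurrence).]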

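Rule-Out Accuracy

[Figure: the 2x2 accuracy table (PRESENT / ABSENT by Test +ve / Test -ve) annotated for rule-out accuracy. Surviving labels: Test -ve row with False -ve (type II error), True -ve and NPV (discrimination); (occurrence).]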

Likelihood Ratios

• Likelihood Ratio for Positive Tests (LR+)
– The chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition
– Sensitivity / (1 – Specificity)
– [ TP / (TP + FN) ] / [ FP / (FP + TN) ]
– = post-test odds after a positive result, PPV / (1 – PPV), divided by pre-test odds, Prevalence / (1 – Prevalence)

• Likelihood Ratio for Negative Tests (LR–)
– The chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition
– (1 – Sensitivity) / Specificity
– [ FN / (FN + TP) ] / [ TN / (TN + FP) ]
– = post-test odds after a negative result, (1 – NPV) / NPV, divided by pre-test odds, Prevalence / (1 – Prevalence)
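
As a rough illustration of how likelihood ratios are used, the sketch below converts a pre-test probability into a post-test probability by working in odds. The function names are invented for this example; the inputs (sensitivity 50%, specificity 75%, prevalence 1000/7000) come from the "Test vs Major Depression" slide in this deck.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-). Assumes specificity is neither 0 nor 1."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


prevalence = 1000 / 7000
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.50, specificity=0.75)

# Probability of the condition after a positive result (equals the PPV).
after_positive = post_test_probability(prevalence, lr_pos)
# Probability of the condition after a negative result (equals 1 - NPV).
after_negative = post_test_probability(prevalence, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"post-test probability after +ve: {after_positive:.2f}")
print(f"post-test probability after -ve: {after_negative:.2f}")
```

With these inputs the post-test probabilities agree with the values obtained directly from the 2x2 counts (500/2000 after a positive result, 500/5000 after a negative one).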

Summary Measures

• Youden's J
– Sensitivity + Specificity – 1

• Predictive Summary Index (PSI)
– PPV + NPV – 1

• Overall accuracy (fraction correct)
– (TP + TN) / (TP + FP + TN + FN)

Reciprocal Measures

• Number Needed to Diagnose (NND)
– 1 / (Youden's J)

• Number Needed to Predict (NNP)
– 1 / (PSI)

• Number Needed to Screen (NNS)
– 1 / (FC – FiC), where FC is the fraction correct and FiC the fraction incorrect
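
The summary and reciprocal measures can be worked through in code. The sketch below is illustrative only: the function name and dictionary keys are invented, FiC is taken to be 1 – FC, and the counts are derived from the "Persistent low mood" row of the worked-examples table in this deck (45 of 50 depressed with the symptom, 184 of 200 non-depressed without it).

```python
def summary_measures(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Summary and reciprocal measures from 2x2 counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    youden_j = sensitivity + specificity - 1
    psi = ppv + npv - 1
    fraction_correct = (tp + tn) / (tp + fp + fn + tn)
    fraction_incorrect = 1 - fraction_correct  # assumption: FiC = 1 - FC
    identification_index = fraction_correct - fraction_incorrect

    # The reciprocals are undefined when the underlying index is zero.
    return {
        "youden_j": youden_j,
        "psi": psi,
        "fraction_correct": fraction_correct,
        "nnd": 1 / youden_j,
        "nnp": 1 / psi,
        "nns": 1 / identification_index,
    }


# Persistent low mood: 45 TP, 5 FN, 184 TN, 16 FP.
low_mood = summary_measures(tp=45, fp=16, fn=5, tn=184)
for name, value in low_mood.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimal places, the NND, NNP and NNS agree with the values listed for that row (1.22, 1.41 and 1.20).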

Receiver Operating Characteristic

Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL 1987 : Performance of screening and diagnostic tests:
Application of Receiver Operating Characteristic ROC analysis. Arch Gen Psychiatry 44:550-555
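
A minimal sketch of how an ROC curve is built: each possible cut-off is applied in turn, giving one (1 – specificity, sensitivity) point per cut-off, and the area under the curve is estimated with the trapezium rule. The scores below are invented purely for illustration and do not come from any study cited in this deck; a score at or above the cut-off is assumed to count as a positive test.

```python
def roc_points(scores_with: list[float], scores_without: list[float]) -> list[tuple[float, float]]:
    """Return (1 - specificity, sensitivity) for every cut-off, sorted for plotting."""
    cutoffs = sorted(set(scores_with) | set(scores_without))
    cutoffs.append(cutoffs[-1] + 1)  # a cut-off above every score: nobody tests positive
    points = []
    for cutoff in cutoffs:
        sensitivity = sum(s >= cutoff for s in scores_with) / len(scores_with)
        false_positive_rate = sum(s >= cutoff for s in scores_without) / len(scores_without)
        points.append((false_positive_rate, sensitivity))
    return sorted(points)


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezium-rule area under consecutive ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical questionnaire scores (not real data).
scores_with_disorder = [12, 15, 9, 18, 11]
scores_without_disorder = [3, 7, 10, 5, 8, 2]

points = roc_points(scores_with_disorder, scores_without_disorder)
auc = area_under_curve(points)
print(f"AUC = {auc:.3f}")
```

Raising the cut-off moves along the curve towards the origin (fewer false positives, lower sensitivity); lowering it moves towards the top-right corner.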


Test vs Major Depression

                PRESENT     ABSENT      Total
Test +ve        500         1500        2000        PPV 25%
Test -ve        500         4500        5000        NPV 90%
Total           1000        6000        7000

Sensitivity 50%   Specificity 75%   Prevalence 14%

Test vs Major + Min Depression

                PRESENT     ABSENT      Total
Test +ve        500         1500        2000        PPV 25%
Test -ve        500         500         1000        NPV 50%
Total           1000        2000        3000

Sensitivity 50%   Specificity 25%   Prevalence 33%

4. Clinical Value of Diagnostic Tests

Added Value

• Definition 1:
– The additional ability of a test to rule-in or rule-out compared with the baseline rate
– PPV minus prevalence
– NPV minus prevalence

• Definition 2:
– The additional ability of a test to rule-in or rule-out compared with the unassisted rate
– PPV test minus PPV no test (assuming equal prevalence)
– LR+ test minus LR+ no test
– AUC test minus AUC no test

[Figure: proportion of cases reporting each symptom (scale 0.00 to 1.00). Series: All Case Proportion; Depressed Proportion; Non-Depressed Proportion. Symptoms shown: Loss of energy; Diminished drive; Sleep disturbance; Concentration/indecision; Depressed mood; Anxiety; Diminished concentration; Insomnia; Diminished interest/pleasure; Psychic anxiety; Helplessness; Worthlessness; Hopelessness; Somatic anxiety; Thoughts of death; Anger; Excessive guilt; Psychomotor change; Indecisiveness; Decreased appetite; Psychomotor agitation; Psychomotor retardation; Decreased weight; Lack of reactive mood; Increased appetite; Hypersomnia; Increased weight. Mitchell, Zimmerman et al, MIDAS Database. Psychol Med 2007 Submitted]

[Figure: added value of each depressive symptom, listed alphabetically from Anger to Worthlessness (scale -0.10 to 0.50). Series: Rule-In Added Value (PPV-Prev); Rule-Out Added Value (NPV-Prev).]

Accuracy of Tests: Visual

[Figure: probability scale marked 0%, 25%, 75% and 100%, labelled Very unlikely, unlikely, likely, Very likely. Labels and values recovered from the figure follow in their original order.]

Overall
10% - (22) - 50% = 54%
CIDI (computer) Any Depression
PHQ-2
3% - (16) - 32% = 29% Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
CIDI (computer) Any Depression
WHO5 (1+3)
3% - (16) - 32% = 29% Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
CIDI (computer) Mj Depression
1 Question
3% - (37) - 63% = 60% Arroll B et al (2003) BMJ
CIDI (computer) Mj Depression
2 Questions
32% - (37) - 96% = 64%

[Figure: Post-test Probability (y-axis, 0.00 to 1.00) against Pre-test Probability (x-axis, 0 to 1). Curves: Clinician Positive and Clinician Negative (Fallowfield et al, 2001); HADS-D Positive and HADS-D Negative (Meta-analysis); Baseline Probability.]

[Figure: Post-test Probability (y-axis, 0.00 to 1.00) against Pre-test Probability (x-axis, 0 to 1). Curves: Depression Present (Routine); Depression Absent (Routine); Depression Scales +ve (Median); Depression Scales -ve (Median); Prior Probability. Marked values at a prevalence of 0.15: PPV=0.41, NPV=0.97.]
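
Curves of this kind can be generated from a test's sensitivity and specificity using Bayes' theorem. The sketch below is an illustration rather than the method used to draw the original plots: the function name is invented, and the sensitivity (0.84) and specificity (0.95) are borrowed from the DSMIV algorithm row of the worked-examples table in this deck.

```python
def post_test_probabilities(sensitivity: float, specificity: float,
                            pre_test_probability: float) -> tuple[float, float]:
    """Probability of the condition after a positive and after a negative test."""
    p = pre_test_probability
    true_pos = sensitivity * p
    false_pos = (1 - specificity) * (1 - p)
    false_neg = (1 - sensitivity) * p
    true_neg = specificity * (1 - p)
    after_positive = true_pos / (true_pos + false_pos)   # this is the PPV
    after_negative = false_neg / (false_neg + true_neg)  # this is 1 - NPV
    return after_positive, after_negative


# One point on each curve per pre-test probability from 0.1 to 0.9.
for tenth in range(1, 10):
    pre = tenth / 10
    pos, neg = post_test_probabilities(0.84, 0.95, pre)
    print(f"pre-test {pre:.1f}: after +ve {pos:.2f}, after -ve {neg:.2f}")

# At the worked-example prevalence of 0.20 (50 of 250 patients):
ppv_at_20, one_minus_npv_at_20 = post_test_probabilities(0.84, 0.95, 0.20)
```

The vertical distance between each curve and the diagonal (where post-test equals pre-test probability) shows how much a positive or negative result shifts the probability at that prevalence.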

5. Worked Examples of diagnostic tests

Post-Stroke Mj Depression vs Non-Mj

• Clinicians' diagnosis using DSMIV vs SCAN/PSE

• Using the SCAN:
– 50 people with major depression
– 150 healthy people
– 50 with minor depression

Clinicians using DSMIV

• Clinicians diagnosed 52 cases with Mj depression
• The specificity of DSMIV was 95%

• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. What percentage were correctly identified per 100 screened?

Test vs Major Depression

                        PRESENT on SCAN   ABSENT on SCAN   Total
Test +ve (Clinician)    ??                ??               52       PPV ??%
Test -ve                ??                ??               ??       NPV ??%
Total                   50                200

Sensitivity ??%   Specificity 95%   Prevalence ??%
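
One way to work through the questions is to fill in the 2x2 table step by step. The sketch below assumes that the 52 clinician diagnoses are the total number of test positives and that the "ABSENT" column combines the 150 healthy people with the 50 with minor depression, which matches the column totals of 50 and 200 above; the variable names are invented for this example.

```python
# Reference standard (SCAN) totals.
n_major = 50
n_non_major = 150 + 50          # healthy + minor depression = 200
total = n_major + n_non_major   # 250

# Information given about the clinicians using DSMIV.
clinician_positives = 52        # assumed to be all test positives
specificity = 0.95

# Fill in the four cells.
true_negatives = round(specificity * n_non_major)         # 190
false_positives = n_non_major - true_negatives            # 10
true_positives = clinician_positives - false_positives    # 42
false_negatives = n_major - true_positives                # 8

# Answers to the questions.
sensitivity = true_positives / n_major
prevalence = n_major / total
ppv = true_positives / clinician_positives
npv = true_negatives / (true_negatives + false_negatives)
correct_per_100 = 100 * (true_positives + true_negatives) / total

print(f"sensitivity = {sensitivity:.0%}")
print(f"prevalence = {prevalence:.0%}")
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
print(f"correctly identified per 100 screened = {correct_per_100:.1f}")
```

These figures agree with the DSMIV algorithm row of the symptom table in this section (42 of 50 detected, 190 of 200 correctly excluded).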

Column key: Dep n = post-stroke depression by reference standard; With Sx = post-stroke depression with symptom; Se = sensitivity; Non-dep n = no post-stroke depression by reference standard; Without Sx = non-depressed stroke patients without symptom; Sp = specificity; UI+ = Positive Utility Index; UI- = Negative Utility Index; II = Identification Index.

Symptom                  Dep n  With Sx  Se    Non-dep n  Without Sx  Sp    PPV   NPV   UI+   UI-   II      NNS    NND    NNP
Persistent low mood      50     45       0.90  200        184         0.92  0.74  0.97  0.66  0.90  83.20   1.20   1.22   1.41
Loss of interest         50     48       0.96  200        156         0.78  0.52  0.99  0.50  0.77  63.20   1.58   1.35   1.96
Loss of drive            50     40       0.80  200        120         0.60  0.33  0.92  0.27  0.55  28      3.57   2.50   3.90
Low energy               50     49       0.98  200        20          0.10  0.21  0.95  0.21  0.10  -44.80  -2.23  12.50  6.01
Insomnia                 50     35       0.70  200        136         0.68  0.35  0.90  0.25  0.61  36.80   2.72   2.63   3.93
Poor appetite            50     25       0.50  200        178         0.89  0.53  0.88  0.27  0.78  62.40   1.60   2.56   2.45
Suicidal thoughts        50     2        0.04  200        196         0.98  0.33  0.80  0.01  0.79  58.40   1.71   50     7.32
Poor concentration       50     28       0.56  200        114         0.57  0.25  0.84  0.14  0.48  13.60   7.35   7.69   11.93
Poor orientation         50     10       0.20  200        164         0.82  0.22  0.80  0.04  0.66  39.20   2.55   50     46.92
Anger                    50     17       0.34  200        172         0.86  0.38  0.84  0.13  0.72  51.20   1.95   5      4.61
DSMIV algorithm          50     42       0.84  200        190         0.95  0.81  0.96  0.68  0.91  85.60   1.17   1.27   1.30

6. Advanced Techniques

sROC
Real World Numbers
NND; NNS
Bivariate meta-analysis
Economics


PPV DT Distress = 55%; PPV Other Methods 65%

[Figure: ROC Plot of Sensitivity (y-axis) against 1 - Specificity (x-axis), both 0.00 to 1.00, with points for Low Mood, DSMIV, and Low mood & loss of interest.]

Bivariate Diagnostic meta-analysis

Each measure is listed with its Basic Formula, Strength, Weakness, Reciprocal Absolute Benefit and Reciprocal Absolute Benefit Formula.

• Youden Index
– Basic formula: Sensitivity + Specificity – 1
– Strength: Relatively independent of prevalence
– Weakness: Requires application of criterion (gold) standard; not clinically interpretable; does not assess ratio of false positives to negatives
– Reciprocal absolute benefit: Number Needed to Diagnose
– Formula: NND = 1 / Youden

• Predictive Summary Index
– Basic formula: PPV + NPV – 1
– Strength: Measures gain; clinically applicable
– Weakness: Dependent on prevalence; places equal weight on rule-in and rule-out accuracy
– Reciprocal absolute benefit: Number Needed to Predict
– Formula: NNP = 1 / PSI

• Overall Accuracy (Fraction Correct)
– Basic formula: (TP + TN) / (TP + FP + TN + FN)
– Strength: Measures real number of correct identifications vs misidentifications; can be easily converted into a percentage
– Weakness: Requires application of criterion (gold) standard
– Reciprocal absolute benefit: Number Needed to Screen
– Formula: NNS = 1 / Identification Index

Further Reading

• David A Grimes, Kenneth F Schulz. Uses and abuses of screening tests. Lancet 2002; 359: 881–84

• Jonathan J Deeks, Douglas G Altman. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329 (17 July 2004)

• Patrick M Bossuyt, Les Irwig, Jonathan Craig and Paul Glasziou. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092

• Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990
