MRCPsych - How To Analyse Diagnostic Test Studies (May09)
1. MRCPsych Teaching 2009
MRCPsych 2009
Critical Appraisal of Diagnostic Tests
Studies of Accuracy, Validity, Screening & Case finding
Alex J Mitchell
Consultant in Liaison Psychiatry
University of Leicester
2. Contents
1. Importance of understanding diagnostic tests
2. Concept of diagnostic tests: traits to diseases
3. Statistics of diagnostic tests
4. Clinical Value of diagnostic tests
5. Worked examples
6. Advanced techniques
4. What Is a Diagnostic Test in Psychiatry?
• CT/MRI
• CSF
• Blood tests, e.g. TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report
5. Why Is a HADS score not a diagnosis?
6. Why Is a HADS score not a diagnosis?
1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if items are missing?
6. Imprecise
7. Defining Diagnostic Testing
• INTENTION
• Screening
– The systematic application of a test or inquiry, to identify individuals at sufficient risk of a
specific disorder to warrant further actions among those who have not sought medical help for
that disorder
• Case-Finding
– The selected application of a test or inquiry, to identify individuals with a suspected disorder
and exclude those without a disorder, usually in those who have sought medical help for that
disorder
• APPLICATION
• Targeted (High Risk)
– The highly selected application of a test or inquiry, to identify individuals at high risk of a
specific disorder by virtue of known risk factors
• Routine Screening
– The systematic application of a test or inquiry, to individuals without a known disorder (or who
have not sought medical help for that disorder)
Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.
8. Defining Diagnostic Testing
• COMPARATOR
• Accuracy
– The degree of approximation (veracity) to a robust comparator
• Validity
– The degree of approximation (veracity) to a criterion reference
• Precision
– The degree of predictability (low SD) in the measure
9. Aims of Detection
• Screening:
– Short; easy; some false +ve tolerated (lower Sp, PPV), few false –ve (high Se, NPV)
• Diagnosis (case-finding)
– Accurate, Few false +ve or –ve
• Rating
– Simple, patient rated, correlates with QoL and other outcomes
10. UK National Screening Committee Guidelines
• The condition should:
– Be an important health issue
– Have a well-understood history, with a detectable risk factor or disease marker
– Have cost-effective primary preventions implemented.
• The screening tool should:
– Be a valid tool with known cut-off
– Be acceptable to the public
– Have agreed diagnostic procedures.
• The treatment should:
– Be effective, with evidence of benefits of early intervention
– Have adequate resources
– Have appropriate policies as to who should be treated.
• The screening program should:
– Show evidence that the benefits of screening outweigh the risks
– Be acceptable to public and professionals
– Be cost effective (and have ongoing evaluation)
– Have quality-assurance strategies in place.
Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf
11. Development of Diagnostic Tests
Columns: Stage; Type; Purpose; Description
• Pre-clinical (Type: Development)
– Purpose: Development of the proposed tool or test
– Description: Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.
• Phase I_screen (Type: Diagnostic validity)
– Purpose: Early diagnostic validity testing in a selected sample and refinement of tool
– Description: The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.
• Phase II_screen (Type: Diagnostic validity)
– Purpose: Diagnostic validity in a representative sample
– Description: The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.
• Phase III_screen (Type: Implementation)
– Purpose: Screening RCT; clinicians using vs not using a screening tool
– Description: This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.
• Phase IV_screen (Type: Implementation)
– Purpose: Screening implementation studies using real-world outcomes
– Description: In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.
12. Theory of Diagnostic Tests
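[Figure: two overlapping distributions of test results, Non-Depressed and Depressed (x-axis: Test Result; y-axis: # of Individuals). A cut-off value splits them into True -ve, False -ve, False +ve and True +ve regions.]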
13. Low Prevalence (Se Sp = same)
[Figure: distributions of test results for Non-Depressed vs Mj Depression with a cut-off value (x-axis: Test Result; y-axis: # of Individuals). False –ve region: SMALL. False +ve region: LARGE.]
14. High Prevalence (Se Sp = same)
[Figure: distributions of test results for Non-Depressed vs Mj+Mn Depression with a cut-off value (x-axis: Test Result; y-axis: # of Individuals). False –ve region: LARGE. False +ve region: SMALL.]
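The prevalence effect described in slides 13 and 14 can be checked numerically. The following is a minimal sketch, not taken from the slides: the sensitivity, specificity and prevalence values are illustrative assumptions, and `population_fractions` is a hypothetical helper name.

```python
def population_fractions(sensitivity, specificity, prevalence):
    """Split a population into TP, FN, TN, FP fractions for a fixed test."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Illustrative values only (assumed, not from the slides):
# the same test (Se = Sp = 0.80) applied at 5% and at 60% prevalence.
low = population_fractions(0.80, 0.80, prevalence=0.05)
high = population_fractions(0.80, 0.80, prevalence=0.60)

for label, r in (("low prevalence", low), ("high prevalence", high)):
    print(f"{label}: FN={r['fn']:.2f} FP={r['fp']:.2f} "
          f"PPV={r['ppv']:.2f} NPV={r['npv']:.2f}")
```

With sensitivity and specificity held constant, false positives outnumber false negatives at low prevalence and the reverse holds at high prevalence, so PPV rises and NPV falls as prevalence increases.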
17. Example: A Clear Disease [#1] Point of Partial Rarity
[Figure: distributions for No Disorder and Disorder (x-axis: Test Result; y-axis: Number of Individuals), with True -ve, True +ve, False +ve and False -ve regions marked.]
18. Example: A Probable Syndrome [#2]
[Figure: distributions for No Disorder and Disorder (x-axis: MMSE Cognitive Score; y-axis: Number of Individuals), with True -ve, True +ve, False +ve and False -ve regions marked.]
19. Example: A Normally Distributed Trait [#3]
[Figure: distributions for No Disorder and Disorder (x-axis: MMSE Cognitive Score; y-axis: Number of Individuals), with True -ve, True +ve, False +ve and False -ve regions marked.]
23. Mitchell, Coyne et al (2008)
[Figure: Scores on the CES-D during Pregnancy, 3 and 12 months Post-partum in 947 Women. Series: Early Pregnancy, 3 months Post-Partum, 12 months Post-Partum. Categories: Healthy, Depressive Symptoms, Mild Depression, Moderate to Severe Depression. Vertical axis: 0 to 110.]
24. PHQ9 Linear distribution
[Figure: distribution of PHQ9 scores, with the score axis labelled in words from Zero upward and a vertical axis from 0 to 35. Series: PHQ9 (Major Depression), PHQ9 (Minor Depression), PHQ9 (Non-Depressed).]
Baker-Glen, Mitchell et al (2008)
25. [Figure: distribution of scores, with the score axis labelled in words from Zero to Eighteen and a vertical axis from 0 to 3000.]
Thompson et al (2001) n=18,414
27. Accuracy 2x2 Table

              Reference Standard    Reference Standard
              Disorder Present      No Disorder
Test +ve      A                     B                     PPV = A/(A + B)
Test -ve      C                     D                     NPV = D/(C + D)
              Sn = A/(A + C)        Sp = D/(B + D)

              Depression PRESENT    Depression ABSENT     Total
Test +ve      True +ve              False +ve             PPV
Test -ve      False -ve             True -ve              NPV
Total         Sensitivity           Specificity           Prevalence
29. Basic Measures of Accuracy
• Sensitivity (Se) = a/(a + c) = TP / (TP + FN)
• A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) = d/(b + d) = TN / (TN + FP)
• A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value (PPV) = a/(a + b) = TP / (TP + FP)
• A measure of rule-in accuracy defined as the proportion of true positives among those who screen positive
• Negative Predictive Value (NPV) = d/(c + d) = TN / (TN + FN)
• A measure of rule-out accuracy defined as the proportion of true negatives among those who screen negative
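The four basic measures can be computed directly from the 2x2 cell counts. The sketch below is illustrative rather than part of the original slides; `accuracy_measures` is a hypothetical helper name, and the counts are those implied by the "Persistent low mood" row of the slide 51 table (45 of 50 depressed patients with the symptom, 184 of 200 non-depressed patients without it).

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # a / (a + c)
        "specificity": tn / (tn + fp),  # d / (b + d)
        "ppv": tp / (tp + fp),          # a / (a + b)
        "npv": tn / (tn + fn),          # d / (c + d)
    }

# Counts derived from the "Persistent low mood" row of the slide 51 table:
# TP = 45, FN = 50 - 45 = 5, TN = 184, FP = 200 - 184 = 16.
measures = accuracy_measures(tp=45, fp=16, fn=5, tn=184)

for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimal places, these agree with the values reported for that row (0.90, 0.92, 0.74, 0.97).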
30. Accuracy in words
• Sensitivity
– The chance of testing positive among those with the condition
– The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis
• Specificity
– The chance of testing negative among those without the condition
– The chance of accepting the null hypothesis among those that satisfy the null hypothesis
• Positive Predictive Value
– The chance of having the condition among those that test positive
– The chance of not satisfying the null hypothesis among those that reject the null hypothesis
• Negative Predictive Value
– The chance of not having the condition among those that test negative
– The chance of satisfying the null hypothesis among those that accept the null hypothesis
• Type I Error rate or α (alpha) or false positive rate
– The chance of testing positive among those without the condition
– The chance of rejecting the null hypothesis among those that satisfy the null hypothesis
• Type II Error or β (beta) or false negative rate
– The chance of testing negative among those with the condition
– The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis
• False Discovery Rate or q-Value
– The chance of not having the condition among those that test positive
– The chance of satisfying the null hypothesis among those that reject the null hypothesis
• False Omission Rate
– The chance of having the condition among those that test negative
– The chance of not satisfying the null hypothesis among those that accept the null hypothesis
31. Rule-in Accuracy
              Depression PRESENT           Depression ABSENT           
Test +ve      True +ve                     False +ve (type I error)    PPV (discrimination)
Test -ve      False -ve (type II error)    True -ve                    NPV
              Sensitivity                  Specificity                 Prevalence (occurrence)
33. Likelihood Ratios
• Likelihood Ratio for Positive Tests (LR+)
• The chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition
• Sensitivity / (1 – Specificity)
• [ TP / (TP + FN) ] / [ FP / (FP + TN) ]
• = post-test odds after a positive test / pre-test odds = [ PPV / (1 – PPV) ] / [ Prevalence / (1 – Prevalence) ]
• Likelihood Ratio for Negative Tests (LR–)
• The chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition
• (1 – Sensitivity) / Specificity
• [ FN / (FN + TP) ] / [ TN / (TN + FP) ]
• = post-test odds after a negative test / pre-test odds = [ (1 – NPV) / NPV ] / [ Prevalence / (1 – Prevalence) ]
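Likelihood ratios link pre-test and post-test probability through odds. The sketch below uses the standard definitions, LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp, and is offered as an illustration rather than the author's own calculation. The function names are hypothetical, and the inputs (Se 0.84, Sp 0.95, prevalence 50/250) are those of the DSMIV algorithm row in the slide 51 table.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) using the standard definitions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Inputs taken from the DSMIV algorithm row of the slide 51 table.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.84, specificity=0.95)
prevalence = 50 / 250

post_pos = post_test_probability(prevalence, lr_pos)  # equals PPV
post_neg = post_test_probability(prevalence, lr_neg)  # equals 1 - NPV

print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after +ve: {post_pos:.2f}")
print(f"Post-test probability after -ve: {post_neg:.2f}")
```

When the pre-test probability is the sample prevalence, the post-test probability after a positive result equals the PPV, and after a negative result it equals 1 - NPV.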
41. Added Value
• Definition 1:
– The additional ability of a test to rule-in or rule-out
compared with the baseline rate
– PPV minus Prevalence
– NPV minus prevalence
• Definition 2:
– The additional ability of a test to rule-in or rule-out compared
with the unassisted rate
– PPV test minus PPV no test (assuming equal prevalence)
– LR+ test minus LR+ no test
– AUC test minus AUC no test
42. [Figure: symptom proportions on an axis from 0.00 to 1.00. Series: All Case Proportion, Depressed Proportion, Non-Depressed Proportion. Symptoms plotted: Loss of energy, Diminished drive, Sleep disturbance, Concentration/indecision, Depressed mood, Anxiety, Diminished concentration, Insomnia, Diminished interest/pleasure, Psychic anxiety, Helplessness, Worthlessness, Hopelessness, Somatic anxiety, Thoughts of death, Anger, Excessive guilt, Psychomotor change, Indecisiveness, Decreased appetite, Psychomotor agitation, Psychomotor retardation, Decreased weight, Lack of reactive mood, Increased appetite, Hypersomnia, Increased weight.]
Mitchell, Zimmerman et al MIDAS Database. Psychol Med 2007 Submitted
43. [Figure: Rule-In Added Value (PPV-Prev) and Rule-Out Added Value (NPV-Prev) for each symptom, on an axis from -0.10 to 0.50. Symptoms plotted: Anger, Anxiety, Decreased appetite, Decreased weight, Depressed mood, Diminished concentration, Diminished drive, Diminished interest/pleasure, Excessive guilt, Helplessness, Hopelessness, Hypersomnia, Increased appetite, Increased weight, Indecisiveness, Insomnia, Lack of reactive mood, Loss of energy, Psychic anxiety, Psychomotor agitation, Psychomotor change, Psychomotor retardation, Sleep disturbance, Somatic anxiety, Thoughts of death, Worthlessness.]
44. Accuracy of Tests: Visual
Very unlikely unlikely likely Very likely
Overall
10% - (22) -50% = 54%
CIDI (computer) Any Depression
PHQ-2
3% - (16) - 32% = 29% Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
CIDI (computer) Any Depression
WHO5 (1+3)
3% - (16) - 32% = 29% Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci
CIDI (computer) Mj Depression
1 Question
3% - (37) - 63% = 60% Arroll B et al (2003) BMJ
CIDI (computer) Mj Depression
2 Questions
32% - (37) - 96% = 64%
[Scale marks: 0%, 25%, 75%, 100%]
45. [Figure: conditional probability plot. x-axis: Pre-test Probability (0 to 1); y-axis: Post-test Probability (0.00 to 1.00). Curves: Clinician Positive (Fallowfield et al, 2001), Clinician Negative (Fallowfield et al, 2001), Baseline Probability, HADS-D Positive (Meta-analysis), HADS-D Negative (Meta-analysis).]
48. PostStroke Mj Depression vs NonMj
• Clinicians' diagnosis using DSMIV vs SCAN/PSE
• Using the SCAN:
• 50 people with major depression
• 150 healthy people
• 50 with minor depression
49. Clinicians using DSMIV
• Clinicians diagnosed 52 cases with Mj depression
• The specificity of DSMIV was 95%
• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. How many were correctly identified per 100 screened?
50. Test vs Major Depression
                        Depression on SCAN    Depression ABSENT    Total
Test +ve (Clinician)    ??                    ??                   52       PPV ??%
Test -ve                ??                    ??                   ??       NPV ??%
Total                   50                    200                           Prevalence ??%
                        Sensitivity ??%       Specificity 95%
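One way to work through the questions on slide 49 is to fill the 2x2 table from the given totals. The sketch below is a suggested working, not the presenter's own. It assumes the comparison group is everyone without major depression on the SCAN (150 healthy plus 50 with minor depression, giving 200), which matches the column total on slide 50. Variable names are illustrative.

```python
# Given on slides 48 and 49.
major, minor, healthy = 50, 50, 150
clinician_positive = 52
specificity = 0.95

# Assumption: "NonMj" means minor depression plus healthy (200 people).
non_major = minor + healthy
total = major + non_major

# Fill the 2x2 table from the margins.
tn = round(specificity * non_major)  # 190 correctly called non-depressed
fp = non_major - tn                  # 10 false positives
tp = clinician_positive - fp         # 42 true positives
fn = major - tp                      # 8 false negatives

sensitivity = tp / major
prevalence = major / total
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
correct_per_100 = 100 * (tp + tn) / total

print(f"Sensitivity: {sensitivity:.0%}")
print(f"Prevalence:  {prevalence:.0%}")
print(f"PPV:         {ppv:.1%}")
print(f"NPV:         {npv:.1%}")
print(f"Correctly identified per 100 screened: {correct_per_100:.1f}")
```

Under that assumption the cell counts are TP 42, FP 10, FN 8 and TN 190, which agree with the DSMIV algorithm row of the slide 51 table.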
51. Column key: Dep = Post-Stroke Depression by reference standard; Dep+ = Post-Stroke Depression with symptom; Se = Sensitivity; NonDep = No Post-Stroke Depression by reference standard; NonDep- = Non Depressed Stroke Patient without symptom; Sp = Specificity; UI+ = Positive Utility Index; UI- = Negative Utility Index; II = Identification Index

Symptoms             Dep  Dep+  Se    NonDep  NonDep-  Sp    PPV   NPV   UI+   UI-   II      NNS    NND    NNP
Persistent low mood  50   45    0.90  200     184      0.92  0.74  0.97  0.66  0.90  83.20   1.20   1.22   1.41
Loss of interest     50   48    0.96  200     156      0.78  0.52  0.99  0.50  0.77  63.20   1.58   1.35   1.96
Loss of drive        50   40    0.80  200     120      0.60  0.33  0.92  0.27  0.55  28      3.57   2.50   3.90
Low energy           50   49    0.98  200     20       0.10  0.21  0.95  0.21  0.10  -44.80  -2.23  12.50  6.01
Insomnia             50   35    0.70  200     136      0.68  0.35  0.90  0.25  0.61  36.80   2.72   2.63   3.93
Poor appetite        50   25    0.50  200     178      0.89  0.53  0.88  0.27  0.78  62.40   1.60   2.56   2.45
Suicidal thoughts    50   2     0.04  200     196      0.98  0.33  0.80  0.01  0.79  58.40   1.71   50     7.32
Poor concentration   50   28    0.56  200     114      0.57  0.25  0.84  0.14  0.48  13.60   7.35   7.69   11.93
Poor orientation     50   10    0.20  200     164      0.82  0.22  0.80  0.04  0.66  39.20   2.55   50     46.92
Anger                50   17    0.34  200     172      0.86  0.38  0.84  0.13  0.72  51.20   1.95   5      4.61
DSMIV algorithm      50   42    0.84  200     190      0.95  0.81  0.96  0.68  0.91  85.60   1.17   1.27   1.30
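The utility and identification indices in this table are not defined on the slide. The sketch below assumes UI+ = Se x PPV, UI- = Sp x NPV, and Identification Index = correct minus incorrect identifications per 100 screened; these assumed formulas reproduce the tabulated values for the "Persistent low mood" row. The function name `utility_indices` is hypothetical.

```python
def utility_indices(n_dep, dep_with_symptom, n_nondep, nondep_without_symptom):
    """Derive UI+, UI- and the identification index from the table's count columns.

    Assumed formulas (inferred from the tabulated values, not stated on the slide):
      UI+ = sensitivity * PPV
      UI- = specificity * NPV
      identification index = (correct - incorrect) per 100 screened
    """
    tp = dep_with_symptom
    fn = n_dep - tp
    tn = nondep_without_symptom
    fp = n_nondep - tn

    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    total = n_dep + n_nondep
    identification_index = 100 * ((tp + tn) - (fp + fn)) / total
    return se * ppv, sp * npv, identification_index

# "Persistent low mood" row: 50, 45, 200, 184.
ui_pos, ui_neg, ident = utility_indices(50, 45, 200, 184)
print(f"UI+ = {ui_pos:.2f}, UI- = {ui_neg:.2f}, Identification Index = {ident:.2f}")
```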
52. 6. Advanced Techniques
sROC
Real World Numbers
NND; NNS
Bivariate meta-analysis
Economics
57. Columns: Measure; Basic Formula; Strength; Weakness; Reciprocal Absolute Benefit; Reciprocal Absolute Benefit Formula
• Youden Index
– Basic formula: sensitivity + specificity – 1
– Strength: Relatively independent of prevalence
– Weakness: Requires application of criterion (gold) standard; not clinically interpretable; does not assess ratio of false positives to negatives
– Reciprocal absolute benefit: Number Needed to Diagnose, NND = 1/Youden
• Predictive Summary Index (PSI)
– Basic formula: PPV + NPV – 1
– Strength: Measures gain; clinically applicable
– Weakness: Dependent on prevalence; places equal weight on rule-in and rule-out accuracy
– Reciprocal absolute benefit: Number Needed to Predict, NNP = 1/PSI
• Overall Accuracy (Fraction Correct)
– Basic formula: (TP + TN) / (TP + FP + TN + FN)
– Strength: Measures real number of correct identifications vs misidentifications; can be easily converted into a percentage
– Weakness: Requires application of criterion (gold) standard
– Reciprocal absolute benefit: Number Needed to Screen, NNS = 1/Identification Index
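The three reciprocal measures can be sketched as follows. The Youden and PSI formulas are as given on the slide; the identification index is not defined there, so it is assumed here to be the fraction correct minus the fraction incorrect, which reproduces the NNS values in the slide 51 table. The function name is hypothetical, and the counts are those of the DSMIV algorithm row (TP 42, FN 8, TN 190, FP 10).

```python
def reciprocal_measures(tp, fp, fn, tn):
    """Return (NND, NNP, NNS) from 2x2 cell counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    total = tp + fp + fn + tn

    youden = se + sp - 1
    psi = ppv + npv - 1
    # Assumption: identification index = fraction correct - fraction incorrect.
    identification_index = ((tp + tn) - (fp + fn)) / total

    return 1 / youden, 1 / psi, 1 / identification_index

# DSMIV algorithm row of the slide 51 table.
nnd, nnp, nns = reciprocal_measures(tp=42, fp=10, fn=8, tn=190)
print(f"NND = {nnd:.2f}, NNP = {nnp:.2f}, NNS = {nns:.2f}")
```

All three become negative or undefined when the underlying index is zero or below, as with the negative NNS in the "Low energy" row, so a guard would be needed before using this on arbitrary counts.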
58. Further Reading
• Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet 2002; 359: 881–84
• Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329 (17 July 2004)
• Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092
• Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990