OHE presents a series of lunchtime seminars throughout the year. The most recent seminar, held in late April, considered the influence of cost-effectiveness and other factors on NICE decisions.
1. The Influence of Cost-Effectiveness and
Other Factors on NICE Decisions
OHE Lunchtime Seminar
London • 23 April 2013
2. The influence of cost-effectiveness
and other factors on NICE decisions
Helen Dakin (1) and Nancy Devlin (2)
in collaboration with
Yan Feng (2), Nigel Rice (3), Phill O’Neill (2) and David Parkin (2)
(1) University of Oxford, (2) OHE, (3) University of York
4. NICE
Established 1999
Issues guidance to 'ensure quality and value for money'
For technology appraisals 'NHS is required to provide
funding and resources for medicines and treatments
recommended by NICE'
Important implications for patients, the NHS, industry,
decision making in other countries
What factors affect NICE decisions? How important is
cost effectiveness compared to ‘other factors’?
5. NICE’s stated threshold
Pre-2004: various statements suggest a threshold of around
£30,000
2004 methods guide & 2005 social value judgements
• '[NICE] should, generally, accept as cost effective those interventions
with an incremental cost-effectiveness ratio of less than £20,000 per
QALY and that there should be increasingly strong reasons for
accepting as cost effective interventions with an incremental
cost-effectiveness ratio of over £30,000 per QALY'
2008 social value judgements
• NICE should explain its reasons when it decides that an intervention
with an ICER below £20,000 per QALY gained is not cost effective;
and when an intervention with an ICER of more than £20,000 to
£30,000 per QALY gained is cost effective
6. Previous studies on
determinants of NICE decisions
Study | No. appraisals | Model
(findings listed beneath each study)
Devlin & Parkin (2004) | 39 to May 2002 | Logistic: yes/no
• Threshold ≈£40,000/QALY
• Uncertainty & prevalence matter
Dakin et al (2006) | 73 to Dec 2003 | mlogit: yes / no / yes, but
• ICER, number trials or SRs, date, patient group submissions and technology type matter
• ‘Yes, but’ and ‘no’ driven by different factors
Jena (2009) | 86 to 2005 | Linear probability model: yes/no
• £1000 increase in ICER decreases probability of ‘yes’ by 0.009
• Infectious disease and mental health decisions more likely to be ‘no’
Mason & Drummond (2009) | 38 cancer appraisals, Oct 2008 | Tabulation: yes / no / yes, but
• ‘No’ more likely after 2006: partly due to STA process?
• Restrictions attributed to ICER, insufficient evidence, uncertainty or methodology
7. Aims
Estimate NICE’s cost-effectiveness threshold
Identify the factors that affect or explain NICE’s decisions
Evaluate whether NICE’s threshold or decision-making has changed
over time [in progress]
9. The basic model
Incremental cost-effectiveness ratio (ICER)
• Hypothesised to be main driver of decisions
• ICER in £000s
Clinical evidence
• ‘NICE should not recommend an intervention […] if there is […] not
enough evidence’ (SVJ 2008)
• Hypothesis: NICE will reject if insufficient evidence
• Total number of patients in randomised trials = number RCTs x mean
patients/trial
Insights provided by stakeholders (Rawlins 2010)
• E.g. on whether QoL assessment adequately captures benefits (SVJ 2008)
• Hypothesis: increases odds of ‘yes’
• Patient group submission =1 if patient group submitted evidence or
opinion (proxy for stakeholder involvement/persuasion)
10. The basic model (cont.)
Only treatment
• Hypothesis: NICE is more likely to recommend if no alternatives
• =1 if there were no alternative treatments available for this patient group
Children
• Give ‘the benefit of the doubt’ given methodological challenges (Rawlins 2010)
• Hypothesis: paediatric treatments more likely to be recommended
• =1 if the decision specifically concerns children or adolescents
Publication date
• Evaluates whether NICE decisions are changing
• No prior hypothesis
• =Years since first NICE appraisal was published
Severity of underlying illness
• NICE states that it accepts higher ICERs for serious conditions (Rawlins 2010)
• = Mean WHO DALY weight across conditions for this disease category
11. Additional variables explored
Pharmaceutical
• May reflect greater stakeholder involvement
• =1 for all drugs
Disease
• Interim analysis suggests NICE gives extra weight to cancer treatments
• 8 dummies =1 if the decision concerns that disease
• Diseases with <20 decisions with ICERs omitted
Probabilistic sensitivity analysis (PSA)
• Significant predictor of AWMSG decisions (Linley & Hughes 2012)
• =1 if the model has PSA
Broader perspective
• Reflects consideration of additional savings not captured in ICER
• =1 if non-NHS/PSS costs were analysed or discussed
12. Additional variables explored (cont.)
Appraisal committee
• Because committees weigh both quantitative evidence and value
judgements, the way they do this might differ systematically across
committees.
• Committees characterised by their Chairs and included as dummies
Innovation
• NICE says that it takes into account the innovative nature of the
technology
• Defined by us as: any molecule launched within 2 years of appraisal
AND in an ATC4 class that was created 5 years prior to appraisal.
This picks up new medicines without limiting the definition to first in
class: NICE does assess groups of similar medicines that are new, and the
definition aims to capture the spirit of viewing a medicine as innovative,
e.g. new diabetes medicines, TNFs, etc.
13. Additional variables explored (cont.)
Single technology appraisal (STA)
• Mason & Drummond found cancer STAs more likely to be ‘no’
• =1 if the STA process was used
Orphan
• Evaluate […] ‘orphan drugs’, in the same way as any other treatment
(SVJ, 2008; Littlejohns and Rawlins, 2009)
• Hypothesised to have no impact based on NICE statements
• =1 if the treatment has EMEA orphan status
Uncertainty
• Difference between the highest and lowest NE quadrant ICERs
• 2 dummies indicating whether the plausible ICER could be dominant,
or be dominated were explored, but dropped out of regression
• Other measures of uncertainty are problematic
14. Additional variables explored
on a subset of decisions
End of life
• Place special value on treatments prolonging life at the end of life,
providing that life is of reasonable quality (Rawlins, 2010; NICE 2009)
• =1 if met EoL criteria
• Only evaluated for decisions with preliminary guidance (FAD)
published after 5th Jan 2009
16. HTAinSite
Most data derived from HTAinSite: www.htainsite.com
Commercial database developed by OHE, Abacus and City
University
Provides extensive data on
all NICE and SMC appraisals
• Regularly updated
• Extracted and validated based
on established protocol
Access to web interface
available to subscribers
Academics can request
data for research by email
17. Appraisals versus decisions
We analyse binary yes/no choices (not yes/no/yes, but)
• Evidence, ICERs and other considerations often differ by subgroup for
which the technology is recommended and those for which it is
rejected
• Levels of restrictions differ enormously from 5% of patients to 80%
(O’Neill and Devlin 2010)
• Reflects HTAinSite protocol
NICE appraisals are divided into yes/no decisions concerning
whether or not to use one technology in one patient
subgroup with a certain condition
• Methods for subdividing appraisals governed by HTAinSite protocol
18. Collection of ICER data
Most guidance documents give multiple ICERs
• From manufacturer, assessment group and decision support unit
• Base case, subgroup analyses and sensitivity analyses
• Several comparators
HTAinSite records all ICERs mentioned in documentation
We developed a protocol to identify the ICERs informing each
decision
• Included only cost/QALY ICERs for the subgroup(s) in that decision
• DSU or ERG/TAG ICERs used in preference to manufacturer
• Vs NICE’s preferred comparator or next most effective treatment on the
frontier
• Exclude ICERs that NICE did not believe (based on considerations section)
19. Decisions with multiple ICERs
A decision may have >1 relevant ICER if
• It covers >1 subgroup with different ICERs
• Results of several analyses or comparators are given equal prominence
in guidance document
• NICE concluded the ICER was ‘between X and Y’, <A or >B
Taking the mean or midpoint would ignore uncertainty & make
assumptions about how NICE uses ICER data
We therefore randomly sampled from the list of ICERs
• Each ICER was given equal weight
• For ranges, we sampled from the full list of ICERs from other decisions
• Drew 100 iterations, each with different ICERs for each decision
• Analyses repeated on each iteration; results combined with Rubin’s
Rules
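The sampling and pooling steps above can be sketched as follows. This is a minimal illustration, not the authors' Stata code: the decision names and ICER values are hypothetical, the treatment of ICER ranges is omitted, and `rubin_pool` applies the standard form of Rubin's Rules to a single coefficient.

```python
import math
import random

def sample_icers(icers_by_decision, rng):
    """Pick one ICER per decision; every listed ICER has equal weight."""
    return {decision: rng.choice(icers)
            for decision, icers in icers_by_decision.items()}

def rubin_pool(estimates, std_errors):
    """Combine one coefficient across m datasets with Rubin's Rules."""
    m = len(estimates)
    pooled = sum(estimates) / m                       # mean of the estimates
    within = sum(se ** 2 for se in std_errors) / m    # mean within-dataset variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between            # total variance
    return pooled, math.sqrt(total)

# Hypothetical decisions and ICERs (in £000s), for illustration only.
icers_by_decision = {"decision_A": [18.0, 24.5, 31.0], "decision_B": [42.0]}
rng = random.Random(0)
datasets = [sample_icers(icers_by_decision, rng) for _ in range(100)]

# Hypothetical coefficient estimates and standard errors from three datasets.
pooled_estimate, pooled_se = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The pooled standard error grows with the spread of estimates across datasets, which is how uncertainty about which ICER informed a decision carries through to the final results.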
21. Outline of econometric methods
Used logistic regression to predict the effect of ICER and
other variables on the log-odds of NICE saying ‘yes’
Adjusted standard errors to allow for clustering of decisions
within appraisals
Analysed in Stata version 12
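Written out, a logistic model of this kind takes the form below. The notation is an assumption introduced for illustration: p_i is the probability that decision i is a ‘yes’, ICER_i is its ICER in £000s, and x_ik are the other covariates.

```latex
\log\!\left(\frac{p_i}{1-p_i}\right)
  = \beta_0 + \beta_{\mathrm{ICER}}\,\mathrm{ICER}_i + \sum_{k} \beta_k x_{ik},
\qquad
p_i = \frac{1}{1+\exp\!\left[-\left(\beta_0 + \beta_{\mathrm{ICER}}\,\mathrm{ICER}_i
      + \sum_{k} \beta_k x_{ik}\right)\right]}
```

Each exponentiated coefficient, exp(β), is the odds ratio associated with a one-unit increase in that variable.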
22. Modelling strategy
Stage 0: Estimation of ICER-only model where ICER alone
predicts recommendations
Stage 1: Evaluate a 'basic model' with the variables expected to
have most impact on NICE decisions
Stage 2: Remove non-significant variables from basic model one
at a time
Stage 3: Evaluate impact of adding additional variables
Stage 4: Alternative specifications of basic model parameters [to
come]
Stage 5: Sensitivity and subgroup analyses [to come]
23. Model selection
Our methods for dealing with decisions with ≥2 ICERs
involve generating 100 datasets with different ICER values
We combine coefficients across datasets using Rubin’s Rules
AIC and pseudo-R2 cannot be pooled across datasets
We therefore choose between models based on prediction
accuracy
• We assume that the model predicts a ‘yes’ if predicted log-odds ≥0
• Categorise decisions into true/false positives and true/false negatives
to get the % of decisions correctly predicted
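The classification rule above can be sketched as follows. The log-odds and outcomes are hypothetical values chosen for illustration. A predicted log-odds of 0 corresponds to a predicted probability of 50%.

```python
def percent_correct(log_odds, actual_yes):
    """Classify each decision as 'yes' when log-odds >= 0 and tally the results."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for value, actual in zip(log_odds, actual_yes):
        predicted_yes = value >= 0
        if predicted_yes and actual:
            counts["TP"] += 1      # predicted yes, NICE said yes
        elif predicted_yes and not actual:
            counts["FP"] += 1      # predicted yes, NICE said no
        elif not predicted_yes and not actual:
            counts["TN"] += 1      # predicted no, NICE said no
        else:
            counts["FN"] += 1      # predicted no, NICE said yes
    correct = counts["TP"] + counts["TN"]
    return counts, 100.0 * correct / len(log_odds)

# Hypothetical predicted log-odds and actual recommendations.
example_log_odds = [1.2, -0.3, 0.0, -2.0, 0.7]
example_actual = [True, False, True, True, False]
example_counts, example_pct = percent_correct(example_log_odds, example_actual)
```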
26. Numbers of decisions and appraisals
240 appraisals published by 31st December 2011
• E1 & E2: 11 terminated appraisals & 12 decisions without other restrictions excluded
229 appraisals comprising 763 decisions
• E3a: 162 decisions based on grounds other than cost-effectiveness
• E3b-c: 92 decisions based on cost-effectiveness but without available, quantified cost/QALY
510 decisions included in models with ICERs
28. Number of ICERs
510 decisions have available quantified cost/QALY ICERs, of which:
• 198 have 2 to 40 ICERs
• 31 have an ICER range
ICERs for these appraisals are randomly drawn in 100 datasets
29. ICER data
Quadrants:
• NE: more costly, more effective
• SE: less costly, more effective
• NW: more costly, less effective
• SW: less costly, less effective
• Varies: quadrants vary across decisions
Measure | NE | SE | NW | SW | Varies
Number of decisions: all | 418 | 33 | 31 | 6 | 22
Number of decisions: yes (%) | 282 (67%) | 33 (100%) | 0 (0%) | 5 (83%) | 13 (59%)
Number of decisions: no (%) | 136 (33%) | 0 (0%) | 31 (100%) | 1 (17%) | 9 (41%)
Mean ICER: all | £34,207 | Dominant | Dominated | £5,760 | -
Mean ICER: yes | £17,450 | Dominant | - | £5,544 | -
Mean ICER: no | £68,952 | - | Dominated | £6,839 | -
Average ICER is >3 times higher for ‘no’ decisions than ‘yes’
Dominance perfectly predicts NICE recommendations
Subsequent analyses focus on the NE quadrant decisions
30. Impact of ICER ranking on
recommendations
Decisions with high ICERs are more likely to be rejected, but
there are many exceptions
[Figure: decisions ranked by ICER, with axis labels running from £0 to £500,000; blue = recommended, red = rejected]
31. Proportion of decisions below different
thresholds that are rejected
50% of decisions with ICERs >£20,000 are rejected
32. Sensitivity and specificity at different
thresholds
ROC analysis suggests ICER strongly predicts decisions (AUC 0.85)
Sensitivity and specificity both equal 77% if the threshold ICER (Rc) is ~£30,000/QALY
% correctly classified plateaus at 81-82% for Rc between £36,000 and £54,000/QALY,
peaking at £47,743/QALY
Specificity, sensitivity and classification calculated using roctab
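The quantities reported above can be illustrated with a short sketch. The ICERs and recommendations are hypothetical, and the rule "predict ‘yes’ when the ICER is at or below the threshold" is an assumption about how the threshold is applied. It is not a reproduction of Stata's roctab.

```python
def sens_spec(icers, recommended, threshold):
    """Sensitivity, specificity and share correct when 'yes' is predicted for ICER <= threshold."""
    yes_icers = [x for x, r in zip(icers, recommended) if r]
    no_icers = [x for x, r in zip(icers, recommended) if not r]
    true_yes = sum(x <= threshold for x in yes_icers)   # 'yes' decisions predicted 'yes'
    true_no = sum(x > threshold for x in no_icers)      # 'no' decisions predicted 'no'
    sensitivity = true_yes / len(yes_icers)
    specificity = true_no / len(no_icers)
    correct = (true_yes + true_no) / len(icers)
    return sensitivity, specificity, correct

# Hypothetical ICERs (£/QALY) and whether each decision was a 'yes'.
icers = [5000, 12000, 18000, 25000, 28000, 35000, 42000, 60000]
recommended = [True, True, True, True, False, True, False, False]

# Sweep candidate thresholds to see how the three measures trade off.
sweep = {rc: sens_spec(icers, recommended, rc) for rc in (20000, 30000, 40000, 50000)}
```

Raising the threshold increases sensitivity and lowers specificity, so the point where the two are equal and the point where the share correctly classified peaks need not coincide.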
33. Plotting probability of rejection against
ICER
Inflection points at ~£20,000 and
~£50,000/QALY?
Rawlins & Culyer estimated
that the relationship was of
this shape and that 'inflexion A
occurs at around £5,000-
£15,000/QALY and inflexion B
at around £25,000-
£35,000/QALY'
Decisions are grouped into categories with similar ICER;
proportion of decisions in each category that were recommended
plotted against mid-point of each category
Curved line shows approximate best fit by eye
35. Results of the basic model (1)
Variable | Definition | Odds ratio (confidence interval)
ICER | Cost-effectiveness ratio (£’000s) | 0.938 (0.915, 0.962)*
Total Pts in RCTs | Total number of patients randomised in all RCTs for this decision | 0.99999 (0.99996, 1.00002)
Only treatment | =1 if there are no alternative treatments | 2.263 (0.448, 11.448)
Children | =1 if concerns treatments for children | 3.774 (0.274, 52.026)
Patient group submission | =1 if ≥1 patient group(s) made a submission | 0.929 (0.097, 8.912)
Publication date | Years since first NICE appraisal was published | 1.061 (0.947, 1.188)
Severity | Mean DALY weight for conditions in this disease category | 0.435 (0.031, 6.022)
* p<0.05
Every £1000 increase in ICER reduces odds of yes by 6.2%
No other variables are statistically significant
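The 6.2% figure follows directly from the odds ratio of 0.938 per £1000. The sketch below works through the arithmetic and shows how the effect compounds over larger ICER differences, assuming the effect is linear in the log-odds as in a logistic model. Function and variable names are illustrative.

```python
OR_PER_1000 = 0.938  # odds ratio for a £1000 increase in ICER, from the table above

# Percentage reduction in the odds of 'yes' per £1000: (1 - 0.938) * 100
pct_reduction_per_1000 = (1 - OR_PER_1000) * 100

def odds_multiplier(delta_icer_thousands):
    """Factor by which the odds of 'yes' change for a given ICER increase (in £000s)."""
    return OR_PER_1000 ** delta_icer_thousands

# A £10,000 increase multiplies the odds of 'yes' by roughly 0.53.
multiplier_10k = odds_multiplier(10)
```

These are changes in odds, not probabilities: the change in the probability of a ‘yes’ depends on the starting point.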
36. Results of the basic model (2)
At average levels for all covariates, a decision would have a
50% chance of rejection if its ICER were £45,118/QALY
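A 50% chance of rejection corresponds to log-odds of zero, so in a logistic model this threshold can be obtained by solving for the ICER at which the linear predictor equals zero. The notation is assumed for illustration: the estimated intercept, the estimated ICER coefficient (per £000), and the other coefficients evaluated at the covariate means.

```latex
0 = \hat\beta_0 + \hat\beta_{\mathrm{ICER}}\,\mathrm{ICER}^{*} + \sum_{k} \hat\beta_k \bar{x}_k
\quad\Longrightarrow\quad
\mathrm{ICER}^{*} = -\,\frac{\hat\beta_0 + \sum_{k} \hat\beta_k \bar{x}_k}{\hat\beta_{\mathrm{ICER}}}
```

Because the ICER enters the model in £000s, ICER* is also in £000s per QALY.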
37. Results of the basic model (3)
Based on a 50% cut-off, 81.63% of decisions are correctly
classified
• However, at this cut-off, sensitivity (94% of ‘yes’ decisions correctly
predicted) is higher than specificity (56% of ‘no’ decisions correct)
Rows: predicted recommendation; columns: actual recommendation (% of column)
Predicted | Actual: No | Actual: Yes | Total
No | 79 (56%) | 17 (6%) | 96
Yes | 61 (44%) | 271 (94%) | 332
Total | 140 | 288 | 428
38. Impact of removing variables from
basic model
Removing any variable except severity fails to improve prediction
accuracy compared with the basic model (81.63%)
Variable removed | % correctly classified after removing variable
Total Patients in RCTs | 81.53%
Only treatment | 81.25%
Children | 81.61%
Patient group submission | 81.63%
Publication date | 81.62%
Severity | 81.84%
All variables except ICER | 81.44%
39. Effect of adding variables to basic model
Variable | Definition | % correct | Odds ratio
Pharmaceutical | = 1 for drugs | 81.62% | 0.903 (p=0.81)
PSA | = 1 if the model has PSA | 81.78% | 0.672 (p=0.45)
Broader perspective | = 1 if non-NHS/PSS costs were analysed or discussed | 81.57% | 0.666 (p=0.46)
STA | = 1 if the STA process was used | 81.74% | 0.659 (p=0.23)
Orphan | = 1 if Tx has EMEA orphan status | 81.94% | 1.415 (p=0.55)
Range of ICERs | Difference between the lowest and highest NE quadrant ICER for this decision in £’000s | 82.56% | 0.987 (p=0.15)
Cancer | Dummy =1 if decision is for this disease | 82.29% | 2.025 (p=0.10)
Cardiovascular | Dummy =1 if decision is for this disease | 81.70% | 0.869 (p=0.75)
Central nervous system | Dummy =1 if decision is for this disease | 81.41% | 0.433 (p=0.30)
Endocrine | Dummy =1 if decision is for this disease | 81.58% | 0.648 (p=0.45)
Infectious | Dummy =1 if decision is for this disease | 81.69% | 1.122 (p=0.89)
Mental health | Dummy =1 if decision is for this disease | 81.50% | 0.411 (p=0.49)
Musculoskeletal | Dummy =1 if decision is for this disease | 81.99% | 3.317 (p=0.02)
Respiratory | Dummy =1 if decision is for this disease | 82.55% | 0.855 (p=0.002)
40. Model including all variables increasing
prediction accuracy
Omitted severity and added STA, PSA, orphan, ICER range,
cancer, cardiovascular disease, infectious disease,
musculoskeletal & respiratory to basic model as these
improved model fit
Correctly classified 84.20% of NICE decisions
• Specificity was higher than basic model, but sensitivity was lower
Rows: predicted recommendation; columns: actual recommendation
Predicted | Actual: No | Actual: Yes | Total
No | 25 (66%) | 6 (8%) | 32
Yes | 13 (34%) | 80 (92%) | 93
Total | 39 | 86 | 125
41. Comparison of model predictions
Allowing for other factors has little impact on curve or threshold
ICER only: £45,449
Basic model: £45,118
Basic model minus severity, plus STA, PSA, orphan, ICER range, cancer, CVD, infection, musculoskeletal & respiratory: £41,808
42. Comparing thresholds across diseases
NICE decisions and thresholds appear to vary substantially
across diseases
43. End of life
The impact of end of life was evaluated for the 133 decisions
with draft guidance since Jan 2009
Adding end-of-life variable to basic model improves prediction
accuracy for these decisions from 84.23% to 85.12%
• More than any other variable explored except ICER range and
respiratory
Odds of NICE saying ‘yes’ are 3.37 (95% CI: 0.64, 17.86)
times higher if meets end-of-life criteria (p=0.153)
44. End of life (2)
Threshold appears to be higher (>£50,000) post-2009
End-of-life treatments have higher ICER
Not end of life: £53,534
End of life: £67,646
46. Conclusions (1)
ICER is by far the strongest predictor of NICE decisions
• Excluding those decisions based on clinical grounds or lack of evidence
ICER alone explains 82% of NICE decisions
Other variables significantly affecting NICE decisions include
• Whether for respiratory disorder (less chance of ‘yes’)
• Whether for musculoskeletal disorder (more chance of ‘yes’)
Variables improving predictions, but not statistically significant
• End of life – matches EoL guidance;
• PSA; orphan; uncertainty
• Cardiovascular disease, cancer, infection
• Committee; innovation
47. Conclusions (2)
Odds of a ‘yes’ decrease by ~6% for every £1000 increase in
ICER
50% of decisions with ICERs >£20,000/QALY are ‘no’
Specificity and sensitivity are equalised at £30,000/QALY
threshold
Our ‘best’ model suggests that the average decision with an
ICER of £42,000 has a 50% chance of being rejected
48. Next steps
Conduct sensitivity and subgroup analyses to explore how
results vary with alternative specification of models and
variables
• Further exploration of variations over time, e.g. subgrouping appraisals
by time periods
• Cross-validation to be conducted on the best model
Step function model, assuming that NICE rejects all
treatments above a certain threshold ICER?
• Additional variables increase/decrease the threshold, not the log-odds
Multi-part models of decision-making
49. Multi-part models of decision-making
Current analyses exclude decisions not based on cost-
effectiveness grounds
Some of the variables (e.g. clinical evidence or only treatment)
may predict these decisions
Decisions to reject/recommend on other grounds may occur
before ICER evidence is considered
Could explore these earlier steps in 2- or 3-part models
[Diagram boxes: rejected on clinical grounds; recommended on clinical grounds; consider cost-effectiveness; rejected based on lack of clinical evidence]
50. Discussion points
Which threshold is correct?
• ICER at which there is a 50% chance of rejection?
• ICER that maximises specificity or sensitivity?
• ICER above which there is a 50% chance of rejection?
Are the multi-part models realistic? Which is best?
Is there any way that we could better measure
innovation, severity and/or uncertainty?
Is the % of decisions correctly classified the best way
to select models?
• Can we pool AIC across datasets?
Is it reasonable to select models on prediction
accuracy without a validation sample?
How should we present the data?
51. Acknowledgments
We would like to thank:
A consortium of 12 companies that provided a
research grant to facilitate the initial data collection
and modelling
HTAinSite and (in particular) Carmel Guarnieri and
Zoe Philips for providing the data used in this
analysis
Members of HERC, ScHARR and HESG for their
comments on our earlier work