DEPARTMENT OF SOCIAL POLICY
Policy Question
What would be the best way to create the overall
summary indicator? Do you agree with the
weighting mechanism currently in place to
combine indicators AND the methodology used to
form the overall summary indicator?
Author Kaouthar Lbiati (MD, MSc)
Academic Year 2015/2016
Note: For simplicity, we refer to the summary composite indicator as the composite indicator or by its
acronym, “CI”.
Objectives
The purpose of this paper is to provide guidance on the construction
process of a Composite Indicator, whilst meeting quality objectives, to
facilitate ranking the countries participating in the Euro Report Card by
Health System Performance and/or assessing their progress over time on
complex health issues.
The objectives of the Euro Report Card are to: i) evaluate and promote
standardized methods of healthcare delivery across the European Union
(EU); ii) identify the most effective processes and isolate inefficient
ones; and iii) promote better-informed decision-making at European level
against a common set of norms for health provision.
Throughout this paper, the arguments and recommendations presented are
developed from the perspective of the Policy Advisor to the Health
Minister of Sweden. After introducing our thinking process and describing
the issues associated with the summary index currently in use, we adopt a
step-by-step approach to describe a new construction pathway and justify
the methodological choices made at each step. We stress the need for
multivariate analysis, we deal with the problem of missing data, and we
discuss the techniques used to bring indicators of very different nature
into a common unit. We explore different methodologies for weighting
indicators and aggregating them into a composite. Finally, we provide
guidance on how to handle uncertainty and test the robustness of the
Composite Index.
The Thinking Process
The premises of any composite index are as follows: i) the quality of a
composite indicator is a function not only of the quality of its underlying
data but also of the methodological process used to build it; ii) the
measurement indicators composing a summary indicator ought to be
appropriate, robust and reflective of health system performance; iii) the
indicators should have properties such as attribution and temporality; and
iv) they should be fit for purpose and yield tangible benefits from
measurement.
The performance measurement model we propose reflects not only the
characteristics of real-world performance systems but also the
multi-dimensionality and multi-scale representation of the same issue.
Clearly, the ability of this composite to represent multidimensional
concepts largely depends on the quality and accuracy of its components.
According to Smith et al. (2002), a CI summarises multi-dimensional issues
in order to support decision-makers, and it is easier to interpret than
trying to find a trend across many separate indicators. However, if it is
poorly constructed or misinterpreted, or if dimensions of performance that
are difficult to measure are ignored, it may send misleading policy
messages or invite simplistic policy conclusions.
We recognise that synergies and conflicts may arise when performance is
measured at both the national and the EU scale. However, we believe that
the methodological choices we made could be implemented by all 27 EU
countries, which in turn would help improve the overall performance of
their healthcare systems and standardise quality and health outcomes at EU
level in the short and long term.
Issues with the current scoring
1- The re-scaling method currently used is based on the range rather than
the standard deviation, which leads to distortions and thus prevents
accurate direct comparisons between countries.
Details of the Euro Report Card:
“The Euro Report Card assesses performance of the 12 countries on a number of performance
domains. These include coverage, quality, efficiency and equity. The Report Card includes a number
of indicators for each of these domains, and reports each countries performance on that domain.
Where a country is missing information they are assigned the lowest value from the existing data.
You are provided with two copies of the Report Card – one with the actual numbers where available
and the other one with symbols indicating the relative value of the indicator (where the full circle
indicates that the country is in the highest 33rd percent, the half circle indicates the country is
between the 33rd and 66th percentile, and the empty circle indicates that the country is in the
lowest 33rd percent).
For each of the domains being reported on the Report Card, a composite indicator is created to
assess relative performance across all indicators. This composite is a simple average of the
attainment on each indicator. In order to construct this average, all values of the indicators are
transformed so they lie between 0 and 100 (with 100 indicating the highest value and 0 the lowest).
Finally, these average scores for each domain are used to construct an overall composite measure of
overall performance, and countries are ranked according to this final measure”
2- Scale effects (i.e. the different measurement units in which indicators
are expressed) and outliers (extreme values) are not taken into account.
3- One of the assumptions of the current scoring is the presence of a
centralised health information system whereby countries reliably
gather, analyse, evaluate and report data. Yet countries differ in the
level of detail they provide. Under the current scoring, the minimum
value from the existing data is assigned when data are missing. Our
view is that this may encourage under-reporting or misreporting.
Moreover, we have serious concerns about missing and/or poor-quality
data being given full weight within the CI.
4- Some of the sub-indicators currently nested within the Report Card
present distortions: collinearity (risk of double counting), lack of
attribution (bias), and/or lack of fitness for purpose (they reflect the
issue inaccurately).
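To make these issues concrete, the current domain scoring described in the Report Card brief can be sketched in a few lines. This is a minimal illustration, assuming numpy, a countries-by-indicators array with `np.nan` marking missing values, and that a higher value is always better; the function name `current_report_card` is hypothetical.

```python
import numpy as np

def current_report_card(X):
    """Sketch of the current scoring for one domain.

    X: countries x indicators array; np.nan marks missing data.
    Assumes a higher raw value is always better.
    """
    X = np.array(X, dtype=float)
    col_min = np.nanmin(X, axis=0)
    # Missing data receive the lowest existing value of that indicator
    X = np.where(np.isnan(X), col_min, X)
    col_max = X.max(axis=0)
    # Re-scale each indicator to 0-100 using its range (min-max);
    # a constant indicator would divide by zero here
    scaled = 100 * (X - col_min) / (col_max - col_min)
    # Domain composite = simple average of the re-scaled indicators
    return scaled.mean(axis=1)
```

With a single extreme value, e.g. one country at 1000 where the others lie between 10 and 30, the three ordinary countries are squeezed into roughly 0 to 2 on the 0-100 scale (issues 1 and 2), and any missing entry silently receives the worst score (issue 3).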
The New Construction Pathway
No model is a priori better than another, provided that internal coherence is
always assured, as each model serves different interests (Tools for
Composite Indicators Building, 2005). In our proposal, the construction
pathway of a composite indicator involves the following stages:
Multivariate Analysis, Selection of Data, Data Editing, Normalisation,
Weighting, Aggregation, and Sensitivity/Uncertainty Analysis.
1- Multivariate Analysis - Principal Component Factor Analysis
Description and objectives
i) investigate the structure of the composite by grouping information
along one dimension or more, i.e. along indicators and along
constituencies (e.g. regions, municipalities etc.), ii) check whether the
dimensions in the present score card are well balanced, iii) extract
statistical correlations between the indicators using Principal
Components Analysis (PCA).
Principal components factor analysis is widely used in the development of
CIs (Nicoletti et al., 2000) because it is simple and allows the
construction of weights that represent the information content of the
indicators. Its disadvantage, however, is that it is sensitive to
outliers, which may introduce spurious variability into the data.
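The PCA step can be sketched as an eigen-decomposition of the indicators' correlation matrix, which is equivalent to PCA on z-scores. This is a minimal sketch, assuming numpy and complete data (missing values handled beforehand); it reports the share of variance carried by each component and the loadings from which PCA-based weights are usually derived. It does not implement factor rotation or the specific weighting scheme of Nicoletti et al. (2000), and the function name is hypothetical.

```python
import numpy as np

def pca_on_indicators(X):
    """Principal components of the indicators' correlation matrix.

    X: countries x indicators array with no missing values.
    Returns eigenvalues (descending), loadings and variance shares.
    """
    X = np.asarray(X, dtype=float)
    # Correlation matrix = covariance of the standardised indicators
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals / eigvals.sum()  # proportion of total variance
    # Loadings: eigenvectors scaled by sqrt(eigenvalue); tiny negative
    # eigenvalues caused by rounding are clipped to zero
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    return eigvals, loadings, share
```

Because the correlation matrix is computed from raw values, a single outlying country can shift both the eigenvalues and the loadings, which is the sensitivity noted above.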
The objective of PCA is to find linear combinations of the raw variables
that reduce dimensionality and yield uncorrelated principal components.
The lack of correlation between principal components is a useful property
because it means that they measure different “statistical dimensions”.
However, it is not strictly necessary from a technical point of view to
exclude highly collinear variables. When collinearity is present, one may
make the weight of a given indicator inversely proportional to the
arithmetic mean of the coefficients of determination of every bivariate
correlation that includes that indicator. One should also introduce a rule
of thumb defining a threshold beyond which the correlation is a symptom of
double counting.
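The weighting rule for collinear indicators described above can be written directly. In this minimal sketch (numpy assumed), each indicator's weight is proportional to one over the mean R² of all bivariate correlations involving it, and the weights are rescaled to sum to one. The `threshold=0.9` value used to flag double counting is a hypothetical rule of thumb, not one proposed in the text.

```python
import numpy as np

def collinearity_weights(X, threshold=0.9):
    """Weights inversely proportional to each indicator's mean bivariate R^2.

    X: countries x indicators array. `threshold` is a hypothetical
    rule-of-thumb |r| above which a pair is flagged as double counting.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    k = R.shape[0]
    r2 = R ** 2  # coefficient of determination of each bivariate pair
    off_diag = ~np.eye(k, dtype=bool)
    # Mean R^2 over all pairs that include indicator j
    mean_r2 = np.array([r2[j, off_diag[j]].mean() for j in range(k)])
    raw = 1.0 / np.maximum(mean_r2, 1e-12)  # guard against division by zero
    weights = raw / raw.sum()
    # Pairs whose correlation exceeds the threshold
    flagged = [(i, j) for i in range(k) for j in range(i + 1, k)
               if abs(R[i, j]) > threshold]
    return weights, flagged
```

In the test data the first two indicators are perfectly correlated, so they are flagged and each receives less weight than the third.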
2- Selection of Data and Data Editing
As stated above, some of the sub-indicators currently used in the Report
Card to measure performance in the four domains (coverage, quality,
efficiency and equity) present distortions. An update of the Report Card
is therefore warranted.
In the coverage section, we consider “Proportion of Births attended by
skilled health personnel” redundant, as more robust maternal health
indicators, such as “Infant Mortality”, are already present on the
current Report Card. Removing this metric eliminates the risk of double
counting.
In the quality section, we propose replacing the “GPs per 100,000” metric
with a measure of avoidable hospital admissions due to asthma, chronic
obstructive pulmonary disease (COPD) and congestive heart failure (CHF).
We think this will better reflect the effectiveness of primary care in
managing chronic diseases. The data should be age-sex standardised, with
equal weights applied to each component measure (asthma, COPD and CHF).
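The proposed quality measure can be sketched as follows: each condition's admission rate is directly age-sex standardised against a standard population, and the three standardised rates are then averaged with equal weights. The strata, the standard population and the function names are illustrative assumptions (numpy assumed); in practice the participating countries would agree on a common reference population. Lower values indicate better primary care.

```python
import numpy as np

def direct_standardised_rate(stratum_rates, standard_pop):
    """Directly standardised rate: stratum-specific rates (e.g. per
    100,000 in each age-sex group) weighted by the share of each group
    in a common standard population."""
    w = np.asarray(standard_pop, dtype=float)
    return float(np.dot(stratum_rates, w / w.sum()))

def avoidable_admissions_composite(rates_by_condition, standard_pop):
    """Equal-weight average of the standardised asthma, COPD and CHF rates.

    rates_by_condition: dict mapping condition name to its stratum rates,
    all using the same age-sex strata as `standard_pop`.
    """
    std = [direct_standardised_rate(r, standard_pop)
           for r in rates_by_condition.values()]
    return float(np.mean(std))
```

Standardising every country against the same population removes differences in age-sex structure before the three conditions are combined.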
In the coverage section, we propose replacing the “Direct access to
specialist” metric with a composite measure of average waiting times for
elective surgery (hip replacement, knee replacement and cataract surgery).
We believe this better accounts for the role of “gatekeeping”, an
organisational feature of health systems that biases “Direct access to
specialist” rates.
We also propose to move “Out-of-pocket payments as a percentage of
Total Health Expenditure” from the coverage to the equity section.
We welcome any further suggestions on the selection and reduction of the
number of indicators. However, once EU countries agree upon a final
structure, we do not support an “opt-out” option for specific indicators.
Our view is that, for benchmarking purposes, all countries should be
assessed on the same number of indicators in each category.
3- Data Editing - Multiple Imputation
According to Frenck (2010) and Olejaz et al. (2012), nearly all datasets
used for global-health metrics and evaluation have substantial problems
with missing data. Incomplete information has multiple causes, for
instance the absence of proper registration and accountability systems,
under-reporting, and the lack of a standard approach to the problem (even
the World Health Report 2000 offers no clear methodology). We believe that
transparency in the reporting of missing data should be a key priority for
all EU countries. Incentives to collect data should be offered, and
cost-effective data collection systems implemented across the EU. We do
not support the use of penalties as sanctions; instead, we propose audit
and assistance.
There is often no firm basis for judging whether data are missing at
random or systematically; when there are reasons to assume a non-random
missing pattern, that pattern must be explicitly modelled and included in
the analysis. There are two generic approaches for dealing with missing
data: i) case deletion, which removes either the country or the indicator
from the analysis, and ii) imputation.
The advantages of imputation include the minimisation of bias and the use
of “expensive to collect” data that would otherwise be discarded. There are
two types of imputation: single imputation and multiple imputation (e.g.
using the Markov Chain Monte Carlo (MCMC) method). Single imputation is
known to underestimate the variance because it only partially reflects the
imputation uncertainty; it therefore does not allow the robustness of the
composite index to be fully assessed. Multiple imputation, by contrast,
provides several values for each missing value (drawn from the predictive
distribution of the missing data), effectively representing the
uncertainty due to imputation. We therefore favour multiple imputation
over single imputation.
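The sketch below illustrates multiple imputation for a single indicator predicted from one complete covariate. It is not the MCMC method mentioned above: it swaps in a simpler Bayesian linear-regression imputation (drawing the residual variance and the coefficients before each imputation so that parameter uncertainty is propagated), repeats it `m` times, and pools the estimate of the indicator mean with Rubin's rules. numpy is assumed; the function names and the choice of the mean as the pooled quantity are illustrative.

```python
import numpy as np

def impute_once(y, x, rng):
    """One stochastic imputation of missing y (np.nan) from complete x."""
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])  # intercept + predictor
    Xo, yo = X[obs], y[obs]
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    df = obs.sum() - X.shape[1]
    # Draw sigma^2 and beta so that parameter uncertainty is propagated
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.inv(Xo.T @ Xo)
    beta = rng.multivariate_normal(beta_hat, cov)
    y_imp = y.copy()
    n_mis = (~obs).sum()
    # Predicted value plus residual noise for each missing entry
    y_imp[~obs] = X[~obs] @ beta + rng.normal(0.0, np.sqrt(sigma2), n_mis)
    return y_imp

def multiple_imputation_mean(y, x, m=20, seed=0):
    """Pool the mean of y over m imputed datasets with Rubin's rules."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates, variances = [], []
    for _ in range(m):
        y_imp = impute_once(y, x, rng)
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / n)  # within-imputation variance
    q_bar = np.mean(estimates)           # pooled point estimate
    w = np.mean(variances)               # average within-imputation variance
    b = np.var(estimates, ddof=1)        # between-imputation variance
    total = w + (1 + 1 / m) * b          # Rubin's total variance
    return q_bar, total, b
```

The between-imputation variance `b` is the component that single imputation ignores, so the pooled total variance is larger than what any single filled-in dataset would suggest.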
4- Normalization - Standardization - Z-scores
A number of normalization methods are available, such as ranking,
standardization, re-scaling, categorical scales and balance of opinions,
among others. The chosen normalization method should take into account
both the data properties and the objectives of the composite indicator.
An indicator with extreme values intrinsically has a greater effect on the
composite indicator. In the presence of extreme values, normalisation
methods based on the standard deviation (or distance from the mean) are
preferred. However, we consider an “extremely good” result on a few
indicators to be better than many average scores. Since our intention here
is to reward exceptional performance, we recommend not correcting for this
feature during the aggregation step (e.g. by excluding the best and worst
sub-indicator scores from the index). Instead, we encourage assigning
differential weights.
As for the scale effect, a proper normalisation method should be applied to
remove it from all indicators simultaneously. This ensures a fair
comparison of performance and controls for variations across EU countries.
Ranking each indicator across countries is insensitive to outliers, but it
discards the distances between countries, so conclusions about the size of
performance differences cannot be drawn. Other methods, such as
re-scaling, are sensitive to extreme values, which can distort the
transformed indicator, whereas categorical scales based on percentiles
(the current method) do not allow year-on-year improvements to be tracked.
We recommend standardization using z-scores, as this method has been used
for the WHO index of health system performance (SPRG, 2001). It is the most
commonly used method because it converts all indicators to a common scale
with a mean of zero and a standard deviation of one. The zero mean avoids
introducing aggregation distortions stemming from differences in indicator
means. The scaling factor is the standard deviation of the indicator
across countries. Its interpretation, however, rests on the assumption of
an approximately normal distribution.
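Z-score standardisation itself is a one-line transformation per indicator. This minimal sketch assumes numpy and the sample standard deviation; the `higher_is_better` argument is an added assumption (the text does not discuss indicator direction) that flips the sign of indicators where a lower value is better, such as infant mortality, so that a higher z-score always means better performance.

```python
import numpy as np

def z_scores(X, higher_is_better):
    """Standardise each indicator to mean 0 and standard deviation 1.

    X: countries x indicators array.
    higher_is_better: one boolean per indicator; indicators where a lower
    value is better are sign-flipped after standardisation.
    """
    X = np.asarray(X, dtype=float)
    # Subtract the cross-country mean, divide by the cross-country SD
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    sign = np.where(higher_is_better, 1.0, -1.0)
    return Z * sign
```

Unlike the 0-100 re-scaling, the result does not depend on the single best and worst country alone, and the same transformation applied each year keeps scores comparable over time only if the reference mean and standard deviation are held fixed or reported alongside.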
5- Weighting - Budget Allocation (BA) and Data Envelopment Analysis (DEA)
No agreed methodology exists for weighting individual indicators. Weights
are essentially value judgements about the relative importance of
different performance indicators within the CI and about the relative
opportunity cost of achieving those performance measures or outcomes of
interest. Weights usually have an important impact on the results of the
CI, especially when a higher weight is assigned to indicators on which
some countries excel or fail.
In our model, we have followed a participatory approach that combines two
weighting methods: Budget Allocation and Data Envelopment Analysis. Both
are based on experts’ opinions rather than on statistical manipulation.
Experts’ opinions are likely to increase the legitimacy of the composite
and to build consensus for policy action. These methods bring together
experts with a wide spectrum of knowledge (clinicians, statisticians,
information technology, data quality, etc.), experience and concerns, so
as to ensure that a proper weighting system is found.
In the BA method, experts are given a “budget” of N points to distribute
over a number of indicators, “paying” more for those indicators whose
importance they want to stress (Moldan and Billharz, 1997). The Budget
Allocation method is likely to produce inconsistencies when there are more
than about 10 indicators, which is why we proposed reducing the number of
indicators in the section “Selection of Data and Data Editing”. We did not
use multiple regression, which can handle a larger number of indicators,
because it requires a dependent variable, i.e. an outcome measure that is
not itself a composite indicator.
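Budget Allocation reduces to averaging the experts' point allocations. This minimal sketch assumes numpy, a budget of 100 points (the text leaves N open) and a simple arithmetic mean across experts; the function name is hypothetical. The between-expert standard deviation is returned as a rough signal of disagreement that could feed into the uncertainty analysis.

```python
import numpy as np

def budget_allocation_weights(allocations, budget=100):
    """Average expert budget allocations into indicator weights.

    allocations: experts x indicators array of points; each expert
    (row) must spend exactly `budget` points.
    """
    A = np.asarray(allocations, dtype=float)
    if not np.allclose(A.sum(axis=1), budget):
        raise ValueError("each expert must allocate the full budget")
    weights = A.mean(axis=0) / budget         # weights sum to one
    spread = A.std(axis=0, ddof=1) / budget   # disagreement between experts
    return weights, spread
```

For example, three experts allocating (50, 30, 20), (30, 40, 30) and (40, 20, 40) points yield weights of 0.4, 0.3 and 0.3.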
In DEA, experts are asked to locate the target on the efficiency frontier,
which is used as the benchmark (Korhonen et al., 2001). The weighted
performance indicator is then the ratio of the distance between the origin
and the actual observed point to the distance between the origin and the
projected point on the frontier. The main limitation of this method is how
the efficiency frontier is determined.
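One standard way to make the frontier operational is the "benefit-of-the-doubt" formulation of DEA, in which each country receives the weights most favourable to it, subject to no country scoring above 1; with a single constant input, this score corresponds to the distance ratio described above (observed point over its radial projection on the frontier). This is a data-driven formulation swapped in for illustration, not the expert-located target described in the text. The sketch assumes numpy and strictly positive normalised indicators, and solves the small linear programme exactly by enumerating the vertices of the feasible weight set, which is only practical for a handful of countries and indicators; a dedicated LP solver would normally be used.

```python
import itertools
import numpy as np

def bod_scores(Y, tol=1e-9):
    """Benefit-of-the-doubt (DEA) score of each country.

    Y: countries x indicators array of strictly positive, normalised values.
    For country c the score is max_w w.y_c subject to w.y_k <= 1 for every
    country k and w >= 0. The feasible set is a bounded polytope, so the
    optimum lies on one of its vertices, which are enumerated exactly here.
    """
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    # Constraints written as A w <= b: Y w <= 1 and -w <= 0
    A = np.vstack([Y, -np.eye(k)])
    b = np.concatenate([np.ones(n), np.zeros(k)])
    vertices = []
    for rows in itertools.combinations(range(n + k), k):
        idx = list(rows)
        sub_A, sub_b = A[idx], b[idx]
        if abs(np.linalg.det(sub_A)) < tol:
            continue  # these k constraints do not define a single point
        w = np.linalg.solve(sub_A, sub_b)
        if np.all(A @ w <= b + tol):
            vertices.append(w)  # feasible vertex of the weight polytope
    V = np.array(vertices)
    # Best attainable weighted score of each country over all vertices
    return (Y @ V.T).max(axis=1)
```

Countries on the frontier score 1. With indicator values (1, 0.5), (0.5, 1) and (0.5, 0.5), the third country's radial projection onto the frontier is (0.75, 0.75), so its score is 0.5 / 0.75 = 2/3.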
Weighting methods may also address other issues: i) reflect the underlying
data quality of the indicators (account for the missing values), ii) reward
performance, iii) overcome the statistical problem of double counting when
two or more indicators are measuring the same behaviour, iv) induce
behaviour change among care providers.
As stated previously, our model addresses missing data through multiple
imputation, and collinearity between indicators by making the weight of a
given indicator inversely proportional to the arithmetic mean of the
coefficients of determination of every bivariate correlation that includes
that indicator. We also encourage assigning differential weights based on
the “desirability” of rewarding outstanding performance.
6- Aggregation - Geometric Aggregation
Aggregation is the process of combining the information conveyed by the
different dimensions into a Composite Index. Several methods exist: Linear
Aggregation, Geometric Aggregation and Multi-Criteria Analysis, each with
different properties (e.g. compensability) and implications (for the
theoretical meaning of the weights).
Compensability is the extent to which a deficit in one dimension can be
offset by a surplus in another. It is constant under Linear Aggregation,
whereas under Geometric Aggregation it is only partial and is lower when
the score is low. As a result, under Geometric Aggregation a country has a
greater incentive to improve the indicators on which it scores lowest, as
this gives it the best chance of improving its position in the ranking.
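The difference in compensability can be checked numerically. In this minimal sketch (numpy assumed; weights sum to one; scores must be strictly positive, so z-scores would first have to be shifted onto a positive scale, which is an added assumption), raising a country's weakest indicator from 20 to 30 and raising its strongest from 80 to 90 leave the linear composite identical at 55, whereas the geometric composite rewards the first change more (about 49.0 versus 42.4).

```python
import numpy as np

def linear_aggregate(scores, weights):
    """Weighted arithmetic mean: fully compensatory."""
    return float(np.dot(weights, scores))

def geometric_aggregate(scores, weights):
    """Weighted geometric mean: only partially compensatory.

    Scores must be strictly positive; weights are assumed to sum to one.
    """
    scores = np.asarray(scores, dtype=float)
    if np.any(scores <= 0):
        raise ValueError("geometric aggregation needs positive scores")
    return float(np.prod(scores ** np.asarray(weights, dtype=float)))
```

Under the geometric mean, a gain on a low score raises the composite proportionally more than the same gain on a high score, which is the incentive described above.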
Furthermore, for the weights to be interpreted as “importance
coefficients” (the greatest weight placed on the most important
dimension), non-compensatory aggregation procedures must be used to
construct composite indicators (Podinovskii, 1994). Although Multi-Criteria
Analysis is a non-linear technique with no possibility of trade-off
between indicators, its calculation nonetheless centres on the weights.
Since the weighting methods we chose for our model are subjective, and the
ultimate goal is to encourage countries to keep improving their ranking,
we recommend the use of Geometric Aggregation.
7- Uncertainty Analysis (UA) and/or Sensitivity Analysis (SA)
Both give useful insight into the full process of creating the CI,
including their contribution to defining the quality of the indicators and
an assessment of the reliability (volatility) of the final country
rankings. They also increase transparency.
Uncertainty analysis (UA) focuses on how uncertainty in the input factors
propagates through the structure of the CI and affects the composite
indicator values, whilst sensitivity analysis (SA) studies how much each
individual source of uncertainty contributes to the output variance. In
the field of CI construction, UA is more often adopted than SA (Jamison
and Sandbu, 2001; Freudenberg, 2003).
It is commonly accepted that every step of composite indicator building
can introduce uncertainty. In our model, the highest-order sources of
uncertainty are the selection of experts and the weighting scheme. These
can be translated into a set of “scalar input factors” to be sampled from
their distributions.
As argued by practitioners (Saltelli et al., 2000a; EPA, 2004), robust,
“model-free” computational techniques for sensitivity analysis should be
used for non-linear models, and a composite indicator model is
non-linear. Among the existing computational approaches, we recommend
Monte Carlo-based robustness analysis, which consists of evaluating the
model many times with randomly sampled values of the k model input
factors.
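A Monte Carlo robustness check of the rankings can be sketched as follows, treating the weights as the uncertain input factor. This is a minimal sketch, assuming numpy, positive normalised scores and the geometric aggregation recommended above; weights are drawn from a Dirichlet distribution centred on the agreed weights, and the `concentration` parameter controlling how far draws stray from them is a hypothetical tuning choice. Other input factors, such as the choice of experts, could be sampled in the same loop; variance-based sensitivity indices are not computed here.

```python
import numpy as np

def rank_uncertainty(Z, base_weights, n_draws=1000, concentration=50.0, seed=0):
    """Monte Carlo distribution of country ranks under uncertain weights.

    Z: countries x indicators array of strictly positive normalised scores.
    base_weights: agreed weights (positive, summing to one).
    Returns counts[c, r] = number of draws in which country c was
    ranked r (0 = best).
    """
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    # Dirichlet draws have mean equal to base_weights
    alpha = concentration * np.asarray(base_weights, dtype=float)
    n = Z.shape[0]
    counts = np.zeros((n, n), dtype=int)
    for _ in range(n_draws):
        w = rng.dirichlet(alpha)                 # one sampled weight vector
        composite = np.exp(np.log(Z) @ w)        # weighted geometric mean
        ranks = np.argsort(np.argsort(-composite))  # 0 = highest composite
        counts[np.arange(n), ranks] += 1
    return counts
```

A country whose counts are concentrated in one rank has a robust position; counts spread across several ranks signal that its ranking is sensitive to the weighting scheme.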
The iterative use of uncertainty and sensitivity analysis during the
development of a composite indicator can contribute to enhance its
robustness and transparency.
Conclusions
This paper provides a structured way of thinking about the design and
construction of a composite indicator whose purpose is to facilitate
ranking EU countries by Health System Performance and/or assessing their
progress over time on complex and multi-dimensional health issues.
We have analysed the drawbacks of the method currently in use and shown
its limitations in providing policy makers with an accurate and sound
measure that allows timely and appropriate responses.
Bringing indicators into the same unit, handling missing data, defining
the relative importance of the indicators within the composite indicator
and the compensability rule between them, and testing the robustness of
the CI against the highest-order uncertainties are paramount actions in
the construction pathway. Finally, a composite indicator should not be
considered a goal per se; in our view, it is a good starting point for an
informed discussion.
References
1. Smith et al. (2002)
2. Tools for Composite Indicators Building (2005). The Applied Statistics
Group, Institute for the Protection and Security of the Citizen,
Econometrics and Statistical Support to Antifraud Unit, I-21020 Ispra
(VA), Italy.
3. Nicoletti et al. (2000)
4. Frenck (2010)
5. Olejaz et al. (2012)
6. Moldan and Billharz (1997)
7. Korhonen et al. (2001)
8. Podinovskii (1994)
9. Jamison and Sandbu (2001)
10. Freudenberg (2003)
11. Saltelli et al. (2000a)
12. EPA (2004)