Poster presented at the SAPC 2013 conference, Nottingham, on 3 July 2013.
David A. Springate, Evangelos Kontopantelis, David Reeves
Primary Care Research group, Centre for Biostatistics, University of Manchester
Publications in healthcare research using electronic medical records (EMR) databases are increasing at an exponential rate. Because of the amount of data available in large EMR databases, it has been suggested that they may be able to provide results of equal validity to randomised controlled trials. EMR studies rely on clinical codes (such as Read codes) to provide standardised and expressive means for medical professionals to record clinical information. The validity of these studies is dependent on (among other things) the validity of the clinical codes that are used to define the population of interest and their disease conditions.
Clinical codes should be held to the same scrutiny as other research methods: if the inclusion/exclusion criteria for a given condition are invalid, then so is the rest of the study. It should also be possible to replicate a given study (e.g. in a different EMR database) from the information provided in the original paper, which is not possible if the clinical code lists are not provided. Furthermore, access to historical code lists allows researchers and clinicians to make incremental improvements to disease and other definitions, building on previous work and avoiding unnecessary replication of it.
There is currently no obligation to publish clinical code lists and no centralised repository to hold them. Consequently, the vast majority of database studies do not publish their clinical codes and so cannot be fully validated or replicated. To illustrate this, we looked at 45 UK case-control EMR database studies indexed on PubMed and found that only five reported any clinical codes in their methods sections. Of these five, only two published code lists in online appendices, and only one provided a full set of codes that would allow proper replication of the study.
We have built an online repository where researchers can deposit their clinical codes in a standardised way at the time of publication, and can download historical code lists from previous studies. We have uploaded a complete set of Read codes for all versions of the Quality and Outcomes Framework, and we encourage major medical organisations to deposit all the code lists they publish. The reproducibility and validity of EMR database studies would be greatly aided if deposition of all clinical codes were a prerequisite for publication of all future database studies. The ability to build on code lists from historical studies when developing new ones will also ease a considerable bottleneck in database study design, removing a great deal of “reinventing the wheel” each time a new EMR-based study is undertaken.
Improving validity and reproducibility of primary care database studies: An online clinical codes repository
Funded by the National Institute for Health Research (NIHR) School for Primary Care Research.
This is a summary of independent research funded by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
Background
Centre for Primary Care: www.population-health.manchester.ac.uk/primarycare
Centre for Biostatistics: www.population-health.manchester.ac.uk/biostatistics
For example, in 45 UK PCD case-control studies into diabetes:
• Only 5 reported ANY clinical codes in the paper
• Only 2 published lists of codes in an online appendix
• Only 1 provided a full set of code lists for the study
So, in 44 out of 45 studies, given the original paper, there is no way of independently assessing the validity of how diseases, exposures, treatments or outcomes were defined, or of replicating the research.
1. Shephard et al. Family Practice 2011; 28:4
2. Brookhart et al. Medical Care 2010; 48:6
3. Jordan et al. Family Practice 2004; 21:4
4. Smeeth et al. Family Practice 2006; 23:5
• All historical code lists published on clinicalcodes.org are freely downloadable
• The repository will host the full set of Read codes from all versions of the Quality and Outcomes Framework (QOF) from 2004 to 2012
• Codes from other coding systems (e.g. SNOMED, ICD-10) can also be deposited
• The repository can also host database-specific coding information (e.g. CPRD consultation types)
• Codes from other electronic medical record database studies can also be hosted
Challenges
The repository would be most effective if…
• Deposition of all clinical codes were a prerequisite for publication of all future PCD studies
• Funding bodies included deposition as a precondition for funding
• Authors of historical PCD studies were encouraged to deposit their codes post-publication
We are working to automate uploading and downloading via an API
University of Manchester Centre for Primary Care / Centre for Biostatistics
David A. Springate, Evangelos Kontopantelis, David Reeves and Ivan Olier
The problem
Knowledge of the clinical codes used in a PCD study is critical to determining its validity because…
1. The ways in which populations, diseases, exposures, treatments and outcomes are defined through code lists can have a large impact on the results
2. If a code list for a study is unavailable, that study cannot be properly validated or replicated
However, there is currently no…
• Obligation from journals or research councils for researchers to publish clinical code lists
• Centralised repository to hold archived clinical code lists
This means that the vast majority of PCD studies do not publish their clinical codes and so cannot be properly validated.
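To make point 1 concrete, here is a minimal Python sketch showing how two alternative code lists, applied to the same records, select different cohorts. It rests on stated assumptions. The codes (`CODE_A` etc.), the patient records and the helper `define_cohort` are all hypothetical. A patient is also assumed to belong to a cohort if any one of their recorded codes appears in the code list.

```python
# Hypothetical illustration: the codes and patient records below are invented
# placeholders, not real Read codes or real patient data.

# Each patient ID maps to the set of clinical codes recorded for that patient.
records = {
    "p1": {"CODE_A"},
    "p2": {"CODE_A", "CODE_C"},
    "p3": {"CODE_B"},
    "p4": {"CODE_C"},
    "p5": {"CODE_D"},
}

# Two alternative code lists a study might use to define the same condition.
narrow_list = {"CODE_A"}
broad_list = {"CODE_A", "CODE_B", "CODE_C"}


def define_cohort(records, code_list):
    """Return IDs of patients with at least one code from code_list."""
    return {pid for pid, codes in records.items() if codes & code_list}


narrow_cohort = define_cohort(records, narrow_list)
broad_cohort = define_cohort(records, broad_list)

# The same records give different cohort sizes, and hence different
# prevalence estimates, depending only on the code list chosen.
for label, cohort in [("narrow", narrow_cohort), ("broad", broad_cohort)]:
    print(label, sorted(cohort), f"{len(cohort) / len(records):.0%}")
```

Here the narrow list selects 2 of the 5 hypothetical patients (40%) and the broad list selects 4 (80%). Without the code list itself, a reader could not tell which definition a study had used.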
We are developing an online repository where researchers can freely deposit their lists of clinical codes at the time of publication in a standardised way, along with links to the original paper and meta-data on the codes used.
www.ClinicalCodes.org
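The Python sketch below gives a rough idea of what depositing a code list "in a standardised way" with links and meta-data might involve. It bundles codes with list-level metadata and serialises them to CSV. The `CodeList` and `ClinicalCode` classes, their field names and the CSV layout are hypothetical and are not the actual ClinicalCodes.org format. The codes and URL are placeholders.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for illustration only: these class and field names are
# assumptions, NOT the actual ClinicalCodes.org deposit format.


@dataclass
class ClinicalCode:
    code: str         # the code itself (placeholder values used below)
    description: str  # human-readable term attached to the code


@dataclass
class CodeList:
    name: str           # what the list defines, e.g. a condition
    coding_system: str  # e.g. "Read", "SNOMED", "ICD-10"
    paper_url: str      # link to the original publication
    codes: List[ClinicalCode] = field(default_factory=list)

    def to_csv(self) -> str:
        """Serialise the list to CSV, repeating list-level metadata on each row."""
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["list_name", "coding_system", "code", "description", "paper_url"])
        for c in self.codes:
            writer.writerow([self.name, self.coding_system, c.code, c.description, self.paper_url])
        return buffer.getvalue()


# Example deposit using placeholder codes and a placeholder URL.
example = CodeList(
    name="Example condition",
    coding_system="Example system",
    paper_url="https://example.org/paper",
    codes=[
        ClinicalCode("X001", "Example term one"),
        ClinicalCode("X002", "Example term two"),
    ],
)
print(example.to_csv())
```

Repeating the metadata on every row makes each exported row self-describing, at the cost of some redundancy. A real repository might instead store list-level metadata once, separately from the codes.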
Advantages
• Clinical codes can be held to scrutiny and peer review in the same way as any other research method
• Replication of previously published studies (e.g. in different databases) is made easier
• Access to historical code lists allows researchers and clinicians to make incremental improvements to disease (and other) definitions, building on previous work and avoiding unnecessary replication of it
• Clinical code lists can become a research resource in their own right (e.g. for tracking disease definitions through time)
Contact and references
david.springate@manchester.ac.uk
www.clinicalcodes.org – online from July 2013
@medcodes – follow us to keep up with updates to the repository
• Large primary care databases (e.g. CPRD, QResearch, THIN) are increasingly used to address a wide range of research questions¹
• Much research has been done into establishing the validity of statistical analysis of PCD data, e.g. the role of confounding variables², GP recording quality³ and selection bias⁴
• BUT PCDs also rely on clinical codes (such as Read codes) to provide standardised means for medical professionals to record clinical information. The validity of PCD studies depends upon the validity of the clinical codes used to define the population of interest, their disease conditions, exposures, treatments and outcomes
New primary care database (PCD) studies are being published at an exponential rate