Talk from Kees van Bochove, The Hyve at SCOPE Summit, Real World Data track, Jan 26, 2017, Miami
OHDSI (Observational Health Data Sciences and Informatics) is a large open source initiative for the standardisation and epidemiological analysis of real world data. OHDSI leverages the OMOP common data model for observational data, and provides data analysis tools for a broad range of use cases. This talk will explain OMOP and OHDSI using the IMI EMIF project as a case study, in which health data from over 50 million patients from 13 national and regional European registries is brought together.
SCOPE Summit - Applying the OMOP data model & OHDSI software to national European health data registries: the IMI EMIF project
1. Open source community for
“Real World Data” Analysis
JANUARY 26, 2017, SCOPE SUMMIT, MIAMI
Kees van Bochove, CEO & Founder, The Hyve – @keesvanbochove
With thanks to Patrick Ryan, Nigel Hughes & Bart Vannieuwenhuyse from Janssen for slides & feedback!
2. 2
Agenda
1. Introduction: The Hyve & Open Source
2. What’s OHDSI & what can it do for you?
3. Under the hood: OMOP Data Model & Mapping Process
4. Showcase: OHDSI data analytics tools
5. The application of OMOP and OHDSI in IMI EMIF
5. Interdisciplinary team
software engineers, data scientists, project managers & staff; expertise in
bioinformatics, medical informatics, software engineering, biostatistics etc.
6. Open Source
• Source code openly accessible and reusable for everyone
• Enables pre-competitive collaboration: both academics and industry can use and enhance it
• Transparency: verification (scientific as well as IT security) can be done by anyone, no ‘black box’
7. 7
3 Health Data Areas The Hyve is active in
• Translational Research Data (‘Clinical & bioinformatics data’)
• Population Health Data (‘Real world data’)
• Personal Health Data (‘Mobile & sensors data’)
Example (RWD) projects:
10. 10
What is OHDSI to you?
• OHDSI is a scientific community to develop best practices for observational research studies
• OHDSI is a data network bringing together data from over 650 million patients worldwide to execute studies
• OMOP is an open data model and OHDSI is a suite of open source software tools for analysis (epidemiology, but also e.g. inclusion/exclusion criteria feasibility)
12. 12
Questions OHDSI can answer
Clinical characterization
• Which treatment did patients choose after diagnosis?
• Which patients chose which treatments?
• How many patients experienced the outcome after treatment?
Population-level effect estimation
• Does one treatment cause the outcome more than an alternative?
• Does treatment cause outcome?
Patient-level prediction
• What is the probability I will develop the disease?
• What is the probability I will experience the outcome?
14. 14
How are patients with major depressive
disorder treated in real world data (250M)?
http://bit.ly/2jYCGkI
15. 15
Informing Clinical Trial Design
• Designing and testing inclusion/exclusion criteria for trials
• Performing observational studies as a basis for choosing effective randomized clinical trial designs and targets
• Elucidating real world use of medicines and treatments for safety purposes
17. 17
OMOP & OHDSI Tools - Overview
• OMOP: Common Data Model for observational healthcare data: persons, drugs, procedures, devices, conditions etc.
• OHDSI: Large-scale analytics tools for observational data. An open source community developing, among other things:
  • Tools to support the ETL / mapping process into OMOP (White Rabbit etc.)
  • Tools to perform analytics: e.g. Achilles for data profiling, Calypso for feasibility assessment → now being integrated into ATLAS
www.omop.org
www.ohdsi.org
18. 18
OMOP Common Data Model v5.0
• OMOP = Observational Medical Outcomes Partnership
• CDM = Common Data Model
• SQL Tables
21. 21
Mapping the source data to OMOP CDM
Stages: ETL design → ETL implementation → ETL verification
Tools used:
• White Rabbit: source data inventory
• Rabbit in a Hat: map source tables to CDM structure (syntactic mapping)
• Usagi: map source terms to CDM ontologies (vocabularies) (semantic mapping)
• Achilles (ETL verification): review database profiles; review data quality assessment (Achilles Heel)
22. 22
Output from White Rabbit
Tab “Overview”: fields for each table
Tab “Medication”: per table values in fields and frequencies
=Medication name
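The per-field value and frequency listing described on this slide can be illustrated with a few lines of Python. This is only a sketch of the idea, not White Rabbit's own implementation; the `scan_table` helper and the sample rows are hypothetical.

```python
from collections import Counter


def scan_table(rows):
    """Count how often each value occurs in each field of a table.

    `rows` is a list of dicts, one per source record. The result maps
    each field name to a Counter of the values seen in that field.
    """
    profile = {}
    for row in rows:
        for field, value in row.items():
            profile.setdefault(field, Counter())[value] += 1
    return profile


# Hypothetical sample of a source "Medication" table.
medication = [
    {"patient_id": 1, "medication_name": "Risperdal"},
    {"patient_id": 2, "medication_name": "Risperdal"},
    {"patient_id": 2, "medication_name": "Fortzaar"},
]

profile = scan_table(medication)

# List the values of one field, most frequent first.
for value, count in profile["medication_name"].most_common():
    print(value, count)
```

A listing like this helps decide which source fields carry coded values that will need a vocabulary mapping later on.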
24. 24
• All coded items (gender, race etc.) need to be mapped
• Mapping of medication, diagnosis and procedure values to the appropriate ontology (RxNorm, ICD-9 etc.)
Map terms to target vocabularies
NHANES Gender code   NHANES Gender description   Equivalent OMOP SOURCE_CODE   OMOP SOURCE_CODE_DESCRIPTION   SOURCE_TO_CONCEPT_MAP_ID
.                    missing                     U                             UNKNOWN                        8551
1                    Male                        M                             MALE                           8507
2                    Female                      F                             FEMALE                         8532
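As a minimal sketch, the gender table above can be applied during ETL as a simple lookup. The codes and IDs are copied from the slide; the `map_gender` helper, and the choice to treat any unlisted code as unknown, are assumptions made for illustration.

```python
# NHANES gender code -> (OMOP source code, description, ID from the last column).
# Values are copied from the table on this slide.
GENDER_MAP = {
    ".": ("U", "UNKNOWN", 8551),
    "1": ("M", "MALE", 8507),
    "2": ("F", "FEMALE", 8532),
}


def map_gender(nhanes_code):
    """Return the mapped triple for a raw NHANES gender code.

    Assumption: missing (None) and unlisted codes fall back to UNKNOWN.
    """
    key = "." if nhanes_code is None else str(nhanes_code).strip()
    return GENDER_MAP.get(key, GENDER_MAP["."])


print(map_gender(1))
print(map_gender("2"))
print(map_gender(None))
```

Larger code lists (medication, diagnoses, procedures) follow the same pattern, but are usually too big to map by hand, which is where a tool such as Usagi comes in.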
30. 30
What can I do with OHDSI tools?
• Explore & QC the mapped data
• Build cohort definitions using concept sets
• Look at patient profiles
• Run and evaluate queries for clinical study feasibility assessment
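To give a feel for what a feasibility query over OMOP-style tables looks like, here is a small sketch using an in-memory SQLite database. It is not how ATLAS works internally. The two tables are heavily reduced versions of the CDM `person` and `condition_occurrence` tables, the concept IDs 1001 and 1002 are placeholders rather than real vocabulary entries, and `count_eligible` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1950), (2, 8532, 1980), (3, 8532, 1945)],
)
# 1001 and 1002 are placeholder condition concept IDs.
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [(1, 1001, "2015-03-01"), (2, 1002, "2016-06-15"),
     (3, 1001, "2014-01-10"), (3, 1001, "2016-02-02")],
)


def count_eligible(conn, concept_set, max_year_of_birth):
    """Count distinct persons with any condition in the concept set
    who were born in or before `max_year_of_birth`."""
    placeholders = ",".join("?" for _ in concept_set)
    sql = f"""
        SELECT COUNT(DISTINCT p.person_id)
        FROM person p
        JOIN condition_occurrence c ON c.person_id = p.person_id
        WHERE c.condition_concept_id IN ({placeholders})
          AND p.year_of_birth <= ?
    """
    params = [*sorted(concept_set), max_year_of_birth]
    return conn.execute(sql, params).fetchone()[0]


print(count_eligible(conn, {1001}, 1960))
```

Because every site maps its data to the same tables and concept IDs, the same query can in principle be run unchanged against each data source.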
40. EMIF vision
To become the trusted European hub for health care data intelligence, enabling new insights into diseases and treatments
Discover
Assess
Reuse
41. The real story of the treatments in clinical practice
The value of healthcare data for secondary uses in clinical research and development — Gary K. Mallow, Merck, HIMSS 2012
The “burning platform” for life sciences
[Figure: number of patient experiences/records (y-axis, ticks at 1,000, 10,000, 100,000 and 1 million) against years (x-axis, 1 to 9), with the phases R&D, Product Launch and Phase IV marked. Region labels: “Pharma-owned highly controlled clinical trials data” and “Clinical practice, patients, payers and providers own the data”.]
Challenge
Today, Pharma doesn’t have ready access to this data, yet insights for safety, comparative effectiveness research (CER) and other areas are within this clinical domain, which includes medical records, pharmacy, labs, claims, radiology etc.
42. Data available through EMIF consortium
• Large variety in “types” of data
• Data is available from more than 53 million subjects from seven EU countries, including:
  • Primary care data sets
  • Hospital data
  • Administrative data
  • Regional record-linkage systems
  • Registries and cohorts (broad and disease specific)
  • Biobanks
• >25,000 subjects in AD cohorts
• >90,000 subjects in metabolic cohorts
43. 43
EMIF Platform Design
[Diagram. Labels: Data Sources (1° care, 2° care, Hospital, Admin, Regional, Registries & cohorts, Biobanks, Paediatric); Site Y and Site Z, each with a Data access Module and an Extract; Common Ontology / De-identification; EMIF platform solution; Governance; Data owners; Researchers; User admin (×2); Remote user 1; Remote user 2.]
47. 47
Automatic Mapping of Drug Concepts to
the RxNorm Vocabulary
Maxim Moinat* [1], Lars Pedersen [2], Jolanda Strubel [1], Marinel Cavelaars [1], Kees van Bochove [1], Peter Rijnbeek [3], Michel van
Speybroeck [4], Martijn Schuemie [4]
[1] The Hyve, Utrecht, The Netherlands
The Hyve, Cambridge, United States
[2] Aarhus University Hospital, Aarhus, Denmark
[3] Erasmus MC, Rotterdam, The Netherlands
[4] Janssen Pharmaceuticals, Inc.
*E-mail: maxim@thehyve.nl.
1. Background
Mapping source concepts to the standard concepts in the OMOP vocabularies is one of the most
time-consuming tasks during the transformation to the OMOP Common Data Model. Drug mapping is
particularly challenging, because different components have to be mapped: ingredient, dose form and
strength.
As part of the European Medical Information Framework (EMIF) project, Danish population health data are
mapped to the OMOP CDM, including the local drug codes. The Hyve assists in creating a script to
automatically map a set of 4754 drugs to the RxNorm vocabulary. The input data contains ATC codes,
dosage forms, numerical strengths and strength units. Two examples are shown in Figure 1.
The mapping procedure presented here is based on the drug mapping for the Japan Medical Data Center
Claims Database [I].
We empower scientists by building
on open source software
2. Mapping Procedure
The mapping uses the RxNorm hierarchy and consists of four steps (see Figure 2).
1. Drugs are mapped to RxNorm Ingredient via the 5th-level ATC code. The OMOP
relationship ‘ATC - RxNorm’ is used for this purpose.
2. Dose form is added to the ingredient level, to map to Clinical Drug Form level.
3. The information on drug strength (including unit) is added to map to Clinical
▲ Figure 1: Examples of input data. Example 1 is successfully
mapped automatically. Example 2 consists of two ingredients and has
an ATC concept that could not be mapped to an RxNorm concept.
Example 1: Risperdal; N05AX08; Filmovertrukne tabletter (film-coated tablets); 0.5; MG
  Mapped to: Risperidone 0.5 MG Oral Tablet (RxNorm Clinical Drug)
Example 2: Fortzaar; C09DA06; depottabletter (prolonged-release tablets); 100 + 25 mg
  Mapped to: Candesartan and diuretics (ATC code)
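The stepwise idea behind the procedure (ATC code to ingredient, then adding dose form, then adding strength) can be sketched as below. This is an illustration only, not the script used in the project: the two lookup tables are toy stand-ins for the OMOP vocabulary relationships, the `SourceDrug` and `map_drug` names are hypothetical, and falling back to the most specific level reached is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDrug:
    name: str
    atc_code: str
    dose_form: str
    strength: str
    unit: str


# Toy lookup tables (hypothetical); a real mapping would query the
# OMOP vocabulary tables, e.g. via the 'ATC - RxNorm' relationship.
ATC_TO_INGREDIENT = {"N05AX08": "Risperidone"}
DOSE_FORM_TO_RXNORM = {"Filmovertrukne tabletter": "Oral Tablet"}


def map_drug(drug):
    """Return (level, name) for the most specific level that could be mapped."""
    ingredient = ATC_TO_INGREDIENT.get(drug.atc_code)
    if ingredient is None:
        return ("Unmapped", None)
    form = DOSE_FORM_TO_RXNORM.get(drug.dose_form)
    if form is None:
        return ("Ingredient", ingredient)
    if not drug.strength or not drug.unit:
        return ("Clinical Drug Form", f"{ingredient} {form}")
    return ("Clinical Drug",
            f"{ingredient} {drug.strength} {drug.unit.upper()} {form}")


example_1 = SourceDrug("Risperdal", "N05AX08", "Filmovertrukne tabletter", "0.5", "MG")
example_2 = SourceDrug("Fortzaar", "C09DA06", "depottabletter", "100 + 25", "mg")

print(map_drug(example_1))
print(map_drug(example_2))
```

With these toy tables, Example 1 from Figure 1 resolves to the Clinical Drug level, while Example 2 stays unmapped because its ATC code has no ingredient entry.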
48. 48
Use of OMOP/OHDSI provides EMIF with:
• A uniform way to perform suitability and feasibility queries across multiple diverse European data sources
• An entry point to quickly initiate and perform observational studies within one or more data sources
• Direct insight & dashboarding of data for data owners (e.g. national registries, hospitals)
49. The goal is patient benefit
“We need to learn from experience and find ways
to unite the large volumes of data in Europe. At
the end of the day, we are in this for better health
care.”
Prof. Johan van der Lei, Erasmus MC University Medical Center
Co-coordinator EMIF-Platform