eTRIKS Data Harmonization Service Platform
1. eTRIKS Data Harmonization Service Platform
Putting Standards Into Action
TranSMART Foundation Annual Meeting
San Diego 25 October 2016
Ibrahim Emam
Data Science Institute
Imperial College London
2. The eTRIKS Project
> 50 safety and efficacy IMI projects
generating or aggregating pre-clinical and clinical data for analysis, across many different
disease areas.
eTRIKS to provide a common infrastructure and service
to support pre-competitive (& competitive) cross-institutional Translational Research
Service Project
~80% of project activities driven by demand from supported IMI projects
Members:
10 Pharma, 3 Academic, 1 standards, 2 Commercial Suppliers
3. Data integration and harmonization challenge
DATA SOURCES
RNA Profiling (microarray)
Protein Profiling (CyTOF)
Metabolic Profiling
Animal model data
Clinical Cohort A data
Clinical Cohort B data
ad-hoc INTEGRATION
ANALYSIS
RESULTS
4. Losing the big picture
Focus on technology vs. biology
Silos of curation expertise
Obstacles To Proper Data Integration
Standards don't go beyond recommendations
Focus on hypothesis-driven analysis
TranSMART?
Flexible for data analysis but rather rigid for data management
5. Towards FAIR Data Principles
Findability
Accessibility
Interoperability
Reusability
METADATA PRESERVATION IS KEY
STANDARDS TO THE RESCUE
7. Domain Aware Platform
Facilitate adoption of community standards
Bring curation know-how to the data owners
eTRIKS Harmonization Service Goals
Enlisting standards in eHS platform
Reward Data Generators
Data Exploratory
Losing the big picture
Focus on technology vs. biology
Silos of curation expertise
Standards don't go beyond recommendations
Focus on hypothesis-driven analysis
The Challenges
10. eHS Platform Modules
Harmonized Data Repository (BioSpeak-db)
…for storing, exploring and integrating harmonized TR data
eTRIKS Curation Modules
…for servicing and managing metadata standards across projects
Data harmonization workflows
…for data standardization, curation and analytical platforms integration
11. Key Software Technologies
Back-end
ASP.NET Core Web API
Storage
MySQL and MongoDB
Front-end
AngularJS
Data visualization
DC.js (Dimensional Charting)
Based on D3.js and crossfilter.js
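The crossfilter idea behind dimensional charting can be sketched in a few lines. This is only an illustrative Python analogue, not the platform's code (the slides name DC.js and crossfilter.js, which are JavaScript libraries). The class `CoordinatedFilter`, its methods and the sample records are all hypothetical. The key behaviour shown is that the counts for one chart honour the filters set on every other chart but not its own.

```python
from collections import Counter


class CoordinatedFilter:
    """Tiny crossfilter-style index over a list of dict records (hypothetical helper)."""

    def __init__(self, records):
        self.records = list(records)
        self.filters = {}  # dimension name -> set of accepted values

    def set_filter(self, dimension, accepted):
        """Restrict `dimension` to the `accepted` values; pass None to clear the filter."""
        if accepted is None:
            self.filters.pop(dimension, None)
        else:
            self.filters[dimension] = set(accepted)

    def _passes(self, record, ignore=None):
        # A record passes when it satisfies every active filter except the ignored one.
        return all(
            record.get(dim) in accepted
            for dim, accepted in self.filters.items()
            if dim != ignore
        )

    def group_counts(self, dimension):
        """Count records per value of `dimension`, applying filters on all OTHER dimensions."""
        return Counter(
            r.get(dimension) for r in self.records if self._passes(r, ignore=dimension)
        )


# Made-up subject-level records, for illustration only.
subjects = [
    {"arm": "placebo", "sex": "F", "visit": "baseline"},
    {"arm": "placebo", "sex": "M", "visit": "baseline"},
    {"arm": "treated", "sex": "F", "visit": "week 4"},
    {"arm": "treated", "sex": "F", "visit": "baseline"},
]

cf = CoordinatedFilter(subjects)
cf.set_filter("sex", ["F"])           # the user clicks "F" in the sex chart
arm_counts = cf.group_counts("arm")   # the arm chart reflects the sex filter
sex_counts = cf.group_counts("sex")   # the sex chart ignores its own filter
```

Ignoring a chart's own filter is what lets the user still see, and click, the values that are currently deselected in that chart.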
13. Platform Deployment Plan
MDS instance at each participating centre (neurologists at Participating Centres 1, 2 and 3):
Standalone data collection UI
Standard CDISC clinical datasets
Local DB
On-site Data Registry with data-exploratory capabilities
Direct sharing of anonymized study / sub-study data between centres 1 & 2
Anonymized data submitted to the Central Harmonized Data Repository
14. eHS Metadata Framework
Approach To Data Harmonization
Multi-faceted integrated framework describing different resources at
different levels of granularity (a meta-model)
Study Design
Datasets
Observations
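As a rough illustration of a meta-model with three levels of granularity, the sketch below nests observations inside datasets inside a study. The class and field names (`Study`, `Dataset`, `Observation`, `descriptor`, `template` and so on) are assumptions chosen for readability; the slides do not show the actual eHS schema.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Observation:
    """Finest level: one observed value for one subject (hypothetical fields)."""
    subject_id: str
    descriptor: str      # what was observed, e.g. "Heart Rate"
    value: Any
    visit: str = ""


@dataclass
class Dataset:
    """Middle level: observations organised under a dataset template."""
    name: str
    template: str        # e.g. a CDISC SDTM domain code or an ISA-TAB table name
    observations: List[Observation] = field(default_factory=list)


@dataclass
class Study:
    """Coarsest level: study design elements plus the datasets they give context to."""
    name: str
    arms: List[str] = field(default_factory=list)
    visits: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def observations_for(self, subject_id: str) -> List[Observation]:
        """Collect a subject's observations across every dataset in the study."""
        return [
            obs
            for dataset in self.datasets
            for obs in dataset.observations
            if obs.subject_id == subject_id
        ]


# Made-up example data.
study = Study(
    name="Demo study",
    arms=["placebo", "treated"],
    visits=["Baseline", "Week 4"],
    datasets=[
        Dataset("Vital signs", "VS", [Observation("S1", "Heart Rate", 72, "Baseline")]),
        Dataset("Laboratory", "LB", [Observation("S1", "Glucose", 5.4, "Baseline"),
                                     Observation("S2", "Glucose", 6.1, "Baseline")]),
    ],
)
s1_descriptors = [obs.descriptor for obs in study.observations_for("S1")]
```

Because every observation stays reachable from the study it belongs to, cross-dataset questions (here, everything recorded for one subject) do not need a separate ad-hoc integration step.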
15. Assays, Samples, Subjects, Clinical Assessments
Design Elements: Arms, Epochs, Objectives, Visits, Factors
Molecular data, Study Design, Demographics
Structural Metadata: DATASET MODEL
Contextual Metadata: TRANSLATIONAL RESEARCH MODEL
Descriptive Metadata: OBSERVATION MODEL
Laboratory Findings, Target annotation
Observation, Observation Descriptor
Dataset Templates
Domain model
CDISC-SDTM
ISA-TAB
Observation Definition
16. eHS Approach to data harmonization
Step 1
Structural organization of data into standard dataset templates
(Syntactic harmonization)
Step 2
Content standardization – controlled vocabularies
(Semantic harmonization)
Step 3
Domain-aware Integration
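Steps 1 and 2 above can be sketched as two small functions: one that renames source columns into a standard template (syntactic), and one that maps free-text values onto a controlled vocabulary (semantic). This is a minimal sketch under stated assumptions: the source column names, `COLUMN_MAP`, `SEX_VOCAB` and both function names are hypothetical, and the target names `USUBJID`, `SEX` and `AGE` are borrowed from the CDISC SDTM demographics domain purely as an example template.

```python
# Hypothetical mapping from a project's own column names to template variables.
COLUMN_MAP = {"patient": "USUBJID", "gender": "SEX", "age_years": "AGE"}

# Hypothetical controlled vocabulary for the SEX variable.
SEX_VOCAB = {"male": "M", "m": "M", "female": "F", "f": "F"}


def to_template(row, column_map):
    """Step 1 (syntactic): keep only mapped columns and rename them to template names."""
    return {target: row[source] for source, target in column_map.items() if source in row}


def standardize(record, field_name, vocabulary):
    """Step 2 (semantic): replace a free-text value with its controlled term.

    Values that are not in the vocabulary are left unchanged so that a curator
    can review them later.
    """
    if field_name not in record:
        return dict(record)
    raw = str(record[field_name]).strip().lower()
    result = dict(record)
    result[field_name] = vocabulary.get(raw, record[field_name])
    return result


# Made-up source row: one unmapped column ("site") and a messy gender value.
raw_row = {"patient": "P01", "gender": "Female ", "age_years": 54, "site": "London"}
harmonized = standardize(to_template(raw_row, COLUMN_MAP), "SEX", SEX_VOCAB)
```

Step 3 is not covered here: domain-aware integration relies on the study context (subjects, samples, assays, visits) rather than on column or term mappings.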
17. The eTRIKS Harmonization Service: Workflow & Architecture
[Diagram with three numbered stages (1 2 3). Labels:]
TR Project unstructured data files
Structuration: Structured Datasets (Standard Dataset Templates & controlled vocabularies)
Harmonization: Harmonized Observations and TR elements (Observation Descriptors)
Exploration, Integration: eHS Application
Harmonization Modules: CDISC SHARE API, Ontology Look-up service, Harmonization module, BiospeakAPI
Project Metadata: Study Context, Subjects, Samples, Assays, Clinical Assessments
Clinical data, Molecular data: Observations, Harmonized Observations
24. Dataset filters
4- Dataset export
Export criteria based on the selected observation, filters and groups explored in the viewer
Create research-focused datasets in preparation for analysis
Save datasets and tag them with meta-data
Maintaining the association between study design elements, relationship between subjects and model types, samples, and assays
Support dataset publications
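One way to picture the export step is a function that applies the criteria chosen in the viewer and stores them, together with user tags, alongside the exported rows. The function `export_dataset`, its parameters and the record fields are assumptions made for illustration; the slides do not describe the real export format.

```python
import json


def export_dataset(observations, criteria, tags):
    """Select observations that match every criterion and bundle them with metadata.

    `criteria` maps a field name to the collection of values accepted for it,
    mirroring the filters and groups picked in the viewer (hypothetical shape).
    """
    rows = [
        obs
        for obs in observations
        if all(obs.get(name) in accepted for name, accepted in criteria.items())
    ]
    metadata = {
        "criteria": {name: sorted(accepted) for name, accepted in criteria.items()},
        "tags": list(tags),
    }
    return {"metadata": metadata, "rows": rows}


# Made-up harmonized observations; subject, arm and visit travel with each value.
observations = [
    {"subject": "S1", "arm": "treated", "visit": "Baseline", "test": "Glucose", "value": 5.4},
    {"subject": "S2", "arm": "placebo", "visit": "Baseline", "test": "Glucose", "value": 6.1},
    {"subject": "S1", "arm": "treated", "visit": "Week 4", "test": "Glucose", "value": 5.0},
]

exported = export_dataset(
    observations,
    criteria={"arm": {"treated"}, "visit": {"Baseline", "Week 4"}},
    tags=["glucose", "treated-arm"],
)
serialized = json.dumps(exported, indent=2)  # ready to save next to the analysis
```

Keeping the criteria inside the exported file means the dataset documents how it was derived, which helps when it is later published or reused.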
26. tranSMART integration: the eTRIKS tranSMART master tree
Focus the usage of tranSMART as a platform for focused research-driven analytics and hypothesis-driven data analysis
Reduces the curation effort usually required to build and map all of a study’s data to a tranSMART tree
One data source, many flexible tree representations
Users decide ‘what’ goes into the tree and ‘how’ they want it structured to best fit with their analytical plans
5- EXTENSIONS
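The "one data source, many tree representations" idea can be illustrated with a helper that builds a nested tree from the same records using whatever level order the user picks. The function `build_tree`, the record fields and the sample values are hypothetical, and the output is a plain nested dictionary rather than tranSMART's own loading format.

```python
def build_tree(records, levels, leaf):
    """Nest records by the chosen `levels`; values are collected under the `leaf` name."""
    tree = {}
    for record in records:
        node = tree
        for level in levels:
            # Walk down, creating branches on demand.
            node = node.setdefault(record[level], {})
        node.setdefault(record[leaf], []).append(record["value"])
    return tree


# Made-up harmonized observations from a single source.
records = [
    {"domain": "Vital Signs", "visit": "Baseline", "observation": "Heart Rate", "value": 72},
    {"domain": "Vital Signs", "visit": "Week 4", "observation": "Heart Rate", "value": 68},
    {"domain": "Laboratory", "visit": "Baseline", "observation": "Glucose", "value": 5.4},
]

# Same data, two different tree layouts chosen by the user.
by_domain = build_tree(records, levels=["domain", "visit"], leaf="observation")
by_visit = build_tree(records, levels=["visit", "domain"], leaf="observation")
```

Because the tree is derived from the harmonized records on demand, changing the layout only means changing `levels`, not re-curating the study.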
27. Availability
Currently in testing phase with different internal eTRIKS projects
Demo site
http://ehs.biospeak.solutions/sandbox/app
http://ehs.biospeak.solutions/blog/
Docker container distribution scheduled for release in November
Public GitHub repository will follow
28. Acknowledgments
Dilshan Silva, Imperial College
Chen Ze, Imperial College
Florian Guitton, Imperial College
Philippe Rocca-Serra, Oxford e-Research Centre, Oxford University
Francisco Bonachella Capdevilla, J & J
Dorina Bratfalean, CDISC
Paul Houston, CDISC
Yike Guo, Imperial College
Data is left fragmented at the sources, and attempts at data integration are solely ad-hoc processes that get executed en route to a data analysis.
The usual curation exercise is mapping data to ontologies and thinking "that's it, we can do data integration now".
The only time data integration or harmonization is dealt with is as a sort of necessary evil to get to the publication figure with the nice graph or the cool visualization.
The biggest challenge facing the projects boils down to the absence of a metadata framework that is tailored to the span and the scope of a TR investigation.
So yes, projects might solve their integration problems, get to do some cool analytics and publish their results in papers, but then what about all the data in there that wasn’t necessarily focused on in that particular analysis exercise? What about the relationships between all these data sources, given that the study had a context and a common design that tied all these data together in the first place?
The biggest challenge facing the projects boils down to the absence of a metadata framework that is tailored to the span and the scope of translational research.
Larry Smarr
Director of the California Institute for Telecommunications and Information Technology (Calit2)
Exploring the events vs the biomarkers relationship
Domain aware --- metadata model
tranSMART rigidity with the tree design
Flexible for data analysis but rather rigid for data management
Green: This is how we can bring the data in
Orange: This is how we can enable cross-study and cross-domain integrations
Purple: This is how we can enable harmonization of data / observations
Hypothesis-free data explorer: interactive multiple coordinated visualizations used for filtering and grouping