BioSHaRE conference July 28th, 2015, Milan - Latest tools and services for data sharing
Stream 1: Tools for data sharing analysis and enhancement
Opal is a software application to manage study data, and includes a feature enabling data harmonisation and data integration across studies. As such, Opal supports the development and implementation of processing algorithms required to transform study-specific data into a common harmonised format. Moreover, when connected to a Mica web interface, Opal allows users to seamlessly and securely search distributed datasets across several Opal instances.
Opal is freely available for download at www.obiba.org and is provided under the GPL3 open source licence. All studies or networks of studies using the Opal software for data storage, data management or data harmonisation must mention Opal in manuscripts, presentations, or other works made public and include a web link to the Maelstrom Research website (www.maelstrom-research.org).
Mica is a software application developed to create web portals for individual epidemiological studies or for study consortia. Features supported by Mica include a standardised study catalogue, study-specific and harmonised variable data dictionary browsers, online data access request forms, and communication tools (e.g. forums, events, news).
When used in conjunction with the Opal software, Mica also allows authenticated users (i.e. with username and password) to perform distributed queries on the content of study databases hosted on remote servers, and retrieve summary statistics of that content.
Mica is a Java-based, cross-platform, client-server application that comes with two clients: the administrators' user interface and a content management system (Drupal) used to render the catalogue content on the study or consortium website.
Mica is freely available for download at www.obiba.org and is provided under the GPL3 open source licence.
BioSHaRE: Opal and Mica: a software suite for data harmonization and federation - Vincent Ferretti - Ontario Institute for Cancer Research
1. A SOFTWARE SUITE FOR DATA HARMONIZATION AND FEDERATION
Vincent Ferretti
Ontario Institute for Cancer Research
2. The Maelstrom Research Software Suite
Software development started in 2007
$3,800,000 CAD of investment so far
Tools: Onyx, Opal, Mica, DataSHIELD
Functions covered: Collection, Storage, Management, Harmonization, Publication, Analysis
3. Some User Stories
Name | Type | Activities | Tools
The Canadian Longitudinal Study on Aging (CLSA) | Single study, 50,000 participants | Collection, management, portal | Onyx, Opal, Mica
The Canadian Partnership for Tomorrow Project (CPTP) | Study consortium, 5 studies, 300,000 participants | Collection, harmonization, portal | Onyx, Opal, Mica
BBMRI-LPC | Network, >30 studies | Cataloguing | Mica
Maelstrom Research | Research project | Cataloguing, harmonization | Opal, Mica
Interconnect | Network | Cataloguing, (harmonization, federated data analysis) | Mica
BioSHaRE | Network | Cataloguing, harmonization, federated data analysis | Opal, Mica, DataSHIELD
4. 1 - Data Harmonization with Opal
The Canadian Partnership for Tomorrow Project (CPTP)
5 cohorts with baseline data on ~ 300,000 participants
• 5 different legislations, questionnaires, data access policies, languages, etc.
Project’s objectives
• To create harmonized datasets across the 5 cohorts
• To create a data portal to browse harmonized datasets and request access to them
Phase 1
The baseline Health and Risk Factor questionnaire (CoreQx)
• 716 harmonized variables
5. Opal Software
A database application for integrating and storing data from multiple and heterogeneous sources
• Used by studies to create central data repositories
6. Metadata in Opal
Projects -> tables -> variables
Tables are defined by customizable dictionaries in Excel format
Variables are annotated with an arbitrary number of attributes
Controlled vocabularies - Taxonomies - (e.g. ICD-10)
Maelstrom Research variable classification
More than 130 terms in 17 classes (e.g. Reproduction, Physical Measures)
Variable Name | Attribute Name | Attribute Value
Cancer_type | Diseases | Neoplasm
Asthma_ever | Diseases | Respiratory system (J00-J99)
Ever_smoke | Question label | [EN] Have you ever smoked? [FR] Avez-vous déjà fumé?
Ever_smoke | Health behaviors | Tobacco
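The projects -> tables -> variables hierarchy and the attribute annotations above can be pictured with a small in-memory model. This is only a sketch: the object shapes, the project and table names (`demo_project`, `baseline`), the `locale` field and the `findVariables` helper are assumptions made for illustration, not Opal's storage format or API.

```javascript
// Hypothetical in-memory model of project -> tables -> variables.
// Shapes and names are illustrative assumptions, not Opal's data model.
const project = {
  name: "demo_project",
  tables: [
    {
      name: "baseline",
      variables: [
        { name: "Cancer_type", attributes: [{ name: "Diseases", value: "Neoplasm" }] },
        {
          name: "Asthma_ever",
          attributes: [{ name: "Diseases", value: "Respiratory system (J00-J99)" }],
        },
        {
          name: "Ever_smoke",
          attributes: [
            // A variable can carry any number of attributes, including localized ones.
            { name: "Question label", locale: "en", value: "Have you ever smoked?" },
            { name: "Question label", locale: "fr", value: "Avez-vous déjà fumé?" },
            { name: "Health behaviors", value: "Tobacco" },
          ],
        },
      ],
    },
  ],
};

// Return "table.variable" for every variable annotated with the given
// attribute name (and, when provided, the given attribute value).
function findVariables(proj, attrName, attrValue) {
  const hits = [];
  for (const table of proj.tables) {
    for (const variable of table.variables) {
      const match = variable.attributes.some(
        (a) => a.name === attrName && (attrValue === undefined || a.value === attrValue)
      );
      if (match) hits.push(`${table.name}.${variable.name}`);
    }
  }
  return hits;
}

console.log(findVariables(project, "Diseases"));
```

Searching by attribute rather than by variable name is what makes a shared classification useful across studies whose variable names differ.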
8. Data Derivation
Opal derives new variables by executing custom JavaScript code
Useful for data validation, curation and harmonisation
User-friendly interfaces for recoding variables
JavaScript API for more advanced derivation
9. JavaScript code executed by Opal when needed
Derived data is not persisted – Views or Virtual tables
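The idea of a view whose derived values are computed on request and never stored can be sketched in plain JavaScript. This sketch does not use Opal's JavaScript API; the study variable `SMOKE_STATUS`, its codes, the recoding rule and the `makeView` helper are all hypothetical and serve only to show the principle.

```javascript
// Study-specific rows (made-up values for illustration only).
const studyRows = [
  { id: "p1", SMOKE_STATUS: "1" },
  { id: "p2", SMOKE_STATUS: "3" },
  { id: "p3", SMOKE_STATUS: "9" },
];

// Hypothetical recoding rule: study codes 1 (current) and 2 (former)
// become 1 (ever smoked), 3 (never) becomes 0, anything else is missing.
function recodeEverSmoke(row) {
  const codes = { "1": 1, "2": 1, "3": 0 };
  return Object.prototype.hasOwnProperty.call(codes, row.SMOKE_STATUS)
    ? codes[row.SMOKE_STATUS]
    : null;
}

// A "view" keeps only the derivation functions, never the derived values.
// Each read re-runs the derivation against the current source row.
function makeView(rows, derivations) {
  return {
    get(id, variable) {
      const row = rows.find((r) => r.id === id);
      if (!row) throw new Error(`unknown participant id: ${id}`);
      return derivations[variable](row);
    },
  };
}

const harmonized = makeView(studyRows, { Ever_smoke: recodeEverSmoke });
console.log(harmonized.get("p1", "Ever_smoke"));
```

Because nothing derived is stored, a correction to the source data or to the recoding rule is reflected the next time the harmonized variable is read.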
12. Deriving the CoreQx datasets with Opal
How to query and access these harmonized datasets?
13. The Mica Software
Software to create web data portals for individual studies or for study consortia
Study catalogue
• MR Standard description of longitudinal studies
• Publication workflow
Datasets
• Data dictionaries, data harmonization, database federation
Data Access
• Online forms, request management workflow with roles
Mica2
New client-server architecture
• Mica Server
• Opal Server
• Data persistence: MongoDB
22. 2 - Advanced Cataloguing with Mica
Maelstrom-research.org
The Maelstrom Research web site is powered by Mica
Includes a catalogue of international networks and studies with annotated dictionaries
Current version
• 6 Networks
• 129 Studies
• 222 datasets
• 182,622 Variables
25. 3 - Data Analysis
The BioSHaRE Healthy Obese Project
10 studies from 7 European countries
200,000 subjects
The HOP dataset - 103 harmonized variables
How to analyze these datasets
» without pooling data
» without accessing individual-level data?
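One way to picture analysis without pooling or individual-level access is to let each study return only aggregate values and combine them centrally. The sketch below computes an overall mean that way. It is a toy illustration of the principle, not DataSHIELD's actual protocol; the study names, the values and the `localSummary` and `combinedMean` helpers are made up.

```javascript
// Each study holds its own individual-level data (made-up values).
const studies = {
  studyA: [22.5, 31.0, 27.5],
  studyB: [24.0, 29.0],
};

// Runs "at the study": only the count and the sum leave the site.
function localSummary(values) {
  return { n: values.length, sum: values.reduce((acc, v) => acc + v, 0) };
}

// Runs "at the analyst": combines per-study aggregates into one mean
// without ever seeing an individual record.
function combinedMean(summaries) {
  const n = summaries.reduce((acc, s) => acc + s.n, 0);
  const sum = summaries.reduce((acc, s) => acc + s.sum, 0);
  if (n === 0) throw new Error("no observations");
  return sum / n;
}

const summaries = Object.values(studies).map(localSummary);
console.log(combinedMean(summaries));
```

The combined result equals the mean of the pooled data, yet the analyst only ever receives one count and one sum per study. A real deployment would also need disclosure controls, for example refusing to answer when a count is very small.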
32. More Information
www.maelstrom-research.org
www.obiba.org
Code available at github.com/obiba
Let us know and acknowledge Maelstrom Research if you are using our software; it’s important for our funding and our ability to provide support
33. Acknowledgement
Yannick Marcon and his software developer team
The Maelstrom Research scientific team
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU)