Enabling simultaneous analysis of multiple
cohort studies without accessing the full
dataset: a BRISSKit use case
Dr Jonathan Tedds jat26@le.ac.uk @jtedds
Senior Research Fellow,
Health Informatics & Interdisciplinary Research Group,
Department of Health Sciences (University of Leicester)

PI #BRISSKit http://www.brisskit.le.ac.uk
http://www.astrogrid.org
(April 2008 1st public release)
Data Reuse: asking new questions

• Hubble Space Telescope: papers based upon reuse of archived observations now exceed those based on the use described in the original proposal.
– http://archive.stsci.edu/hst/bibliography/pubstat.html
• See also work by Piwowar & Vision re life sciences: “Data reuse and the open data citation advantage”
– http://peerj.com/preprints/1/
Why open? Science as an Open Enterprise Report
• As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database
• We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7]
• Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/
• Issues linking data to the scientific record:
– Data persistence
– Data and metadata quality
– Attribution and credit for data producers
• Geoffrey Boulton (Edinburgh), Lead author:
– “Science has been sleepwalking into crisis of replicability...and of the credibility of science”
– “Publishing articles without making the data available is scientific malpractice”
BRISSKit context:
The I4Health goal of applying knowledge engineering to close the ‘ICT gap’ between research and healthcare (Beck, T. et al. 2012)
Biomedical Research Infrastructure Software Service Kit
A vision for cloud-based open source research applications
#BRISSKit
http://www.brisskit.le.ac.uk
BRISSKit USPs
• Integrated support for core research processes
• Well-established mature open source applications as prototyped in Cardiovascular, Respiratory, Cancer Theme Biobank: UK customised
• A platform for seamless management and integration between applications
• An API allows integration with existing clinical systems
• Easy set up, use and administration through browser (including on mobile devices)
• Capability of being hosted in any compliant cloud provider including UHL (NHS information governance)
www.brisskit.le.ac.uk
Email: brisskit@le.ac.uk
BRISSKit Community & Hack Event, Oct 2012
http://www.brisskit.le.ac.uk/node/35
BRISSKit Information Governance
& Security Management Work Stream
- Dr Andrew Burnham leading
1. Information Governance Toolkit - analysis of Department of Health (DoH/NHS) IGT requirements vs. BRISSKit organisation/project and services/tools
   a) Hosted Secondary Use Team/project (Hosted IGT)
   b) Acute Trust (Acute Trust IGT)
2. IG Training Tool (NHS – University is registered)
3. Pseudonymisation requirements
4. Data Management Plan
5. IT Security & standards – Penetration Testing & Security Testing
6. Other NHS Standards/Requirements:
   - Care Records Guarantee
   - NHS Constitution
   - NHS Records Management
   - Patient Safety DSCN 14/2009, 18/2009
The semantic bridge
[Diagram: OBiBa Onyx and i2b2, with the labels “Bio-ontology!” and “?” between them]
• OBiBa Onyx: records participant consent, questionnaire data and primary specimen IDs
• i2b2: cohort selection and data querying
BRISSKit and Bio Banking
• Deploy solutions in international bio banking
initiatives
• Investment through Prof Paul Burton (Health
Sciences at Leicester/Bristol) & international
collaborations
• Building on strong informatics expertise at
University of Leicester in partnership with the
University Hospitals Leicester Trust
• Cardiovascular, Respiratory & Lifestyle BRUs
• Cancer Theme Biobank
• Genomics etc
Contemporary biobanking:
meeting the “data”
challenge
Large data sets, why bother?
• Sample size
• Depth of phenotyping
• Quality of measurement
All critical
How big is BIG?
• The direct effect of a gene
– 2,000 cases minimum, 10,000 cases better
• Environmental and life-style factors
– Highly context specific: from hundreds to tens of thousands of cases
• Gene-lifestyle and gene-gene “interactions”
– Absolute minimum 10,000, usually need at least 20,000, a comprehensive platform needs at least 50,000
– Scientifically fundamental
The bottom line
• Effective data access is crucial
• Effective joint analysis is essential too (integration)
• Fundamental challenges
– Scientific harmonization
– Restriction on access to individual level data
– Streamlined access to multiple data sets
• Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc
• Also fundamental to the aims of potential BRISSKit users
Horizontally partitioned data
[Diagram: separate data sets held by 1958BC, KORAGEN, PREVEND and FINRISK, feeding a joint central analysis]
How can we undertake a full joint analysis using multiple data sources if the data cannot physically be pooled?
• Ethico-legal constraints
• Physical size of the data objects
• Intellectual property issues
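For the simplest statistics, the question above has a short answer that can be sketched in a few lines. The sketch assumes horizontally partitioned data, meaning each study holds the same variable for different participants. Each study returns only a count, a sum and a sum of squares, and the centre combines them. The function names `site_summary` and `combine` and all the measurements are hypothetical, invented for illustration only.

```python
import statistics

def site_summary(values):
    """Run inside one study: return only aggregate counts and sums."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(summaries):
    """Run centrally: pooled n, mean and sample variance from site aggregates."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    pooled_mean = total / n
    pooled_var = (total_sq - n * pooled_mean ** 2) / (n - 1)
    return n, pooled_mean, pooled_var

# Made-up measurements for three hypothetical studies
# (same variable, different participants).
study_a = [118.0, 126.5, 131.0, 122.0]
study_b = [140.5, 135.0, 128.5]
study_c = [115.0, 119.5, 124.0, 133.5, 129.0]

summaries = [site_summary(s) for s in (study_a, study_b, study_c)]
n, pooled_mean, pooled_var = combine(summaries)

# Reference values computed by physically pooling the rows,
# which is exactly what the constraints rule out in practice.
all_rows = study_a + study_b + study_c
print(n, pooled_mean, statistics.mean(all_rows))
print(pooled_var, statistics.variance(all_rows))
```

The two printed pairs should agree up to floating-point rounding. A real deployment would also need disclosure checks (for example, refusing summaries from very small groups), which this sketch omits.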
DataSHIELD: a novel solution
• Take analysis to data not data to analysis
• One step analyses: simple
• Iterative analyses: parallel processes linked together by entirely non-identifying summary statistics
• Typically produces mathematically identical results to fitting a single model to all the data held in one pooled data set
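The iterative case can be illustrated with a minimal sketch, assuming a logistic regression with one covariate fitted by Newton-Raphson. At every iteration each study returns only its score vector and information matrix for the current coefficients. The analysis computer adds them up and takes one Newton step. The names `site_summaries`, `newton_step` and `fit` and the data are hypothetical. This is plain Python for illustration, not the DataSHIELD implementation or its API.

```python
import math

def site_summaries(rows, beta):
    """Run inside one study. rows is a list of (x, y) pairs with y in {0, 1}.
    Only the aggregated score vector and information matrix are returned."""
    b0, b1 = beta
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        score[0] += y - p
        score[1] += (y - p) * x
        info[0][0] += w
        info[0][1] += w * x
        info[1][0] += w * x
        info[1][1] += w * x * x
    return score, info

def newton_step(beta, score, info):
    """Run centrally: solve the 2x2 system info * delta = score and update beta."""
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    d0 = (info[1][1] * score[0] - info[0][1] * score[1]) / det
    d1 = (info[0][0] * score[1] - info[1][0] * score[0]) / det
    return (beta[0] + d0, beta[1] + d1)

def fit(sites, iterations=25):
    """Coordinate the iterations; individual rows never leave site_summaries."""
    beta = (0.0, 0.0)
    for _ in range(iterations):
        total_score = [0.0, 0.0]
        total_info = [[0.0, 0.0], [0.0, 0.0]]
        for rows in sites:
            score, info = site_summaries(rows, beta)
            for i in range(2):
                total_score[i] += score[i]
                for j in range(2):
                    total_info[i][j] += info[i][j]
        beta = newton_step(beta, total_score, total_info)
    return beta

# Made-up (exposure, outcome) rows for three hypothetical studies.
site_a = [(0.5, 0), (1.0, 0), (1.5, 1), (2.0, 0), (2.5, 1)]
site_b = [(0.8, 0), (1.2, 1), (1.9, 0), (2.2, 1), (3.0, 1)]
site_c = [(0.3, 0), (1.1, 0), (1.7, 1), (2.8, 1), (2.4, 0)]

federated = fit([site_a, site_b, site_c])
pooled = fit([site_a + site_b + site_c])  # single model on physically pooled rows
print(federated)
print(pooled)
```

Because the score and information matrix are sums over participants, adding the per-study pieces gives the same update as one pooled fit, so the two printed coefficient pairs should agree up to floating-point rounding. The sketch has no disclosure control, which a real system would need.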
Horizontal DataSHIELD
[Diagram: an analysis computer running R is linked by web services, via the BioSHaRE web site, to three data computers, each running Opal and R and holding one study: Finrisk, Prevend and 1958BC]
Opal includes
• DataSHIELD
• DataSHaPER
• Researcher ID
Work in progress:
• Embed Opal in BRISSKit
• ALSPAC
• MRC e-HIRCS
• +more…
[Diagram: Opal data computer holding 1958BC]
BRISSKit gains
• DataSHIELD
• DataSHaPER
• Researcher ID
Opal gains
• Direct interface with more tools
• I2B2 functionality
• Potential for enhanced user interface
Everybody gains
• Enhanced combined functionality
– better science
• Bigger user group
– greater portability
• Greater potential to become a sustainable standard
Enhanced joint analysis with
• Ethico-legal constraints e.g. US/Europe biobanks
• Intellectual property issues e.g. H3AFRICA
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Enabling simultaneous analysis of multiple cohort studies: A BRISSKit use case

  • 1. Enabling simultaneous analysis of multiple cohort studies without accessing the full dataset: a BRISSKit use case Dr Jonathan Tedds jat26@le.ac.uk @jtedds Senior Research Fellow, Health Informatics & Interdisciplinary Research Group, Department of Health Sciences (University of Leicester) PI #BRISSKit http://www.brisskit.le.ac.uk
  • 2.
  • 4. Data Reuse: asking new questions • Hubble Space Telescope Papers based upon reuse of archived observations now exceed those based on the use described in the original proposal. – http://archive.stsci.edu/hst/bibliography/pubstat.html • See also work by Piwowar & Vision re life sciences: “Data reuse and the open data citation advantage” – http://peerj.com/preprints/1/
  • 5. Why open? Science as an Open Enterprise Report • As a first step towards this intelligent openness, data that underpin a journal article should be made concurrently available in an accessible database • We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be interoperable. [p.7] • Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/ • Issues linking data to the scientific record: – Data persistence – Data and metadata quality – Attribution and credit for data producers • Geoffrey Boulton (Edinburgh), Lead author: – “Science has been sleepwalking into crisis of replicability...and of the credibility of science” – “Publishing articles without making the data available is scientific malpractice”
  • 6. BRISSKit context: The I4Health goal of applying knowledge engineering to close the ‘ICT gap’ between research and healthcare (Beck, T. et al 2012)
  • 7. Biomedical Research Infrastructure Software Service Kit A vision for cloud-based open source research applications #BRISSKit http://www.brisskit.le.ac.uk
  • 9. BRISSKit USPs • Integrated support for core research processes • Well-established mature open source applications as prototyped in Cardiovascular, Respiratory, Cancer Theme Biobank: UK customised • A platform for seamless management and integration between applications • An API allows integration with existing clinical systems • Easy set up, use and administration through browser (including on mobile devices) • Capability of being hosted in any compliant cloud provider including UHL (NHS information governance)
  • 11. BRISSKit Community & Hack Event, Oct 2012 http://www.brisskit.le.ac.uk/node/35
  • 12. BRISSKit Information Governance & Security Management Work Stream - Dr Andrew Burnham leading 1. Information Governance Toolkit - analysis of Department of Health (DoH/NHS) IGT requirements vs. BRISSKit organisation/project and services/tools a) Hosted Secondary Use Team/project (Hosted IGT) b) Acute Trust (Acute Trust IGT) 2. IG Training Tool (NHS – University is registered) 3. Pseudonymisation requirements 4. Data Management Plan 5. IT Security & standards – Penetration Testing & Security Testing 6. Other NHS Standards/Requirements: - Care Records Guarantee - NHS Constitution - NHS Records Management - Patient Safety DSCN 14/2009, 18/2009
  • 13. The semantic bridge • OBiBa Onyx: records participant consent, questionnaire data and primary specimen IDs • Bio-ontology! • i2b2: cohort selection and data querying ?
  • 14. BRISSKit and Bio Banking • Deploy solutions in international bio banking initiatives • Investment through Prof Paul Burton (Health Sciences at Leicester/Bristol) & international collaborations • Building on strong informatics expertise at University of Leicester in partnership with the University Hospitals Leicester Trust • Cardiovascular, Respiratory & Lifestyle BRUs • Cancer Theme Biobank • Genomics etc
  • 15. Contemporary biobanking: meeting the “data” challenge
  • 16. Large data sets, why bother? • Sample size • Depth of phenotyping • Quality of measurement All critical
  • 17. How big is BIG? • The direct effect of a gene – 2,000 cases minimum, 10,000 cases better • Environmental and life-style factors – Highly context specific: from hundreds to tens of thousands of cases • Gene-lifestyle and gene-gene “interactions” – Absolute minimum 10,000, usually need at least 20,000, a comprehensive platform needs at least 50,000 – Scientifically fundamental
  • 18. The bottom line • Scientific harmonization • Restriction on access to individual level data • Streamlined access to multiple data sets • Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc • Also fundamental to the aims of potential BRISSKit users • Effective data access is crucial • Effective joint analysis is essential too (integration) • Fundamental challenges
  • 19. Horizontally partitioned data [Diagram: separate data sets (1958BC, KORAGEN, PREVEND, FINRISK) feeding a joint central analysis] How can we undertake a full joint analysis using multiple data sources if the data cannot physically be pooled? • Ethico-legal constraints • Physical size of the data objects • Intellectual property issues
  • 20. DataSHIELD: a novel solution • Take analysis to data, not data to analysis • One step analyses: simple • Iterative analyses: parallel processes linked together by entirely non-identifying summary statistics • Typically produces mathematically identical results to fitting a single model to all the data held in one pooled data set
  • 21. Horizontal DataSHIELD [Diagram: data computers running Opal and R at Finrisk, Prevend and 1958BC, linked by web services to the analysis computer (R) and the BioSHaRE web site]
  • 22. Horizontal DataSHIELD [Diagram as on slide 21] Opal includes • DataSHIELD • DataSHaPER • Researcher ID
  • 23. Horizontal DataSHIELD [Diagram as on slide 21] Work in progress: • Embed Opal in BRISSKit • ALSPAC • MRC e-HIRCS • +more…
  • 25. [Diagram: Opal 1958BC] BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID
  • 26. [Diagram: Opal 1958BC] Opal gains • Direct interface with more tools • I2B2 functionality • Potential for enhanced user interface BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID
  • 27. [Diagram: Opal 1958BC] Opal gains • Direct interface with more tools • I2B2 functionality • Potential for enhanced user interface BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID Everybody gains • Enhanced combined functionality - better science • Bigger user group - greater portability • Greater potential to become a sustainable standard
  • 28. [Diagram: Opal 1958BC] Opal gains • Direct interface with more tools • I2B2 functionality • Potential for enhanced user interface BRISSKit gains • DataSHIELD • DataSHaPER • Researcher ID Everybody gains • Enhanced combined functionality - better science • Bigger user group - greater portability • Greater potential to become a sustainable standard Enhanced joint analysis with • Ethico-legal constraints e.g. US/Europe biobanks • Intellectual property issues e.g. H3AFRICA
  • 29. The bottom line • Scientific harmonization • Restriction on access to individual level data • Streamlined access to multiple data sets • Central to the integrative aims of P3G, PHOEBE, BioSHaRE-eu etc • Also fundamental to the aims of potential BRISSKit users • Effective data access is crucial • Effective joint analysis is essential too (integration) • Fundamental challenges
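
The DataSHIELD idea on slide 20 (iterative analyses linked only by non-identifying summary statistics) can be illustrated with a minimal sketch. This is not the DataSHIELD or Opal implementation: the function names, the three toy "sites" and their values are all hypothetical, and the model is a plain logistic regression fitted by Newton-Raphson. Each site returns only its score vector and information matrix for the current coefficients; the analysis computer sums them and updates the coefficients. Under these assumptions the federated fit matches the pooled fit up to floating-point rounding.

```python
import math

def site_summaries(rows, beta):
    """Runs at one data site. The (x, y) rows never leave the site;
    only the summed score and information terms are returned."""
    b0, b1 = beta
    g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
    h00 = h01 = h11 = 0.0    # information matrix (symmetric 2x2)
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def newton_step(beta, score, info):
    """Runs at the analysis computer: solve the 2x2 system info * delta = score."""
    g0, g1 = score
    h00, h01, h11 = info
    det = h00 * h11 - h01 * h01
    d0 = (h11 * g0 - h01 * g1) / det
    d1 = (h00 * g1 - h01 * g0) / det
    return (beta[0] + d0, beta[1] + d1)

def fit(sites, iterations=25):
    """Iterate: broadcast beta, collect per-site summaries, add them, update."""
    beta = (0.0, 0.0)
    for _ in range(iterations):
        parts = [site_summaries(rows, beta) for rows in sites]
        score = tuple(sum(p[0][i] for p in parts) for i in range(2))
        info = tuple(sum(p[1][i] for p in parts) for i in range(3))
        beta = newton_step(beta, score, info)
    return beta

# Hypothetical toy data: (exposure, case status) pairs held at three sites.
site_a = [(0.5, 0), (1.0, 0), (1.5, 1), (2.0, 0), (2.5, 1)]
site_b = [(0.2, 0), (1.2, 1), (1.8, 0), (2.8, 1), (3.0, 1)]
site_c = [(0.8, 0), (1.1, 0), (2.2, 1), (2.6, 0), (3.2, 1)]

federated = fit([site_a, site_b, site_c])    # data stay at each site
pooled = fit([site_a + site_b + site_c])     # reference: all rows in one place

print("federated:", federated)
print("pooled:   ", pooled)
```

The two fits agree because the log-likelihood of independent observations is a sum over rows, so its gradient and information matrix are sums over sites. Whether a given set of summaries is sufficiently non-identifying in practice is a separate question that this sketch does not address.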

Editor's Notes

  1. Hubble Space Telescope (HST) has been in operation since 1990. Observations are made on the basis of proposals; data is collected and made available to the proposers, stored at the Space Telescope Science Institute, and made available after an embargo. Each year approx. 200 proposals are selected from a field of 1,000, leading to c. 20,000 individual observations. There are now more research papers published on the basis of ‘reuse’ of the archived data than those based on the use described in the original proposal.
  2. The SAOE report looked at the changing conduct of science. A key recommendation is that research data, the data underpinning research findings, should be as openly available as possible.