Analyzing Big Data in Medicine with
Virtual Research Environments and
Microservices
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences
Science for Life Laboratory
Uppsala University
Today: We have access to high-throughput
technologies to study biological phenomena
New challenges: Data management and
analysis
• Storage
• Analysis methods, pipelines
• Scaling
• Automation
• Data integration, security
• Predictions
• …
European Open Science Cloud (EOSC)
• The vast majority of all data in the world (in fact up to 90%) has been
generated in the last two years.
• Scientific data is in direct need of openness, better handling, careful
management, machine actionability and sheer re-use.
• European Open Science Cloud: A vision of a future infrastructure to
support Open Research Data and Open Science in Europe
– It should enable trusted access to services, systems and the re-use
of shared scientific data across disciplinary, social and geographical
borders
– Research data should be findable, accessible, interoperable and
reusable (FAIR)
– It should provide the means to analyze datasets of huge sizes
http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
Contemporary Big Data analysis in
bioinformatics
• High-Performance Computing with shared storage
– Linux, Terminal, batch queue
• Problems/challenges
– Access to resources is limited
– Dependency management for tools is cumbersome; installing software
often requires help from system administrators
– Privacy-related issues
– Difficult to share/integrate data
– Accessibility issues
• A common approach: Internet-based services
– Retrieve data
– Analysis tools
Workflows
Service-Oriented Architectures (SOA) in
the life sciences
• Standardize
– Agree on, e.g., interfaces, data formats,
protocols, etc.
• Decompose and compartmentalize
– Experts (scientists) should provide
services – do one thing and do it well
– Achieve interoperability by exposing
data and tools as Web services
• Integrate
– Users should access and integrate
remote services
[Diagram: one scientist provides a service through an API; another scientist consumes it]
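The "integrate" step can be sketched as a small workflow that chains services in dependency order. This is a minimal illustration, not how any particular workflow system works: each service is stood in for by a local Python function (the names `fetch_spectra`, `detect_peaks` and `annotate` are hypothetical), and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) resolves the execution order.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical stand-ins for remote services; in a real SOA setup each
# would wrap an HTTP call to a service hosted by a different group.
def fetch_spectra(inputs: Dict[str, object]) -> List[float]:
    return [0.1, 2.5, 0.3, 4.2, 0.2]

def detect_peaks(inputs: Dict[str, object]) -> List[int]:
    spectra = inputs["fetch_spectra"]
    # Indices whose intensity exceeds a fixed, illustrative threshold.
    return [i for i, v in enumerate(spectra) if v > 1.0]

def annotate(inputs: Dict[str, object]) -> Dict[int, str]:
    return {i: f"feature_{i}" for i in inputs["detect_peaks"]}

def run_workflow(steps: Dict[str, Callable], deps: Dict[str, List[str]]) -> Dict[str, object]:
    """Run each step after its dependencies, passing their outputs by name."""
    results: Dict[str, object] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, [])}
        results[name] = steps[name](upstream)
    return results

steps = {"fetch_spectra": fetch_spectra, "detect_peaks": detect_peaks, "annotate": annotate}
deps = {"fetch_spectra": [], "detect_peaks": ["fetch_spectra"], "annotate": ["detect_peaks"]}
out = run_workflow(steps, deps)
print(out["annotate"])  # {1: 'feature_1', 3: 'feature_3'}
```

Because each step only sees the named outputs of its upstream steps, any one service can be swapped for another implementation without touching the rest of the workflow.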
Service-Oriented Architectures (SOA) in
the life sciences, ~2005
[Diagram: a scientist depending on several remote APIs that suffer downtime, changed APIs, or lack of maintenance]
Difficult to sustain,
unreliable solutions
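A common client-side mitigation for these failure modes is to retry with exponential backoff and then fall back to a mirror. The sketch below is illustrative only: the "services" are hypothetical local callables that raise `ConnectionError`, and, as the speaker notes point out, many services of that era could not be mirrored at all, so this only helps where a mirror exists.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def call_with_failover(endpoints: List[Callable[[], T]], retries: int = 3,
                       base_delay: float = 0.1) -> T:
    """Call endpoints in order (primary first, then mirrors).

    Each endpoint gets `retries` attempts; failed attempts are separated by
    exponentially growing waits (base_delay, 2*base_delay, ...). If every
    attempt on every endpoint fails, the last ConnectionError is re-raised.
    """
    last_error: Exception = ConnectionError("no endpoints given")
    for endpoint in endpoints:
        for attempt in range(retries):
            try:
                return endpoint()
            except ConnectionError as err:
                last_error = err
                if attempt < retries - 1:
                    time.sleep(base_delay * (2 ** attempt))
    raise last_error

# Simulated services: the primary is down, the mirror still answers.
def primary() -> str:
    raise ConnectionError("service unavailable")

def mirror() -> str:
    return "result from mirror"

print(call_with_failover([primary, mirror], base_delay=0.01))  # result from mirror
```

Retries only mask transient downtime; they do nothing against the other two problems on the slide (changed APIs and unmaintained services).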
Cloud Computing
• Cloud computing offers advantages over
contemporary e-infrastructures in the life sciences
– On-demand elastic resources and services
– No up-front costs, pay-per-use
• Many businesses (and much software development) are
moving into the cloud
– Vibrant ecosystem of frameworks and tools, including for
big data
• High potential for science
Virtual Machines and Containers
Virtual machines
• Package entire systems (heavy)
• Completely isolated
• Suitable for cloud environments
Containers
• Share OS
• Smaller, faster, portable
• Docker!
Microservices
• Similar to Web services: decompose functionality into smaller, loosely
coupled services communicating via APIs
– “Do one thing and do it well”
• Preferably small, lightweight and fast to instantiate on demand
• Easy to replace, language-agnostic
– Suitable for loosely coupled teams (which we have in science)
– Portable: easy to deploy and scale
– Maximize agility for developers
• Suitable to deploy as containers in cloud environments
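A microservice that "does one thing well" can be very small. The following sketch uses only Python's standard library (`http.server`, `urllib.request`) to expose a single hypothetical operation, the mean of a list of numbers, behind a JSON-over-HTTP API. The `/mean` route and the `{"values": [...]}` payload are assumptions for illustration; a production service would normally use a web framework and be packaged as a container image.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MeanService(BaseHTTPRequestHandler):
    """Hypothetical single-purpose service: POST /mean with {"values": [...]}."""

    def do_POST(self) -> None:
        if self.path != "/mean":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        values = payload["values"]
        body = json.dumps({"mean": sum(values) / len(values)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging to keep the example's output clean

def start_service(port: int = 0) -> HTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MeanService)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def call_mean(port: int, values: list) -> float:
    """Client side: send the values to the service and return its answer."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/mean",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["mean"]

if __name__ == "__main__":
    server = start_service()
    print(call_mean(server.server_address[1], [1, 2, 3, 4]))  # 2.5
    server.shutdown()
```

The client only depends on the route and the JSON shape, so the service could be rewritten in another language without changing its callers, which is the language-agnostic property the slide mentions.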
Scaling microservices
http://martinfowler.com/articles/microservices.html
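Scaling a stateless microservice usually means running more or fewer identical replicas. As one concrete reference point, the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance (0.1 by default). The sketch below implements only that rule plus simple min/max clamping; the parameter names are assumptions, and the real autoscaler adds stabilization windows and other behaviour not modelled here.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count following the rule ceil(current * metric / target).

    Ratios within `tolerance` of 1.0 leave the replica count unchanged;
    the result is clamped to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 3 replicas at 90% average CPU against a 60% target: ceil(3 * 1.5) = 5
print(desired_replicas(3, 90, 60))  # 5
```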
Shipping
containers?
Orchestrating containers
Kubernetes: Orchestrating containers
• Origin: Google
• Declarative configuration for
launching containers
• Start, stop, update, and manage
a cluster of machines running
containers in a consistent and
maintainable way
• Suitable for microservices
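With Kubernetes' declarative approach you describe the desired state (which image, how many replicas) and the cluster works to maintain it. Kubernetes accepts manifests in JSON as well as YAML, so the sketch below builds a minimal `apps/v1` Deployment as a Python dictionary and prints it as JSON. The service name, the image `example.org/peak-picker:1.0` and the port are hypothetical placeholders.

```python
import json

def deployment_manifest(name: str, image: str, replicas: int, port: int) -> dict:
    """Build a minimal apps/v1 Deployment for one containerised microservice."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }]
                },
            },
        },
    }

manifest = deployment_manifest("peak-picker", "example.org/peak-picker:1.0", 3, 8080)
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with `kubectl apply -f deployment.json`; changing `replicas` and re-applying is how the desired state is updated.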
[Figure: containers scheduled and packed onto cluster nodes]
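Packing containers onto nodes is essentially a bin-packing problem. The sketch below uses first-fit decreasing on CPU requests alone; this is a deliberate simplification and not the Kubernetes scheduler's actual algorithm, which filters and scores nodes on several resources and constraints. Container names and sizes are made up for illustration.

```python
from typing import Dict, List, Tuple

def pack_containers(requests: Dict[str, float], node_capacity: float) -> List[List[str]]:
    """Assign containers to nodes with first-fit decreasing.

    `requests` maps container name -> CPU request (cores). Returns one list
    of container names per node, in the order the nodes were opened.
    """
    nodes: List[Tuple[float, List[str]]] = []  # (free capacity, placed containers)
    for name, cpu in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > node_capacity:
            raise ValueError(f"{name} does not fit on any node")
        for i, (free, placed) in enumerate(nodes):
            if cpu <= free:
                nodes[i] = (free - cpu, placed + [name])
                break
        else:
            # No existing node has room: open a new one.
            nodes.append((node_capacity - cpu, [name]))
    return [placed for _, placed in nodes]

requests = {"db": 2.0, "api": 1.5, "worker": 1.0, "cache": 0.5, "ui": 0.5}
print(pack_containers(requests, node_capacity=4.0))
# [['db', 'api', 'cache'], ['worker', 'ui']]
```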
Virtual Research Environment (VRE)
• Virtual (online) environments for research
– Easy and user-friendly access to computational resources, tools and
data, commonly for a scientific domain
• Multi-tenant VRE – log into shared system
• Private VRE
– Deploy on your favorite cloud provider
• PhenoMeNal: Horizon 2020 project, €8 M, 2015-2018
– “standardized e-infrastructure for the processing, analysis and information-mining
of the massive amount of medical molecular phenotyping and
genotyping data generated by metabolomics applications.”
• Enable users to provision their own virtual infrastructure (VRE)
– Public cloud, private cloud, local servers
– Easy access to compatible tools exposed as microservices
– Set up and configure a complete data center (compute
nodes, storage, networks, DNS, firewall, etc.) within minutes
– Can achieve high-availability, scalability and fault tolerance
• Use modern and established tools and frameworks supported by industry
– Reduce risk and improve sustainability
• Offer an agile and scalable environment to use, and a straightforward
platform to extend
http://phenomenal-h2020.eu/
Users should not see this…
Deployment and user access
Launch on reference installation
Launch on public cloud
Private VRE
In-house deployment scenarios
MRC-NIHR Phenome Centre
• Medium-sized
IT infrastructure
• Dedicated IT
personnel
• Users: ICL staff
Hospital environment
• Dedicated
server
• No IT personnel
• User: Clinical
researcher
Private VRE
Development: Container lifecycle
[Diagram: source code repositories; PhenoMeNal Jenkins builds and tests tools, images and infrastructure; images are published to Docker Hub and the PhenoMeNal Container Hub]
Two proofs of concept so far
Kultima group Pablo Moreno
Implications
• Improve sustainability
– Not dependent on specific data centers
• Improve reliability and security
– Users can run their own service environments (VREs) within isolated
environments
– High-availability and fault tolerance
• Scalability
– Deploy in elastic environments
• Agile development
– Automate “from develop to deploy”
• Agile science
– Simple access to discoverable, scalable tools on elastic compute
resources with no up-front costs
• NB: Many problems of interoperability remain!
– Data
– APIs
– etc.
Ongoing research on VREs
• Data federation
• Compute federation
• Privacy preservation
• Workflows
• Big Data frameworks
• Data management and modeling
Acknowledgements
Wesley Schaal
Jonathan Alvarsson
Staffan Arvidsson
Arvid Berg
Samuel Lampa
Marco Capuccini
Martin Dahlö
Valentin Georgiev
Anders Larsson
Polina Georgiev
Maris Lapins
AstraZeneca
Lars Carlsson
Ernst Ahlberg
University Vienna
David Kreil
Maciej Kańduła
SNIC Science Cloud
Andreas Hellander
Salman Toor
Caramba.clinic
Kim Kultima
Stephanie Herman
Payam Emami
ToxHQ team
Barry Hardy
Thomas Exner
Joh Dokler
Daniel Bachler

Speaker notes

  1. The idea behind SOA (~2005): achieve interoperability by exposing data and functionality as Web services. Experts (scientists) should set up and host their own Web services. Users should integrate a multitude of distributed services, connect them into workflows (e.g. Taverna), and share (parts of) workflows. What happened? Users could not rely on Web services (downtime, API changes, abandoned services), and the services could not be mirrored. Workflows never gained widespread popularity. Today, stable Web services remain mainly at large data and tool providers (EBI, NCBI, etc.).
  2. Drop applications into VMs running Docker in different clouds.