Christophe Blanchet, Clément Gauthey
Infrastructure Distributed for Biology
IDB-IBCP CNRS FR3302 - LYON - FRANCE
http://idee-b.ibcp.fr
IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552)
and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
Providing Bioinformatics Services on Cloud
C. Blanchet and C. Gauthey
EGI CF13, Manchester, 9 April 2013
Infrastructure Distributed for Biology - IDB
CNRS-IBCP FR3302, Lyon, FRANCE
Bioinformatics Today
• Biological data are big data
• 1512 online databases (NAR Database Issue 2013)
• Sanger Institute, UK: 5 PB
• Beijing Genomics Institute, China: 4 sites, 10 PB
➡ Big data in many places
• Analysing such data has become difficult
• Scale-up of the analyses: from gene/protein to complete genome/proteome, ...
• Many different tools in daily use
• That need to be combined in workflows
• Usual interfaces: portals, web services, federation, ...
➡ Datacenters with ease of access/use
• Distributed resources
• Experimental platforms: NGS, imaging, ...
• Bioinformatics platforms
➡ Federation of datacenters
Sequencing Genomes
source: www.politigenomics.com/next-generation-sequencing-informatics
Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)
source: www.genomesonline.org
Infrastructures in Biology
Many tools and web services to process and visualize large amounts of data
The scene
• Bioinformatics service providers
• Is it easy to deploy many (incompatible) tools?
• To connect them to public databases?
• To limit the transfer of huge data sets?
• To provide users with their own computing resources?
• With their own isolated storage?
• Scientists
• Is it easy to access/use these tools?
• To adapt them to your usage?
• To get your own or other tools deployed on a datacenter?
• To combine them?
• To get your own computing/storage resources?
IDB’s Cloud
• Cloud workbench for biology
• 13 turnkey bioinformatics appliances (as of Apr. 2013)
• Running since Sept. 2011, open to the biology community
• Lyon, FRANCE
• Powered by
• StratusLab
• Compute nodes, block storage
• 900+ cores, 4+ TB RAM, 36 TB vdisks
• Mainly Intel Sandy Bridge servers with 32 cores and 128 GB RAM
• Bigmem servers with 64 cores and 768 GB RAM
• VMs from 1 to 64 cores, 512 MB to 760 GB RAM
• + OpenStack
• Object storage (Swift)
• 200+ TB redundant & scalable storage
Driven through a simple web interface
Integrate Bioinformatics Tools in Cloud
[Figure: bioinformatics tools (BLAST, GOR4, FastA, SSearch, ABySS, ClustalW, Ray, BWA, PhyML) + Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) → create new appliance → Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…))]
• Appliances are virtual machines
• Small: a few GB, easy to convert to most virtualization formats
• Installed and pre-configured with common bioinformatics tools
• e.g. BLAST, ClustalW, ARIA, MEME, HMMER, TopHat, BWA, SAMtools, etc.
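The point about converting appliances between virtualization formats can be illustrated with a small script. The following is only a sketch: it builds a `qemu-img convert` command line, and both the file names and the assumption that an appliance is distributed as a qcow2 image are hypothetical, not taken from the slides.

```python
import shlex

# Image format names understood by qemu-img; this subset is chosen for illustration.
SUPPORTED_FORMATS = {"raw", "qcow2", "vmdk", "vdi", "vpc"}


def build_convert_command(src, src_format, dst, dst_format):
    """Return the qemu-img argument list that converts src into dst_format."""
    for fmt in (src_format, dst_format):
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported image format: {fmt}")
    # -f names the input format, -O the output format.
    return ["qemu-img", "convert", "-f", src_format, "-O", dst_format, src, dst]


if __name__ == "__main__":
    # Hypothetical appliance file names, for illustration only.
    cmd = build_convert_command("biocompute.qcow2", "qcow2", "biocompute.vmdk", "vmdk")
    print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting issues once the real image names are known.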
Bioinformatics Appliances
Select your bioinformatics tools
Run Bioinformatics Cloud Instances
[Figure: Bioinformatics Marketplace (Sequence, NGS, Structure, Galaxy, ARIA, (…)) → launch instances → IBCP's cloud resources. Labels: PaaS (portal, launch jobs); IaaS (ssh); master & storage VM (ARIA); worker VMs (CNS); shared FS; BLAST, Clustal, etc.]
Manage your Cloud Instances
Biological Data in Cloud
[Figure: public data sources (UniProt, PDB, EMBL, PROSITE, genomes), shared (NFS) with the bioinformatics cloud. Cloud labels: PaaS (portal, launch jobs); IaaS (ssh); master & storage VM (ARIA); worker VMs (CNS); shared FS; BLAST, Clustal, etc. User persistent data: pdisk (iSCSI).]
• Upload your data: scp, http/S3
• Get your results: scp, http/S3
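The scp path for uploading data and fetching results could be scripted roughly as follows. This is a sketch under stated assumptions: the user name, host name, and paths are hypothetical placeholders, and the http/S3 route is left out because the slides do not give the endpoint details.

```python
import subprocess


def scp_upload_command(local_path, user, host, remote_dir):
    """Build the scp command that copies a local file into remote_dir on the VM."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}/"]


def scp_download_command(user, host, remote_path, local_dir):
    """Build the scp command that copies a result file from the VM to local_dir."""
    return ["scp", f"{user}@{host}:{remote_path}", local_dir]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical instance address and file names, for illustration only.
    run(scp_upload_command("reads.fastq", "user", "vm.example.org", "/data"))
    run(scp_download_command("user", "vm.example.org", "/data/results.tsv", "."))
```

The dry-run default keeps the sketch safe to execute without a running instance; passing `dry_run=False` would perform the actual copy.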
Example: ‘biocompute’ Appliance
• Use your own instance(s)
• With pre-installed standard bioinformatics tools
• BLAST, FastA, SSearch, HMM, ...
• ClustalW2, Clustal-Omega, MUSCLE, ...
• Bowtie(2), BWA, SAMtools, ...
• MEME, R, etc.
• Connected to public reference data
• UniProt, EMBL, genomes, PDB, etc.
• Automatically shared with the VMs
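As an illustration of using a pre-installed tool against the shared reference data, the sketch below assembles a BLAST+ `blastp` command. The database path under `/db` and the file names are assumptions made for this example; the slides do not say where the shared databases are mounted inside the VMs.

```python
def blastp_command(query, db, out, threads=4, evalue=1e-5):
    """Build a BLAST+ blastp command with tabular output (outfmt 6)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "blastp",
        "-query", query,          # FASTA file with the protein sequences to search
        "-db", db,                # BLAST database (path is an assumption here)
        "-out", out,              # file that receives the hits
        "-outfmt", "6",           # tab-separated output, one hit per line
        "-evalue", str(evalue),   # E-value threshold for reported hits
        "-num_threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical shared database location and file names.
    cmd = blastp_command("proteins.fasta", "/db/uniprot/uniprot_sprot", "hits.tsv", threads=8)
    print(" ".join(cmd))
```

Since VMs range from 1 to 64 cores, `threads` would normally be set to match the size of the instance that was launched.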
Example: Galaxy portal for NGS analyses
• Analyse NGS data
• The Galaxy portal is widely used in the community
• Connected to large public data: sequences and indexes
• Large user data (GBs)
• Preserve workflows and results (persistent storage)
Example: Proteomics
• Motivation
• Collaboration with a mass spectrometry platform
• Running out of space on their local resources
• Protein identification
• Experimental mass spectrometry data
• Reference databases: nr, Swiss-Prot
• Reference screening tools: OMSSA, X!Tandem
• User interface
• Remote display
• NX
• Reference GUIs
• SearchGUI
• PeptideShaker
source: PeptideShaker site
Conclusion
• Provide turnkey bioinformatics appliances
• Standard tools and pipelines
• Interoperability: ready to run on cloud
• Easier to transfer appliances than data (GB vs TB)
• Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure
• IDB's public bioinformatics cloud
• Linked to public biological databases
• In collaboration with the French Bioinformatics Institute
• Ease usage for scientists
• Usual bioinformatics gateways
• Persistent and large ubiquitous storage
• Web interface for cloud management
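The "GB vs TB" argument can be made concrete with a back-of-the-envelope calculation. The sizes and the link speed below are assumptions chosen for illustration (a 5 GB appliance, a 10 TB data set, a 1 Gbit/s link), and protocol overhead is ignored.

```python
GB = 10**9   # bytes
TB = 10**12  # bytes


def transfer_time_seconds(size_bytes, bandwidth_bits_per_s):
    """Ideal transfer time: size in bits divided by the link bandwidth."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


LINK = 10**9  # assumed 1 Gbit/s network link

appliance_s = transfer_time_seconds(5 * GB, LINK)   # assumed 5 GB appliance
dataset_s = transfer_time_seconds(10 * TB, LINK)    # assumed 10 TB data set

if __name__ == "__main__":
    print(f"appliance: {appliance_s:.0f} s")
    print(f"data set:  {dataset_s / 3600:.1f} h")
```

Under these assumptions the appliance moves in about 40 seconds while the data set needs roughly 22 hours, which is the motivation for bringing the tools to the data.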
Perspectives
• Define good practices for providing the academic community and industry with bioinformatics services
• French Bioinformatics Institute - IFB
• Its goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
• It aims to build a national academic cloud devoted to bioinformatics, inspired by the model evaluated through IDB's cloud.
• European ELIXIR infrastructure
• To build a sustainable European infrastructure for biological information, supporting life science research and its translation
• IFB will be the French representative in ELIXIR.
Acknowledgments
• StratusLab members
• Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001).
Questions?
http://idee-b.ibcp.fr
