Christophe Blanchet, Clément Gauthey
Infrastructure Distributed for Biology
IDB-IBCP CNRS FR3302 - LYON - FRANCE
http://idee-b.ibcp.fr
IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552)
and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
IDB-Cloud
Providing Bioinformatics
Services on Cloud
Réseau des Ingénieurs en Bioinformatique, Lille, 23 May 2013
Bioinformatics Today
• Biological data are big data
• 1512 online databases (NAR Database Issue 2013)
• Institut Sanger, UK, 5 PB
• Beijing Genome Institute, China, 4 sites, 10 PB
➡ Big data in many places
• Analysing such data has become difficult
• Scale-up of the analyses: from gene/protein to complete genome/proteome, ...
• Many different tools in daily use
• That need to be combined in workflows
• Usual interfaces: portals, Web services, federation, ...
➡ Datacenters with ease of access/use
• Distributed resources
• Experimental platforms: NGS, imaging, ...
• Bioinformatics platforms
➡ Federation of datacenters
Sequencing Genomes
source: www.politigenomics.com/next-generation-sequencing-informatics
Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)
source: www.genomesonline.org
Infrastructures in Biology
Many tools and web services to process and visualize large amounts of data
The scene
• Bioinformatics service providers
• Is it easy to deploy many (incompatible) tools?
• To connect them to public databases?
• To limit transfers of huge data?
• To provide users with their own computing resources?
• With their own isolated storage?
• Scientists
• Is it easy to access/use these tools?
• To adapt them to your usage?
• To get your own or other tools deployed on a datacenter?
• To combine them?
• To get your own computing/storage resources?
[Diagram: scientists ask a bioinformatics center, whose engineers manage the computer resources, for tool X. Availability? Yes or no. Compatible? Yes or no (usually one cluster for all uses). A missing tool costs installation time. French biologists have access to regional resources (RENABI).]
RENABI GRISBI: Bioinformatics French Grid (www.grisbio.fr)
© RENABI GRISBI
[Map labels: RENABI-GO, APLIBIO, PRABI, RENABI-SO, RENABI-NE, IBISA PF-2008]
• Working group on organization and technologies:
  e.g. gLite, DIET, GridWay, BioMaj, ActiveCircle, Caringo, HDFS, XtreemFS, dCache, …
• Distributed bioinformatics infrastructure
• Funded by RENABI, IBISA 2008-2011, Institut des Grilles 2009-2010
• Computing resources:
• on the platforms: 2600 cores, 310 TB of storage
• already on GRISBI: 860 cores, 26 TB of storage
• 5 RENABI regional centers
• Production bioinformatics platforms
• RIO / IBISA certified
• 9 sites: 7 CNRS, 2 INRA
• ~70 registered members
• Collaboration with the national computing infrastructures: Institut des Grilles, Grid5000, GENCI, mesocentres
=> To structure the community and offer answers to the needs of biologists
[Map: resources per site, in cores (c) and storage: 563 c / 90 TB; 444 c / 62 TB; 376 c / 50 TB; 304 c / 32 TB; 876 c / 75 TB. Source: www.grisbio.fr]
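As a quick consistency check, the per-site figures above can be summed and compared with the totals quoted for the platforms (2600 cores, 310 TB). This is a minimal Python sketch; it assumes, since the slides do not say so explicitly, that the five values are the resources of the five regional centers. The variable names are illustrative.

```python
# Per-site (cores, storage in TB) as read from the map; the mapping of
# each pair to a named center is not given in the slides.
sites = [(563, 90), (444, 62), (376, 50), (304, 32), (876, 75)]

total_cores = sum(cores for cores, _ in sites)   # 2563, quoted as "2600"
total_tb = sum(tb for _, tb in sites)            # 309, quoted as "310"

# Share of the platform resources already exposed through GRISBI
# (860 cores and 26 TB according to the slide).
grisbi_cores, grisbi_tb = 860, 26
share_cores = grisbi_cores / total_cores
share_tb = grisbi_tb / total_tb

print(f"platforms: {total_cores} cores, {total_tb} TB")
print(f"on GRISBI: {share_cores:.1%} of cores, {share_tb:.1%} of storage")
```

Under that assumption, roughly a third of the cores but less than a tenth of the storage had been moved onto the grid, which is consistent with the later remark that data access and management were the weaker point.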
Meeting the needs
Need | gLite | GRISBI
International databanks | ~ yes | BioMaj, NFS
Personal space | ~ yes | XtreemFS?
Shared space | ~ yes |
Simple access to storage | no | XtreemFS?
Distribution of computations | WMS |
Integration of existing clusters | ~ yes | CE-gateway
Software deployment | SWAREA | ++ human time
Workflow/pipeline | ~ DAG |
Identity and access management | vo.renabi.fr | Shibboleth/LDAP
Easy-to-use interface | ~ CLI | "GR commands"
Public interface: anonymous access through portal and web services | no? | robot certificates, MyProxy?
➡ The gLite software meets the need for computing power
➡ Its data access and data management modes are less suited to the community's usage
Cloud computing ?
Created by Sam Johnston
License: Creative Commons
StratusLab Project
Goal
• Create a comprehensive, open-source IaaS cloud distribution
EU FP7 project
• 1 June 2010 to 31 May 2012 (2 years)
• 6 partners from 5 countries
• Budget: 3.3 M€ (2.3 M€ EC)
Contacts
• Website: http://stratuslab.eu/
• Twitter: @StratusLab
• Support: support@stratuslab.eu
Partners: CNRS (FR), UCM (ES), GRNET (GR), SIXSQ (CH), TID (ES), TCD (IE)
IDB’s Cloud
• Cloud workbench for Biology
• 13 turnkey bioinformatics appliances (as of Apr. 2013)
• Running since Sept. 2011, open to the biology community
• Lyon, FRANCE
• Powered by
• StratusLab
• Compute nodes, Block storage
• +900 cores, +4TB RAM, 36TB vdisks
• Mainly Intel SandyBridge servers with 32c 128GB
• Bigmem servers with 64c 768GB
• VMs from 1 core, 1 GB RAM to 64 cores, 768 GB RAM
• + OpenStack
• Object storage (Swift)
• +200 TB redundant & scalable storage
Driven through a simple web interface
Integrate Bioinformatics Tools in Cloud
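[Diagram: bioinformatics tools are installed on Linux virtual machines to create new appliances, which are published in the Bioinformatics Marketplace.]
• Bioinformatics tools: BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML
• Linux virtual machines: RedHat, CentOS, Debian, Ubuntu, Suse
• Bioinformatics Marketplace: Sequence, Structure, NGS, Galaxy, ARIA, (…)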
• Appliances are virtual machines
• Small: a few GB, easy to convert into most virtualization formats
• Installed and pre-configured with common bioinformatics tools
• e.g. BLAST, ClustalW, ARIA, MEME, HMMer, TopHat, BWA, Samtools, etc.
Bioinformatics Appliances
Select your bioinformatics tools
Run Bioinformatics Cloud Instances
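[Diagram: from the Bioinformatics Marketplace (Sequence, Structure, NGS, Galaxy, ARIA, (…)), users launch instances on IBCP's cloud resources. PaaS: a portal offering BLAST, Clustal, etc. IaaS: ssh access to a virtual cluster in which a master & storage VM (ARIA, shared FS) launches jobs on worker VMs (CNS).]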
Manage your Cloud Instances
Biological Data in Cloud
• Public data sources (UNIPROT, PDB, EMBL, PROSITE, genomes): shared with the VMs (NFS)
• User persistent data: pdisk (iSCSI)
• Upload your data and get your results: scp, http/S3
• Bioinformatics cloud: PaaS portal (BLAST, Clustal, etc.) and IaaS access by ssh to a master & storage VM (ARIA, shared FS) that launches jobs on worker VMs (CNS)
Biological examples
Common bioinformatics node
• ‘Biocompute’ appliance
• Use your own instance(s)
• With pre-installed standard bioinformatics tools
• BLAST, FastA, SSearch, HMM, ...
• ClustalW2, Clustal-Omega, Muscle, ...
• Bowtie(2), BWA, samtools, ...
• MEME, R, etc.
• Connected to public reference data
• Uniprot, EMBL, genomes, PDB, etc.
• Automatically shared with the VMs
Structural Biology
• TOwards StruCtural AssignmeNt Improvement
• To improve the determination of protein structures based on Nuclear Magnetic Resonance (NMR) information with the ARIA software
• Large computational needs.
• An NMR laboratory will not necessarily invest in building a cluster of about 100 nodes just to run such NMR structure calculations.
• The flexibility of the cloud in deploying the different required bioinformatics tools can accelerate such a procedure.
• Commercial interest in providing such tools to structural biologists on a “pay as you go” basis.
• Endorsers: Institut Pasteur Paris and CNRS IBCP
IaaS deployment of ARIA
[Diagram: ARIA runs structure preparation (8x) and drives 20-100 CNS calculations, which read from and write intermediate results to shared storage; ARIA then produces the final results.]
• Input data: 10s of MB
• Results: GB
• Virtual cluster: a master & storage VM (ARIA, shared FS) launches jobs by ssh on worker VMs (CNS)
A significant increase in the number of calculated protein conformations improves the statistics on the NMR conformations and can help to overcome the ambiguity bottleneck.
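The virtual cluster above can be pictured as a simple scheduling step: the master spreads the CNS calculations over the worker VMs and reaches each one by ssh. The Python sketch below is only an illustration. The round-robin policy, the host names and the `run_cns_job` remote command are hypothetical placeholders, not the actual ARIA or CNS interface, and the code only builds the command lines without executing them.

```python
from collections import defaultdict


def assign_round_robin(n_jobs, workers):
    """Spread job indices 0..n_jobs-1 over the workers in round-robin order."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = defaultdict(list)
    for job in range(n_jobs):
        plan[workers[job % len(workers)]].append(job)
    return dict(plan)


def ssh_commands(plan, remote_cmd="run_cns_job"):
    """Build one ssh argv list per job; remote_cmd is a hypothetical placeholder."""
    return [
        ["ssh", host, f"{remote_cmd} {job}"]
        for host, jobs in plan.items()
        for job in jobs
    ]


# Hypothetical worker host names; a real deployment would use the
# addresses of the VMs started on the cloud.
workers = [f"worker{i:02d}" for i in range(1, 5)]
plan = assign_round_robin(10, workers)
commands = ssh_commands(plan)

for argv in commands:
    print(" ".join(argv))
```

Because the intermediate results live on the shared storage, adding workers only changes the `workers` list in this sketch; the jobs themselves do not need to know where they run.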
Galaxy portal for NGS analyses
• Analyse NGS data
• The Galaxy portal is widely used in the community
• connected to large public data: sequences and indexes
• large user data (GBs)
• Preserve workflows and results (persistent storage)
Proteomics desktop
• Motivation
• Collaboration with a mass spectrometry platform
• Running out of space on their local resources
• Protein identification
• Mass experimental data
• Reference databases : nr, Swiss-Prot
• Reference screening tools:
OMSSA, X!Tandem
• User interface
• Remote display
• NX
• Reference GUIs
• SearchGUI
• PeptideShaker
source: PeptideShaker site
Conclusion
• Provide turnkey bioinformatics appliances
• Standard tools and pipelines
• Interoperability: ready to run on cloud
• Easier to transfer appliances than data (GB vs TB)
• Provide a cloud infrastructure tightly connected to
existing bioinformatics infrastructure
• IDB’s public bioinformatics cloud
• Linked to public biological databases
• In collaboration with the French Bioinformatics Institute
• Ease the usage by scientists
• Usual bioinformatics gateways
• Persistent and large ubiquitous storage
• Web interface for cloud management
• Access on a registration basis and standard use
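The "GB vs TB" argument in the list above can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes a 1 Gbit/s link used at full efficiency, a 5 GB appliance and a 10 TB dataset; all three figures are illustrative assumptions, not measurements from the IDB cloud.

```python
GB = 10**9   # bytes
TB = 10**12  # bytes


def transfer_seconds(size_bytes, link_bits_per_s, efficiency=1.0):
    """Ideal transfer time: payload in bits divided by the usable bandwidth."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


LINK = 10**9  # assumed 1 Gbit/s network link

appliance_s = transfer_seconds(5 * GB, LINK)   # assumed appliance size
dataset_s = transfer_seconds(10 * TB, LINK)    # assumed dataset size

print(f"appliance: {appliance_s:.0f} s")
print(f"dataset:   {dataset_s / 3600:.1f} h")
print(f"ratio:     {dataset_s / appliance_s:.0f}x")
```

With these assumed numbers the appliance moves in well under a minute while the dataset needs most of a day, which is the reason for bringing the tools to the data rather than the reverse.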
Perspectives
• Define good practices for providing the academic community and industry with bioinformatics services!
• French Bioinformatics Institute - IFB
• Goals are to provide core bioinformatics resources to the national and
international life science research community in key fields such as genomics,
proteomics, systems biology, etc.
• Aims at building a national academic cloud devoted to Bioinformatics, inspired
by the model evaluated through the IDB’s cloud.
• European ELIXIR infrastructure
• To build a sustainable European infrastructure for biological information, supporting life science research and its translation
• IFB will be the French representative in ELIXIR.
[Diagram: scientists ask whether tool X is available. If yes, it is found in the bioinformatics center's appliances catalog; if no, engineers create and register new appliances. Appliances run on a bioinformatics or public cloud: regional, national or a federation. French biologists have access to regional resources (RENABI).]
• Acknowledgment
• IDB members: Clément Gauthey, Simon Malesys
• StratusLab members
• co-funding by the European Community's Seventh
Framework Programme (INFSO-RI-261552) and by
the French National Research Agency's Arpege
Programme (ANR-10-SEGI-001).
Questions ?
http://idee-b.ibcp.fr

Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

IDB-Cloud Providing Bioinformatics Services on Cloud

  • 1. IDB-Cloud: Providing Bioinformatics Services on Cloud
      Christophe Blanchet, Clément Gauthey
      Infrastructure Distributed for Biology, IDB-IBCP CNRS FR3302, Lyon, France
      http://idee-b.ibcp.fr
      IDB acknowledges co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
  • 2. Bioinformatics Today (Réseau des Ingénieurs en Bioinformatique, Lille, 23 May 2013)
      • Biological data are big data
        • 1512 online databases (NAR Database Issue 2013)
        • Sanger Institute, UK, 5 PB
        • Beijing Genomics Institute, China, 4 sites, 10 PB
        ➡ Big data in many places
      • Analysing such data has become difficult
        • Scale-up of the analyses: from gene/protein to complete genome/proteome, ...
        • Many different tools in daily use
        • That need to be combined in workflows
        • Usual interfaces: portals, Web services, federation, ...
        ➡ Datacenters with ease of access/use
      • Distributed resources
        • Experimental platforms: NGS, imaging, ...
        • Bioinformatics platforms
        ➡ Federation of datacenters
      [Diagram: distributed sites labelled ADN (DNA), BI, M, A, CC]
  • 3. Sequencing Genomes
      • Complete genome sequencing has become a lab commodity with NGS (cheap and efficient)
      [Figures, sources: www.politigenomics.com/next-generation-sequencing-informatics and www.genomesonline.org]
  • 4. Infrastructures in Biology
      • Many tools and web services to process and visualize large amounts of data
  • 5. The scene
      • Bioinformatics services providers
        • Is it easy to deploy many (incompatible) tools?
        • To connect them to public databases?
        • To limit the transfer of huge data?
        • To provide users with their own computing resources?
        • With their own isolated storage?
      • Scientists
        • Is it easy to access/use these tools?
        • To adapt them to your usage?
        • To get your/other tools deployed on a datacenter?
        • To combine them?
        • To get my own computing/storage resources?
      [Diagram: scientists ask a bioinformatics center (computer resources, engineers) for tool X. French biologists have access to regional resources (RENABI). Decision flow: Availability? (yes/no), Compatible? (yes/no; usually one cluster for all uses), installation time.]
  • 6. RENABI GRISBI: French Bioinformatics Grid (www.grisbio.fr)
      • Working group on organization and technologies: e.g. gLite, DIET, GridWay, BioMaj, ActiveCircle, Caringo, HDFS, XtreemFS, dCache, …
      • Distributed bioinformatics infrastructure
      • Funded by RENABI, IBISA 2008-2011, Institut des Grilles 2009-2010
      • Computing resources:
        • on the platforms: 2600 cores, 310 TB storage
        • already on GRISBI: 860 cores, 26 TB storage
      • 5 RENABI regional centers
      • Production bioinformatics platforms
      • RIO / IBISA certified
      • 9 sites: 7 CNRS, 2 INRA
      • ~70 registered members
      • Collaboration with the national computing infrastructures: Institut des Grilles, Grid5000, GENCI, mesocentres
      => To structure the community and propose answers to the needs of biologists
      [Map: RENABI-GO, APLIBIO, PRABI, RENABI-SO, RENABI-NE, IBISA PF-2008. Resources shown per center (cores, storage): 563 c, 90 TB; 444 c, 62 TB; 376 c, 50 TB; 304 c, 32 TB; 876 c, 75 TB. © RENABI GRISBI]
  • 7. RENABI GRISBI: meeting the needs (table columns on the slide: gLite, GRISBI)
      • International databanks: ~ yes; BioMaj, NFS
      • Personal space: ~ yes; XtreemFS ?
      • Shared space: ~ yes
      • Simple access to storage: no; XtreemFS ?
      • Distribution of computations: WMS
      • Integration of existing clusters: ~ yes; CE-gateway
      • Software deployment: SWAREA; ++ human time
      • Workflow/pipeline: ~ DAG
      • Identity and access management: vo.renabi.fr; Shibboleth/LDAP
      • Easy-to-use interface: ~ CLI; "GR commands"
      • Public interface (anonymous access through portal and web services): no ?; robot certificates, myproxy ?
      ➡ The gLite software meets the need for computing power
      ➡ Its modes of data access and management are less suited to the community's usage
  • 8. Cloud computing?
      [Diagram created by Sam Johnston. License: Creative Commons]
  • 9. StratusLab Project
      • Goal
        • Create a comprehensive, open-source IaaS cloud distribution
      • EU FP7 project
        • 1 June 2010 to 31 May 2012 (2 years)
        • 6 partners from 5 countries: CNRS (FR), UCM (ES), GRNET (GR), SIXSQ (CH), TID (ES), TCD (IE)
        • Budget: 3.3 M€ (2.3 M€ EC)
      • Contacts
        • Website: http://stratuslab.eu/
        • Twitter: @StratusLab
        • Support: support@stratuslab.eu
  • 10. IDB’s Cloud
      • Cloud workbench for Biology
        • 13 turnkey bioinformatics appliances (as of Apr. 2013)
        • Running since Sept. 2011, open to the biology community
        • Lyon, France
      • Powered by
        • StratusLab
          • Compute nodes, block storage
          • 900+ cores, 4+ TB RAM, 36 TB vdisks
          • Mainly Intel Sandy Bridge servers with 32 cores and 128 GB
          • Bigmem servers with 64 cores and 768 GB
          • VMs from 1 core, 1 GB to 64 cores, 768 GB RAM
        • + OpenStack
          • Object storage (Swift)
          • 200+ TB redundant & scalable storage
  • 11. Driven through a simple web interface
  • 12. Integrate Bioinformatics Tools in Cloud
      [Diagram: bioinformatics tools (BLAST, GOR4, FastA, SSearch, Abyss, ClustalW, Ray, BWA, PhyML) are installed on Linux virtual machines (RedHat, CentOS, Debian, Ubuntu, Suse) to create new appliances, which are published in the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, …)]
      • Appliances are virtual machines
        • Small: a few GB, easy to convert to most virtualization formats
        • Installed and pre-configured with common bioinformatics tools
        • e.g. BLAST, ClustalW, ARIA, MEME, HMMer, TopHat, BWA, Samtools, etc.
  • 13. Bioinformatics Appliances
  • 14. Select your bioinformatics tools
  • 15. Run Bioinformatics Cloud Instances
      [Diagram: instances are launched from the Bioinformatics Marketplace (NGS, Structure, Galaxy, ARIA, Sequence, …) on IBCP's cloud resources. PaaS: BLAST, Clustal, etc. IaaS: a master & storage VM (ARIA, portal, shared FS) launches jobs over ssh on worker VMs (CNS).]
  • 16. Manage your Cloud Instances
  • 17. Biological Data in Cloud
      [Diagram: public data sources (UniProt, PDB, EMBL, PROSITE, genomes) are shared (NFS) with the bioinformatics cloud; user persistent data is kept on pdisk (iSCSI). Upload your data and get your results via scp or http/S3. Cloud side: PaaS (BLAST, Clustal, etc.) and IaaS, where a master & storage VM (ARIA, portal, shared FS) launches jobs over ssh on worker VMs (CNS).]
  • 18. Biological examples
  • 19. Common bioinformatics node
      • ‘Biocompute’ appliance
      • Use your own instance(s)
      • With pre-installed standard bioinformatics tools
        • BLAST, FastA, SSearch, HMM, ...
        • ClustalW2, Clustal-Omega, Muscle, ...
        • Bowtie(2), BWA, samtools, ...
        • MEME, R, etc.
      • Connected to public reference data
        • Uniprot, EMBL, genomes, PDB, etc.
        • Automatically shared with the VMs
  • 20. Structural Biology
      • TOwards StruCtural AssignmeNt Improvement
      • To improve the determination of protein structures based on Nuclear Magnetic Resonance (NMR) information with the ARIA software
      • Large computational needs
      • An NMR laboratory will not necessarily invest in building a cluster of about 100 nodes just to run such NMR structure calculations
      • The flexibility of the cloud in deploying the different required bioinformatics tools can accelerate such a procedure
      • Commercial interest in providing such tools to structural biologists on a “pay as you go” basis
      • Endorsers: Institut Pasteur Paris and CNRS IBCP
  • 21. IaaS deployment of ARIA
      [Diagram: a virtual cluster. ARIA on the master & storage VM handles structure preparation (8x) and launches CNS jobs over ssh on 20-100 worker VMs; the workers read and write intermediate results on the shared storage (shared FS); ARIA produces the final results. Input data: tens of MB; results: GB.]
      • A significant increase in the number of calculated protein conformations improves the statistics on the NMR conformations and can help to overcome the ambiguity bottleneck.
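The master/worker pattern on this slide can be sketched as a scatter/gather loop. This is only an illustration under stated assumptions: `run_cns_job`, `run_iteration` and `select_best` are hypothetical names, the "energy" score is a made-up placeholder, and local threads stand in for the worker VMs that the real deployment reaches over ssh. It does not call ARIA or CNS.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cns_job(job_id: int) -> dict:
    """Hypothetical stand-in for one CNS structure calculation on a worker VM.

    A real deployment would start CNS on a worker over ssh and write the
    intermediate result to the shared file system; here we only return a
    deterministic placeholder score.
    """
    return {"job": job_id, "energy": float((job_id * 37) % 101)}


def run_iteration(n_jobs: int, n_workers: int) -> list:
    """Scatter n_jobs calculations over n_workers and gather the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in submission order.
        return list(pool.map(run_cns_job, range(n_jobs)))


def select_best(results: list, keep: int) -> list:
    """Keep the `keep` lowest-scoring conformations (assumed selection rule)."""
    return sorted(results, key=lambda r: r["energy"])[:keep]


if __name__ == "__main__":
    results = run_iteration(n_jobs=20, n_workers=8)
    for conformation in select_best(results, keep=7):
        print(conformation)
```

Because each calculation is independent, adding workers mostly shortens the wall-clock time of one iteration, which is why an elastic pool of 20 to 100 worker VMs suits this workload.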
  • 22. Galaxy portal for NGS analyses
      • Analyse NGS data
        • The Galaxy portal is widely used in the community
        • Connected to large public data: sequences and indexes
        • Large user data (GBs)
      • Preserve workflows and results (persistent storage)
  • 23. Proteomics desktop
      • Motivation
        • Collaboration with a mass spectrometry platform
        • Running out of space on their local resources
      • Protein identification
        • Mass experimental data
        • Reference databases: nr, Swiss-Prot
        • Reference screening tools: OMSSA, X!Tandem
      • User interface
        • Remote display
          • NX
        • Reference GUIs
          • SearchGUI
          • PeptideShaker
      [Screenshot source: PeptideShaker site]
  • 24. Conclusion
      • Provide turnkey bioinformatics appliances
        • Standard tools and pipelines
        • Interoperability: ready to run on cloud
        • Easier to transfer appliances than data (GB vs TB)
      • Provide a cloud infrastructure tightly connected to existing bioinformatics infrastructure
        • Public IDB’s bioinformatics cloud
        • Linked to public biological databases
        • In collaboration with the French Bioinformatics Institute
      • Ease the usage by scientists
        • Usual bioinformatics gateways
        • Persistent and large ubiquitous storage
        • Web interface for cloud management
        • Access on a registration basis and standard use
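The "GB vs TB" point can be made concrete with a back-of-the-envelope calculation. The sizes and link speed below are assumptions chosen for illustration (a 2 GB appliance, a 1 TB dataset, a sustained 1 Gbit/s link), not figures from the talk, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes * 8 / bandwidth_bits_per_s


# Assumed values, for illustration only.
APPLIANCE_BYTES = 2e9   # a "few GB" appliance
DATASET_BYTES = 1e12    # a 1 TB dataset
LINK_BITS_PER_S = 1e9   # sustained 1 Gbit/s

appliance_s = transfer_seconds(APPLIANCE_BYTES, LINK_BITS_PER_S)
dataset_s = transfer_seconds(DATASET_BYTES, LINK_BITS_PER_S)

print(f"appliance: {appliance_s:.0f} s")
print(f"dataset:   {dataset_s / 3600:.1f} h")
print(f"ratio:     {dataset_s / appliance_s:.0f}x")
```

Under these assumptions, moving the appliance to the data takes seconds while moving the data to the tools takes hours, and the gap grows linearly with dataset size.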
  • 25. Perspectives
      • Define good practices to provide the academic community and industry with bioinformatics services!
      • French Bioinformatics Institute (IFB)
        • Goals are to provide core bioinformatics resources to the national and international life science research community in key fields such as genomics, proteomics, systems biology, etc.
        • Aims at building a national academic cloud devoted to bioinformatics, inspired by the model evaluated through the IDB’s cloud
      • European ELIXIR infrastructure
        • To build a sustainable European infrastructure for biological information, supporting life science research and its translation
        • IFB will be the French representative in ELIXIR
      [Diagram: scientists ask whether tool X is available. French biologists have access to regional resources (RENABI). If yes, it is served from the appliances catalog of a bioinformatics center; if no, engineers create a new appliance and register it. The cloud can be a bioinformatics or public cloud: regional, national or a federation.]
  • 26. Acknowledgment
      • IDB members: Clément Gauthey, Simon Malesys
      • StratusLab members
      • Co-funding by the European Community's Seventh Framework Programme (INFSO-RI-261552) and by the French National Research Agency's Arpege Programme (ANR-10-SEGI-001)
      Questions? http://idee-b.ibcp.fr