The Pulse of Cloud Computing
with Bioinformatics as an example
Nuwan Goonasekera†, Enis Afgan*
† University of Melbourne, Melbourne Bioinformatics, Australia
* Johns Hopkins University, Taylor Lab, USA
@ University of Colombo
Feb 2017
The answer to everything?
Overview
• The key characteristics of Cloud Computing
• Using Cloud Computing for bioinformatics
Source: http://dilbert.com/strips/comic/2012-05-25/
A modern data-center
Source: http://www.businessinsider.com/google-data-centers-2014-10?op=1
Data center use before cloud computing
source: http://www.rackspace.com/knowledge_center/whitepaper/revolution-not-evolution-how-cloud-computing-differs-from-traditional-it-and-why-it
Cloud Computing: A Definition
• NIST definition: “Cloud computing is a model for enabling
ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that
can be rapidly provisioned and released with minimal
management effort or service provider interaction.”
» National Institute of Standards and Technology
(http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
The Cloud Model
Deployment Models
• Private
• Community
• Public
• Hybrid
Delivery Models
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Essential Characteristics
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
Delivery Models
source: http://www.businessinsider.com.au/10-most-important-in-cloud-computing-2013-4?op=1#a-word-about-clouds-1
Infrastructure-as-a-Service (IaaS)
• Amazon Web Services (Market leader)
• Rackspace Cloud
• NeCTAR/OpenStack Research Cloud
• Joyent Cloud
• GoGrid
• FlexiScale
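Public PaaS Examples
• Google App Engine
  ○ Languages and developer tools: Python, Java, Go, PHP, plus JVM languages (Scala, Groovy, JRuby)
  ○ Programming models supported by provider: MapReduce, Web, DataStore, Storage and other APIs
  ○ Target applications and storage options: Web applications and BigTable storage
• Salesforce.com’s Force.com
  ○ Languages and developer tools: Apex, Eclipse-based IDE, web-based wizard
  ○ Programming models supported by provider: Workflow, Excel-like formula, web programming
  ○ Target applications and storage options: Business applications such as CRM
• Microsoft Azure
  ○ Languages and developer tools: .NET, Visual Studio, Azure tools
  ○ Programming models supported by provider: Unrestricted model
  ○ Target applications and storage options: Enterprise and web apps
• Amazon Elastic MapReduce
  ○ Languages and developer tools: Hive, Pig, Java, Ruby etc.
  ○ Programming models supported by provider: MapReduce
  ○ Target applications and storage options: Data processing and e-commerce
• Aneka
  ○ Languages and developer tools: .NET, stand-alone SDK
  ○ Programming models supported by provider: Threads, task, MapReduce
  ○ Target applications and storage options: .NET enterprise applications, HPC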
Public SaaS examples
• Gmail
• SharePoint
• Salesforce.com CRM
• OnLive
• Gaikai
• Microsoft Office 365
• Some definitions also include services that do not require payment, e.g. ad-supported sites
Things we find most interesting
• Accessibility
• Infrastructure as code
• Elasticity
• Programming models that fit the cloud
Accessibility
● Global availability via public clouds
● On-demand self-service
● A platform for democratisation of computing
● Access is enabled via point-and-click interfaces (blends with the Internet)
Infrastructure as Code
• Programmable
• Captures knowledge
• DevOps
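A minimal sketch of the "programmable" and "captures knowledge" points above: the desired infrastructure is written down as plain data, and a small function works out what has to change to reach it. The resource names and the `plan` helper are hypothetical and only illustrate the idea; real infrastructure-as-code tools do considerably more.

```python
# Hypothetical example: the desired infrastructure, described as data.
DESIRED = {"head-node": 1, "worker-node": 4, "storage-volume": 2}


def plan(current, desired):
    """Return the changes needed to move from current to desired counts."""
    changes = {}
    for resource in sorted(set(current) | set(desired)):
        delta = desired.get(resource, 0) - current.get(resource, 0)
        if delta != 0:
            changes[resource] = delta  # positive: create, negative: destroy
    return changes


if __name__ == "__main__":
    running = {"head-node": 1, "worker-node": 1, "old-db": 1}
    for name, delta in plan(running, DESIRED).items():
        action = "create" if delta > 0 else "destroy"
        print(f"{action} {abs(delta)} x {name}")
```

Because the description lives in a file, it can be versioned, reviewed and re-run, which is where the link to DevOps practice comes from.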
Elasticity
• Rapidly expand and shrink based on demand
• “Infinite” scaling
• Cost-driven architecture
• Ties in with infrastructure-as-code
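To make "expand and shrink based on demand" concrete, here is a minimal sketch of a scaling rule. The function name, the jobs-per-worker ratio and the worker limits are all assumptions chosen for illustration; the upper limit stands in for a cost ceiling.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker=4, min_workers=0, max_workers=20):
    """Size the cluster to the job queue, bounded by a floor and a cost ceiling."""
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for queue in (0, 3, 17, 500):
        print(queue, "queued jobs ->", desired_workers(queue), "workers")
```

Calling such a rule periodically and acting on the result through the provider's API is one way elasticity ties in with infrastructure-as-code.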
Programming models that fit the cloud
• Fault-tolerant models
• Massively scalable
• Distributed algorithms
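The sketch below illustrates these three points in miniature, assuming a MapReduce-style job: each chunk of sequence is processed independently (so the work can be spread over many machines), a failed task is simply re-run, and the partial results are merged at the end. Worker failure is simulated with a random number; all function names are hypothetical.

```python
import random
from collections import Counter


def count_bases(chunk):
    """Map step: count the bases in one chunk of sequence."""
    return Counter(chunk)


def run_with_retries(task, arg, attempts=5, failure_rate=0.3, rng=random):
    """Re-run a task that may fail, as a scheduler would after losing a worker."""
    for _ in range(attempts):
        if rng.random() < failure_rate:  # simulated worker failure
            continue
        return task(arg)
    raise RuntimeError("task failed on every attempt")


def map_reduce(chunks, **kwargs):
    """Map each chunk independently, then reduce by summing the counters."""
    total = Counter()
    for chunk in chunks:
        total += run_with_retries(count_bases, chunk, **kwargs)
    return total


if __name__ == "__main__":
    print(map_reduce(["ACGT", "AACC", "GGTT"], attempts=10))
```

The loop over chunks runs sequentially here; in a real system each chunk would be handed to a different worker.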
Cloud computing is a valuable resource -
but what do we use it for?
Bioinformatics
A multi-disciplinary science using computers for acquiring, managing and
analyzing biological data.
It is a data-driven science.
It is a tool for genomics research.
[Diagram: Bioinformatics shown together with Biology, Medicine, Math & Physics, and Computer Science]
Genomics
Oxford dictionaries
“The branch of molecular biology concerned with the
structure, function, evolution, and mapping of genomes.”
Where are the genes and other interesting pieces?
How do sequences change over evolutionary time?
What does all the DNA do?
What are the physical shapes of the genome and its products?
Genomics: contrast with biology and genetics
• Scope
  ○ Biology and genetics: targeted studies of one or a few genes
  ○ Genomics: studies considering all genes in a genome
• Technology
  ○ Biology and genetics: targeted, low-throughput experiments
  ○ Genomics: global, high-throughput experiments
• Hard part
  ○ Biology and genetics: clever experimental design, painstaking experimentation
  ○ Genomics: tons of data, uncertainty, computation
* Everything on this slide is a generalization
Where is genomics used?
Basic science
● What is the DNA sequence of the genome?
● Where are the genes?
● What does all the DNA in the genome do?
● How did history shape our ethnicities and populations?
Medicine
● What’s the difference between DNA in a tumor vs DNA in healthy tissue?
● Can genomic data help predict what drugs might be appropriate for:
○ a particular cancer patient?
○ a particular genetic disorder?
● Can genomic data help us predict what flu strains will prevail next year?
Genome
Oxford dictionaries
“The complete set of genes or genetic material
present in a cell or organism.”
“Blueprint” or “recipe” of life.
Self-copying store of read-only information about
how to develop and maintain an organism.
Where do genomes live?
Almost all of the trillions of cells in a person carry the same genomic DNA in the nucleus.
Picture source: https://publications.nigms.nih.gov/insidethecell/preface.html
How do we obtain genome data? Sequencing!
The first methods, including Sanger sequencing, were developed in the mid-1970s.
Starting in 1990, the international Human Genome Project took 13 years to sequence the human genome.
In the 2000s, massively parallel Next Generation Sequencers (NGS) were developed that could sequence a human genome in days at a much lower cost.
Today, nanopore sequencers are emerging, offering real-time sequencing.
There are many public data repositories with free access to data (e.g., TCGA, 1000 Genomes, GenBank).
Genome variation
Two unrelated humans have genomes that are ~99.8% similar by sequence.
There are about 3-4 million differences. Most are small, e.g. Single Nucleotide Polymorphisms (SNPs).
Human and chimpanzee genomes are about 96% similar.
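As a toy illustration of what "similar by sequence" means, the sketch below counts single-base differences between two sequences and reports their percent identity. It assumes the sequences are already aligned and of equal length, which real genomes are not; the function names and the example sequences are made up for illustration.

```python
def count_differences(seq_a, seq_b):
    """Count positions where two equal-length, aligned sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(seq_a, seq_b):
    """Share of aligned positions that match, as a percentage."""
    matches = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    person_1 = "ACGTACGTAC"
    person_2 = "ACGTTCGTAC"  # one single-base difference at position 5
    print(count_differences(person_1, person_2), "difference(s)")
    print(percent_identity(person_1, person_2), "% identical")
```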
Making sense of the data through data manipulations
Apply data transformations to extract useful information
This is not always a well-defined process
This is typically done with existing tools, or by developing one’s own
Tools can be chained into workflows
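A minimal sketch of tools chained into a workflow: each "tool" is a small function, and the workflow feeds the output of one into the next. The three tools and the `run_workflow` helper are hypothetical stand-ins, not part of any particular workflow system.

```python
from functools import reduce

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Normalise a raw read: strip whitespace and upper-case it."""
    return seq.strip().upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_workflow(data, steps):
    """Feed the output of each tool into the next one."""
    return reduce(lambda value, step: step(value), steps, data)


if __name__ == "__main__":
    print(run_workflow(" acgtt\n", [clean, reverse_complement]))
    print(run_workflow("gcat", [clean, reverse_complement, gc_content]))
```

Keeping the list of steps separate from the data is what makes a workflow reusable on new inputs.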
What does all of this have to do with
Cloud Computing?
omicsmaps.com
World’s clouds
bit.ly/worldclouds
Typical genomics flow
[Diagram: raw data and external reference data feed into data analysis, which produces results; size labels on the diagram: 100-1000's GB, few GB]
Real-world infrastructure requirements
Some computers + reliable persistent data storage + bioinf tools + reference data + workflow system
[Diagram: raw data, results and indexed genomes (10-100's GB); other size labels: 100-1000's GB, few GB; timeline: Aug, Sep, Oct, Nov, ...]
Galaxy: accessible analysis system
A data analysis and integration tool
• A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
• Open source software that makes integrating your own tools and data and customizing for your own site simple
Three ways to use Galaxy
1. Download and run locally
2. Public website (http://usegalaxy.org)
3. Run on the Cloud
Bringing cloud resources to genomics
Cloud resources need to be provisioned and configured for use in genomics.
CloudMan is a cloud manager that orchestrates all of the steps required to provision, manage, and share a compute platform on a cloud infrastructure, all through a web browser.
Accessibility
Get started at https://launch.usegalaxy.org/
Elasticity
Manage it programmatically
Create a new CloudMan compute cluster
Manage an existing CloudMan instance
How is it all achieved?
Architectural stack
• Cloud apps
• CloudLaunch (CloudLaunch.usegalaxy.org)
  ○ beta.launch.usegalaxy.org
  ○ github.com/galaxyproject/cloudlaunch-ui
  ○ github.com/galaxyproject/cloudlaunch
• CloudMan
  ○ wiki.galaxyproject.org/CloudMan
  ○ github.com/galaxyproject/cloudman
• CloudBridge
  ○ cloudbridge.readthedocs.org
  ○ github.com/gvlproject/cloudbridge
Impact?
http://www.citeulike.org/group/16008/tag/usecloud
Acknowledgments
Everything discussed here is the work of a large community!
Come talk to us; get involved.
enis.afgan@jhu.edu or nuwan.goonasekera@unimelb.edu.au

The pulse of cloud computing with bioinformatics as an example

  • 1. The Pulse of Cloud Computing with Bioinformatics as an example Nuwan Goonasekera† , Enis Afgan* † University of Melbourne, Melbourne Bioinformatics, Australia * Johns Hopkins University, Taylor Lab, USA @ University of Colombo Feb 2017
  • 2. The answer to everything?
  • 3. Overview • The key characteristics of Cloud Computing • Using Cloud Computing for bioinformatics Source: http://dilbert.com/strips/comic/2012-05-25/
  • 4. A modern data-center Source: http://www.businessinsider.com/google-data-centers-2014-10?op=1
  • 5. Data center use before cloud computing source: http://www.rackspace.com/knowledge_center/whitepaper/revolution-not-evolution-how-cloud-computing-differs-from-traditional-it-and-why-it
  • 6. Cloud Computing: A Definition • NIST definition: “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” » National Institute of Standards and Technology (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
  • 7. The Cloud Model Private Community Public Hybrid Deployment Models Delivery Models Essential Characteristics Software as a Service (SaaS) Platform as a Service (PaaS) Infrastructure as a Service (IaaS) • On-demand self-service • Broad network access • Resource pooling • Rapid elasticity • Measured service
  • 9. Infrastructure-as-a-Service (IaaS) • Amazon Web Services (Market leader) • Rackspace Cloud • NeCTAR/OpenStack Research Cloud • Joyent Cloud • GoGrid • FlexiScale
  • 10. Public PaaS Examples Cloud Name Language and Developer Tools Programming Models Supported by Provider Target Applications and Storage Options Google App Engine Python, Java, Go, PHP + JVM languages (scala, groovy, jruby) MapReduce, Web, DataStore, Storage and other APIs Web applications and BigTable storage Salesforce.com’s Force.com Apex, Eclipsed-based IDE, web-based wizard Workflow, excel-like formula, web programming Business applications such as CRM Microsoft Azure .NET, Visual Studio, Azure tools Unrestricted model Enterprise and web apps Amazon Elastic MapReduce Hive, Pig, Java, Ruby etc. MapReduce Data processing and e-commerce Aneka .NET, stand-alone SDK Threads, task, MapReduce .NET enterprise applications, HPC
  • 11. Public SaaS examples • Gmail • Sharepoint • Salesforce.com CRM • On-live • Gaikai • Microsoft Office 365 • Some definitions include those that do not require payment. E.g. ad-supported sites
  • 12. Things we find most interesting • Accessibility • Infrastructure as code • Elasticity • Programming models that fit the cloud
  • 13. Accessibility ● Global availability via public clouds ● On-demand self-service ● A platform for democratisation of computing ● Access is enabled via point-and-click interfaces (blends with the Internet)
  • 14. Infrastructure as Code • Programmable • Captures knowledge • DevOps
  • 15. Elasticity • Rapidly expand and shrink based on demand • “Infinite” scaling • Cost-driven architecture • Ties in with infrastructure-as-code
  • 16. Programming models that fit the cloud • Fault-tolerant models • Massively scalable • Distributed algorithms
  • 17. Cloud computing is a valuable resource - but what do we use it for?
  • 18. Bioinformatics A multi-disciplinary science using computers for acquiring, managing and analyzing biological data. It is a data-driven science. It is a tool for genomics research. Biology Medicine Math & Physics Computer Science Bioinformatics
  • 19. Genomics Oxford dictionaries “The branch of molecular biology concerned with the structure, function, evolution, and mapping of genomes.” Where are the genes and other interesting pieces? How do sequences change over evolutionary time? What does all the DNA do? What are the physical shapes of the genome and its products?
  • 20. Genomics: contrast with biology and genetics*
      ○ Scope: biology and genetics run targeted studies of one or a few genes; genomics runs studies considering all genes in a genome
      ○ Technology: biology and genetics use targeted, low-throughput experiments; genomics uses global, high-throughput experiments
      ○ Hard part: in biology and genetics, clever experimental design and painstaking experimentation; in genomics, tons of data, uncertainty, computation
      * Everything on this slide is a generalization
  • 21. Where is genomics used? Basic science ● What is the DNA sequence of the genome? ● Where are the genes? ● What does all the DNA in the genome do? ● How did history shape our ethnicities and populations? Medicine ● What’s the difference between DNA in a tumor vs DNA in healthy tissue? ● Can genomic data help predict what drugs might be appropriate for: ○ a particular cancer patient? ○ a particular genetic disorder? ● Can genomic data help us predict what flu strains will prevail next year?
  • 22. Genome (Oxford Dictionaries): “The complete set of genes or genetic material present in a cell or organism.” ● “Blueprint” or “recipe” of life. ● Self-copying store of read-only information about how to develop and maintain an organism.
  • 23. Where do genomes live? All the trillions of cells in a person have the same genomic DNA in the nucleus. [Figure: a cell with the genome labelled] Picture source: https://publications.nigms.nih.gov/insidethecell/preface.html
  • 24. How do we obtain genome data? Sequencing! The first methods, known as Sanger sequencing, were developed in the mid-1970s. Starting in the 1990s, the international Human Genome Project took 13 years to sequence the human genome. In the 2000s, massively parallel Next Generation Sequencers (NGS) were developed that could sequence a human genome in days at a much lower cost. Today, nanopore sequencers are emerging, offering real-time sequencing. There are many public data repositories with free access to data (e.g., TCGA, 1000 Genomes, GenBank).
  • 25. Genome variation ● Two unrelated humans have genomes that are ~99.8% similar by sequence. ● There are about 3-4 million differences. ● Most are small, e.g. single nucleotide polymorphisms (SNPs). ● Human and chimpanzee genomes are about 96% similar.
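As a small worked illustration of sequence similarity, the Python sketch below compares two already-aligned sequences position by position. The sequences are invented ten-base toy strings, not real human data, and the function names are hypothetical.

```python
def differences(seq_a, seq_b):
    """Return the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two sequences agree."""
    matches = len(seq_a) - len(differences(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


person_1 = "ACGTACGTAC"  # toy sequence
person_2 = "ACGTACCTAC"  # same, with one single-base change

print(differences(person_1, person_2))       # positions that differ
print(percent_identity(person_1, person_2))  # similarity in percent
```

Real comparisons work on billions of bases and must first align the sequences, which is a large part of why genomics is computationally demanding.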
  • 26. Making sense of the data through data manipulations ● Apply data transformations to extract useful information ● This is not always a well-defined process ● This is typically done with existing tools, or by developing one’s own ● Tools can be chained into workflows
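Chaining tools into a workflow can be sketched as passing the output of one step to the next. The Python example below uses three tiny stand-in "tools" operating on a DNA string; the step functions and the `run_workflow` helper are hypothetical and far simpler than real bioinformatics tools.

```python
from functools import reduce


def reverse_complement(seq):
    """Reverse the sequence and swap each base for its complement."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def transcribe(seq):
    """Convert a DNA sequence to RNA by replacing T with U."""
    return seq.replace("T", "U")


def run_workflow(data, steps):
    """Feed `data` through each step in order, like tools in a pipeline."""
    return reduce(lambda value, step: step(value), steps, data)


# A workflow is just an ordered list of steps.
workflow = [str.upper, reverse_complement, transcribe]
print(run_workflow("atgcgt", workflow))
```

Keeping the workflow as an explicit list of steps makes the analysis repeatable and easy to share.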
  • 27. What does all of this have to do with Cloud Computing?
  • 33. Real-world infrastructure requirements ● Input: raw data (100s-1000s of GB) ● Platform: some computers + reliable persistent data storage + bioinformatics tools + reference data (indexed genomes, 10s-100s of GB) + workflow system ● Output: results (a few GB) ● [Figure: timeline with months Aug, Sep, Oct, Nov, ...]
  • 34. Galaxy: a data analysis and integration tool ● A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage ● Open source software that makes it simple to integrate your own tools and data and to customize it for your own site
  • 36. Three ways to use Galaxy 1. Download and run locally 2. Public website (http://usegalaxy.org) 3. Run on the Cloud
  • 37. Bringing cloud resources to genomics ● Cloud resources need to be provisioned and configured for use in genomics. ● CloudMan is a cloud manager that orchestrates all of the steps required to provision, manage, and share a compute platform on a cloud infrastructure, all through a web browser.
  • 39. Accessibility Get started at https://launch.usegalaxy.org/
  • 41. Manage it programmatically Create a new CloudMan compute cluster Manage an existing CloudMan instance
  • 42. How is it all achieved?
  • 43. Architectural stack ● Cloud apps ● CloudLaunch: beta.launch.usegalaxy.org, github.com/galaxyproject/cloudlaunch-ui, github.com/galaxyproject/cloudlaunch ● CloudMan: wiki.galaxyproject.org/CloudMan, github.com/galaxyproject/cloudman ● CloudBridge: cloudbridge.readthedocs.org, github.com/gvlproject/cloudbridge
  • 46. Everything discussed here is the work of a large community! Come talk to us and get involved: enis.afgan@jhu.edu or nuwan.goonasekera@unimelb.edu.au