Storage Media Federation
for Galaxy
A Galaxy Instance
Storage resources
Compute resources
Personal Computer, Institutional Cluster, Galaxy on Cloud (e.g., AWS, Azure)
How to distribute data on user-owned cloud-based resources, serving two goals:
[essentially] unlimited storage
joint data analysis
A Galaxy Instance
Admin
Storage resources
■ Local ■ NAS ■ Cloud
Storage resources configuration:
Where to store data?
[Advanced] How to distribute data?
Configure Galaxy to distribute data on multiple persistence media
.
├── CITATION
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE.txt
├── Makefile
├── README.rst
├── SECURITY_POLICY.md
├── client
├── config
│ ├── ...
│ ├── object_store_conf.xml
│ ├── ...
<?xml version="1.0"?>
<object_store type="hierarchical">
    <backends>
        <object_store type="disk" id="primary" order="0">
            <files_dir path="..."/>
        </object_store>
        <object_store type="s3" id="secondary" order="1">
            <auth access_key="..." secret_key="..." />
            <bucket name="..."/>
        </object_store>
        <object_store type="azure_blob" id="tertiary" order="2">
            <auth account_name="..." account_key="..." />
            <container name="..."/>
        </object_store>
    </backends>
</object_store>
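As a rough illustration of the configuration above, the following Python sketch reads a hierarchical object store file and lists its backends by their order attribute. It is a reading aid only: how Galaxy itself consumes object_store_conf.xml is not shown on the slide, and the helper name backends_in_order is hypothetical. The "..." values stay elided as in the original.

```python
import xml.etree.ElementTree as ET

# Same structure as the slide's object_store_conf.xml; elided values stay elided.
CONF = """<?xml version="1.0"?>
<object_store type="hierarchical">
    <backends>
        <object_store type="disk" id="primary" order="0">
            <files_dir path="..."/>
        </object_store>
        <object_store type="s3" id="secondary" order="1">
            <auth access_key="..." secret_key="..." />
            <bucket name="..."/>
        </object_store>
        <object_store type="azure_blob" id="tertiary" order="2">
            <auth account_name="..." account_key="..." />
            <container name="..."/>
        </object_store>
    </backends>
</object_store>
"""


def backends_in_order(conf_xml):
    """Return (order, id, type) for each backend, sorted by the order attribute.

    Hypothetical helper for illustration; it is not part of Galaxy.
    """
    root = ET.fromstring(conf_xml)
    stores = root.find("backends").findall("object_store")
    rows = [(int(s.get("order")), s.get("id"), s.get("type")) for s in stores]
    return sorted(rows)


for order, store_id, store_type in backends_in_order(CONF):
    print(order, store_id, store_type)
```

For the configuration on the slide, the listing is primary (disk), then secondary (s3), then tertiary (azure_blob).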
Persistence media setup is transparent to an end-user
[Figure: the Admin sets up a Galaxy instance's storage resources (■ Local ■ NAS ■ Cloud), presented as a file system. User 1's history holds datasets A1 and A2 (40%); User 2's history holds B1 and B2 (85%). The datasets A1, B1, B2, and A2 are spread over the storage resources.]
Two challenges with this model that you’ll face … sooner or later … guaranteed!!
1. Genomical data is competing with astronomical data for the title of mankind's biggest big data problem … and … genomics is performing promisingly!!
Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, et al. (2015) Big Data: Astronomical or Genomical? PLOS Biology 13(7): e1002195. https://doi.org/10.1371/journal.pbio.1002195
[Figure: a Galaxy instance whose storage resources (■ Local ■ NAS ■ Cloud) hold A1, B1, B2, and A2, set up by the Admin; User 1 (A1, A2) and User 2 (B1, B2) are both at 100%.]
2. Joint data analysis is difficult when data is scattered across disconnected storage resources.
[Figure: Galaxy Instance 1, storage resources A1, A2, A3; User 1: A1, A2, Tool 1, A3. Galaxy Instance 2, storage resources B1; User 2: B1, Tool 2.]
Aha! Solution!
Federated Storage Resources!
Step #1:
Integrate with a user-owned cloud-based storage
[Figure: User 1 (40%) on a Galaxy instance's storage resources, next to a user-owned cloud-based storage.]
Upload: data from a Galaxy history to cloud
Download: data from cloud to a Galaxy history
[Figure: Galaxy Instance 1 (storage resources A1, A2, A3; User 1: A1, A2, Tool 1, A3) and Galaxy Instance 2 (storage resources B1; User 2: B1, Tool 2), together with a user-owned cloud-based storage.]
Back-end from 10km
Galaxy API
Upload payload: History ID, Provider, Bucket, Credentials, Dataset IDs
Download payload: History ID, Provider, Bucket, Credentials, Object
CloudBridge: Azure BLOB, AWS S3, OpenStack Swift
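The two payloads above differ only in their last field, which a small validation sketch can make concrete. The key names (history_id, provider, bucket, credentials, dataset_ids, object) and the function validate_payload are assumptions made for illustration; the slide lists the fields but not how Galaxy spells or checks them. Dataset IDs are treated as optional here because the upload flow asks whether any were given.

```python
# Fields shared by both payloads, per the slide. Key spellings are hypothetical.
COMMON_FIELDS = ("history_id", "provider", "bucket", "credentials")


def validate_payload(payload, direction):
    """Check that a payload carries the fields listed for its direction.

    direction is "upload" or "download". Raises ValueError on a missing field.
    Illustrative only; not Galaxy's actual validation.
    """
    if direction == "download":
        required = COMMON_FIELDS + ("object",)
    elif direction == "upload":
        required = COMMON_FIELDS  # dataset_ids may be omitted
    else:
        raise ValueError("direction must be 'upload' or 'download'")

    missing = [key for key in required if not payload.get(key)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return True


download_payload = {
    "history_id": "h1",
    "provider": "aws",
    "bucket": "my-bucket",
    "credentials": {"token": "placeholder"},
    "object": "reads.fastq",
}
print(validate_payload(download_payload, "download"))
```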
Back-end from 5km: Download
Galaxy API
Payload: History ID, Provider, Bucket, Credentials, Object
Validate payload
Establish a connection to the specified provider
Cache the object
Persist the object
Create a dataset for the downloaded object
Add the dataset to the history
Delete cached object
Response: the info of the created dataset in JSON
CloudBridge: Azure BLOB, AWS S3, OpenStack Swift
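The download steps on this slide can be sketched end to end with in-memory stand-ins. Everything here is hypothetical: the provider is a plain dict instead of a CloudBridge connection, the history is a dict, and the function and key names are invented for the example. It only mirrors the order of the steps (validate, connect, cache, persist, create a dataset, add it to the history, delete the cache, return JSON).

```python
import json
import os
import tempfile


def download_to_history(payload, provider_objects, history):
    """Walk through the slide's download steps using in-memory stand-ins."""
    # Validate payload.
    for key in ("history_id", "provider", "bucket", "credentials", "object"):
        if key not in payload:
            raise ValueError("missing field: " + key)

    # Establish a connection to the specified provider (stub: a dict lookup).
    bucket = provider_objects[(payload["provider"], payload["bucket"])]
    data = bucket[payload["object"]]

    # Cache the object in a temporary file.
    fd, cache_path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as cache_file:
        cache_file.write(data)

    try:
        # Persist the object (stub: read it back from the cache).
        with open(cache_path, "rb") as cache_file:
            persisted = cache_file.read()

        # Create a dataset for the downloaded object.
        dataset = {
            "id": len(history["datasets"]) + 1,
            "name": payload["object"],
            "size": len(persisted),
        }

        # Add the dataset to the history.
        history["datasets"].append(dataset)
    finally:
        # Delete cached object.
        os.remove(cache_path)

    # Return the info of the created dataset in JSON.
    return json.dumps(dataset)


history = {"id": "h1", "datasets": []}
provider_objects = {("aws", "my-bucket"): {"reads.fastq": b"ACGT"}}
payload = {
    "history_id": "h1",
    "provider": "aws",
    "bucket": "my-bucket",
    "credentials": {"token": "placeholder"},
    "object": "reads.fastq",
}
response = download_to_history(payload, provider_objects, history)
print(response)
```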
Back-end from 5km: Upload
Galaxy API
Payload: History ID, Provider, Bucket, Credentials, Dataset IDs
Validate payload
Establish a connection to the specified provider
Any dataset IDs given?
Yes: Upload the specified datasets
No: Upload all the datasets in the specified history
Response: a message of successful upload
CloudBridge: Azure BLOB, AWS S3, OpenStack Swift
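The branch in the upload flow (specified datasets if IDs are given, otherwise the whole history) is easy to show in a few lines. This is a sketch under stated assumptions: the bucket is a plain dict rather than a real cloud bucket, and select_datasets, upload_history, and all key names are hypothetical.

```python
def select_datasets(history, dataset_ids=None):
    """Return the datasets to upload: the given IDs, or all if none are given."""
    if dataset_ids:
        wanted = set(dataset_ids)
        return [d for d in history["datasets"] if d["id"] in wanted]
    return list(history["datasets"])


def upload_history(payload, histories, bucket_store):
    """Copy the selected datasets into bucket_store (a dict standing in for a bucket)."""
    # Validate payload (dataset_ids is optional).
    for key in ("history_id", "provider", "bucket", "credentials"):
        if key not in payload:
            raise ValueError("missing field: " + key)

    history = histories[payload["history_id"]]
    selected = select_datasets(history, payload.get("dataset_ids"))
    for dataset in selected:
        bucket_store[dataset["name"]] = dataset["content"]

    # A message of successful upload.
    return "Uploaded %d dataset(s) to bucket %s" % (len(selected), payload["bucket"])


histories = {
    "h1": {
        "datasets": [
            {"id": "d1", "name": "A1", "content": b"a"},
            {"id": "d2", "name": "A2", "content": b"b"},
        ]
    }
}
base = {"history_id": "h1", "provider": "aws", "bucket": "my-bucket", "credentials": {}}

some_bucket = {}
print(upload_history(dict(base, dataset_ids=["d2"]), histories, some_bucket))

all_bucket = {}
print(upload_history(base, histories, all_bucket))
```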
Back-end from 10km
Do NOT share your credentials!
&
We will NOT ask for your credentials!
Back-end from 10km, with CloudAuthz and OpenID Connect
[Figure: the Upload and Download API payloads (History ID, Provider, Bucket, Credentials, plus Dataset IDs or Object) pass through CloudBridge to Azure BLOB, AWS S3, and OpenStack Swift; CloudAuthz and OpenID Connect are added, with the labels OIDC ID Token, Galaxy, and Cloud Access Tokens.]
CloudAuthz
github.com/galaxyproject/cloudauthz
API
Download
Merged PR #4474
Open PR #5903
& more to be PRed
Open PR #5835
API
Upload
Open PR #6078
At the moment, all features are accessible via the API; the UI is under development.
Galaxy Main Private Servers Public Servers Servers Cloud Galaxy Appliance
User
∞∞∞∞ ∞
Conclusion
Feature:
Upload and download your Galaxy datasets to and from cloud-based storage without sharing your credentials.
Azure
BLOB
AWS
S3
OpenStack
Swift
Bonus:
Azure
BLOB
AWS
S3
OpenStack
Swift
Galaxy Main Private Servers Public Servers Servers Cloud Galaxy Appliance
User
∞∞∞∞ ∞
Conclusion
Applications:
Theoretically unlimited storage.
Simplified joint data analysis.
Simplified data sharing across different Galaxy instances and third-party applications.
Azure
BLOB
AWS
S3
OpenStack
Swift
S3
Future work
Step #2:
Plug-your-own-media
(User-Based ObjectStore)
[WIP] Open PR #4840
[Figure: a Galaxy instance with storage resources and a DB, linked by Upload and Download to user-owned cloud-based storage (S3, BLOB). Legend: Corresponds to a dataset in DB; Uploaded; Online; Offline.]
Thanks
Vahid Jalili
Enis Afgan
Nuwan Goonasekera
Dannon Baker
Jeremy Goecks
The “Core” Galaxy team and the community
Supported by the NHGRI (HG005542, HG004909, HG005133, HG006620), NSF (DBI-0850103, DBI-1661497),
Penn State University, Johns Hopkins University, Oregon Health and Science University, and the
Pennsylvania Department of Public Health

Federated Storage Resources GCC2018 https://vimeo.com/291738189

  • 2. A Galaxy Instance Storage resources Compute resources Personal Computer Institutional Cluster Galaxy on Cloud (e.g., AWS, Azure) How to distribute data on user-owned cloud-based resources, serving two goals: [essentially] unlimited storage; joint data analysis
  • 3. A Galaxy Instance Admin Storage resources ■ Local ■ NAS ■ Cloud Storage resources configuration: Where to store data? [Advanced] How to distribute data?
  • 4. Configure Galaxy to distribute data on multiple persistence media
      .
      ├── CITATION
      ├── CODE_OF_CONDUCT.md
      ├── CONTRIBUTING.md
      ├── CONTRIBUTORS.md
      ├── LICENSE.txt
      ├── Makefile
      ├── README.rst
      ├── SECURITY_POLICY.md
      ├── client
      ├── config
      │ ├── ...
      │ ├── object_store_conf.xml
      │ ├── ...
      <?xml version="1.0"?>
      <object_store type="hierarchical">
        <backends>
          <object_store type="disk" id="primary" order="0">
            <files_dir path="..."/>
          </object_store>
          <object_store type="s3" id="secondary" order="1">
            <auth access_key="..." secret_key="..." />
            <bucket name="..."/>
          </object_store>
          <object_store type="azure_blob" id="tertiary" order="2">
            <auth account_name="..." account_key="..." />
            <container name="..."/>
          </object_store>
        </backends>
      </object_store>
  • 5. User 1 A1 A2 User 2 B1 B2 A1 B1 B2 A2 40% 85% Persistence media setup is transparent to an end-user ■ Local ■ NAS ■ Cloud Admin A Galaxy Instance Storage resources
  • 6. A1 B1 B2 A2 Persistence media setup is transparent to an end-user ■ Local ■ NAS ■ Cloud Admin A Galaxy Instance Storage resources File System User 1 A1 A2 User 2 B1 B2 40% 85% History History
  • 7. Two challenges with this model that you’ll face … sooner or later … guaranteed!! 1. Genomical data is competing with astronomical data for the biggest big data problem of mankind title … and … genomics is performing promisingly!! Stephens, Zachary D., et al. "Big data: astronomical or genomical?." PLoS biology 13.7 (2015): e1002195.
  • 8. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, et al. (2015) Big Data: Astronomical or Genomical?. PLOS Biology 13(7): e1002195. https://doi.org/10.1371/journal.pbio.1002195
  • 9. A1 B1 B2 A2 ■ Local ■ NAS ■ Cloud Admin A Galaxy Instance Storage resources 100% 100% User 1 A1 A2 User 2 B1 B2
  • 10. Two challenges with this model that you’ll face … sooner or later … guaranteed!! 1. Genomical data is competing with astronomical data for the biggest big data problem of mankind title … and … genomics is performing promisingly!! Stephens, Zachary D., et al. "Big data: astronomical or genomical?." PLoS biology 13.7 (2015): e1002195. 2. Joint data analysis is difficult with data scattered on disconnected storages.
  • 11. Galaxy Instance 1 Storage resources A1 A3 A2 Galaxy Instance 2 Storage resources B1 User 1 A1 A2 Tool 1 A3 User 2 B1 Tool 2
  • 12. Aha! Solution! Federated Storage Resources! Step #1: Integrate with a user-owned cloud-based storage
  • 13. A Galaxy Instance Storage resources User 1 40% User-owned cloud-based storage Upload data from a Galaxy history to cloud Download data from cloud to a Galaxy history
  • 14. Galaxy Instance 1 Storage resources A1 A3 A2 Galaxy Instance 2 Storage resources B1 User 1 A1 A2 Tool 1 A3 User 2 B1 Tool 2 User-owned cloud-based storage
  • 15. Back-end from 10km API Galaxy Upload Download History ID, Provider, Bucket, Credentials, Dataset IDs Payload History ID, Provider, Bucket, Credentials, Object Payload CloudBridge Azure BLOB AWS S3 OpenStack Swift
  • 16. Back-end from 5km API Galaxy Download History ID, Provider, Bucket, Credentials, Object Payload CloudBridge Validate payload Establish a connection to the specified provider Cache the object Persist the object Create a dataset for the download object Add the dataset to the history Delete cached object The info of the created dataset in JSON Azure BLOB AWS S3 OpenStack Swift
  • 17. Back-end from 5km API Galaxy Upload History ID, Provider, Bucket, Credentials, Dataset IDs Payload CloudBridge Validate payload Establish a connection to the specified provider Any dataset IDs given? Yes: Upload the specified datasets No: Upload all the datasets in the specified history A message of successful upload Azure BLOB AWS S3 OpenStack Swift
  • 18. Back-end from 10km API Galaxy Upload Download Payload Payload CloudBridge Do NOT share your credentials! & We will NOT ask for your credentials! History ID, Provider, Bucket, Credentials, Dataset IDs History ID, Provider, Bucket, Credentials, Object Azure BLOB AWS S3 OpenStack Swift
  • 19.
  • 20. CloudBridge Upload API Download Back-end from 10km History ID, Provider, Bucket, Credentials, Dataset IDs Payload History ID, Provider, Bucket, Credentials, Object Payload CloudAuthz OpenID Connect Azure BLOB AWS S3 OpenStack Swift OIDC ID Token Galaxy Cloud Access Tokens
  • 21. CloudAuthz github.com/galaxyproject/cloudauthz API Download Merged PR #4474 Open PR #5903 & more to be PRed Open PR #5835 API Upload Open PR #6078 At the moment, all features are accessible via APIs, and the UI is under development
  • 22. Galaxy Main Private Servers Public Servers Servers Cloud Galaxy Appliance User ∞∞∞∞ ∞ Conclusion Feature: Upload and Download your Galaxy datasets to and from cloud-based storages without sharing your credentials. Azure BLOB AWS S3 OpenStack Swift Bonus:
  • 23. Azure BLOB AWS S3 OpenStack Swift Galaxy Main Private Servers Public Servers Servers Cloud Galaxy Appliance User ∞∞∞∞ ∞ Conclusion Applications: Theoretically unlimited storage. Simplified joint data analysis. Simplified data sharing across different Galaxy instances and third-party applications. Azure BLOB AWS S3 OpenStack Swift
  • 24. S3 Future work A Galaxy Instance Storage resources User-owned cloud-based storage Upload Download Step #2: Plug-your-own-media (User-Based ObjectStore) [WIP] Open PR #4840 S3 BLOB
  • 25. BLOB Step #2: Plug-your-own-media (User-Based ObjectStore) Upload Download Future work A Galaxy Instance Storage resources DB User-owned cloud-based storage Corresponds to a dataset in DB Uploaded Online Offline [WIP] Open PR #4840 S3 S3
  • 26. Thanks Vahid Jalili Enis Afgan Nuwan Goonasekera Dannon Baker Jeremy Goecks The “Core” Galaxy team and the community Supported by the NHGRI (HG005542, HG004909, HG005133, HG006620), NSF (DBI-0850103, DBI-1661497), Penn State University, Johns Hopkins University, Oregon Health and Science University, and the Pennsylvania Department of Public Health
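
The first two steps of the upload flow on slide 17 (validate the payload, then decide which datasets to upload) can be illustrated with a minimal Python sketch. This is not Galaxy's or CloudBridge's actual code: the function names, the payload keys (`history_id`, `provider`, `bucket`, `credentials`, `dataset_ids`), and the provider strings are assumptions chosen to mirror the fields listed on slide 15. Connecting to the provider and transferring data are deliberately left out.

```python
from typing import Dict, List

# Hypothetical provider keys for the three back-ends named on the slides
# (AWS S3, Azure BLOB, OpenStack Swift); the real API may use other values.
SUPPORTED_PROVIDERS = {"aws", "azure", "openstack"}

# Fields that slide 15 lists for both the upload and the download payload.
# The key spellings are assumptions made for this sketch.
REQUIRED_FIELDS = ("history_id", "provider", "bucket", "credentials")


def validate_payload(payload: Dict[str, object]) -> None:
    """Raise ValueError if a required field is missing or the provider is unknown."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if payload["provider"] not in SUPPORTED_PROVIDERS:
        raise ValueError("unsupported provider: {}".format(payload["provider"]))


def select_datasets_for_upload(
    payload: Dict[str, object], history_dataset_ids: List[str]
) -> List[str]:
    """Apply the 'Any dataset IDs given?' branch from slide 17.

    Returns the requested dataset IDs when the payload names any,
    otherwise every dataset ID in the history.
    """
    validate_payload(payload)
    requested = payload.get("dataset_ids")
    if not requested:
        # No: upload all the datasets in the specified history.
        return list(history_dataset_ids)
    unknown = [d for d in requested if d not in history_dataset_ids]
    if unknown:
        raise ValueError("datasets not in history: " + ", ".join(unknown))
    # Yes: upload only the specified datasets.
    return list(requested)
```

Calling `select_datasets_for_upload` with a payload that has no `dataset_ids` key returns the full history listing, which matches the "No" branch of the flowchart; a payload naming specific IDs returns only those.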