2. A Galaxy Instance
Storage resources and compute resources:
Personal Computer, Institutional Cluster, or Galaxy on Cloud (e.g., AWS, Azure)
How can data be distributed on user-owned cloud-based resources to serve two goals:
[essentially] unlimited storage
joint data analysis
3. A Galaxy Instance
Admin
Storage resources
■ Local ■ NAS ■ Cloud
Storage resources configuration:
Where to store data?
[Advanced] How to distribute data?
5. A Galaxy Instance
User 1: datasets A1, A2
User 2: datasets B1, B2
Admin: storage resources (■ Local ■ NAS ■ Cloud) hold A1, B1, B2, and A2; usage shown at 40% and 85%
The persistence media setup is transparent to an end-user.
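The slides leave open how the admin's configuration decides where each new dataset lands. As an illustration only, here is a minimal Python sketch of one common placement policy: weighted random choice among backends, skipping any backend above a utilization threshold. The `Backend` class, the weights, and the 90% threshold are hypothetical and are not Galaxy's object store configuration schema.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Backend:
    """A storage backend such as Local, NAS, or Cloud (hypothetical model)."""
    name: str
    weight: int           # relative share of new datasets this backend should receive
    used_fraction: float  # current utilization, from 0.0 (empty) to 1.0 (full)


def choose_backend(backends: Sequence[Backend], max_used: float = 0.9, rng=random) -> Backend:
    """Pick a backend for a new dataset by weighted random choice,
    skipping backends at or above the max_used utilization threshold."""
    candidates: List[Backend] = [
        b for b in backends if b.used_fraction < max_used and b.weight > 0
    ]
    if not candidates:
        # Every backend is full: the admin has to add storage.
        raise RuntimeError("all storage backends are full")
    weights = [b.weight for b in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Usage: the Cloud backend is full, so new datasets go to Local or NAS.
example = [Backend("Local", 1, 0.40), Backend("NAS", 2, 0.85), Backend("Cloud", 1, 1.0)]
print(choose_backend(example, rng=random.Random(42)).name)
```

Once every backend reaches 100%, the sketch raises instead of placing data, which is exactly the point where the admin-managed model runs out of room.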
7. Two challenges with this model that you’ll face sooner or later, guaranteed!
1. Genomic data is competing with astronomical data for the title of mankind's biggest big-data problem, and genomics is a strong contender!
Stephens ZD, et al. (2015) Big Data: Astronomical or Genomical? PLOS Biology 13(7): e1002195.
8. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, et al. (2015) Big Data: Astronomical or Genomical?. PLOS Biology 13(7): e1002195.
https://doi.org/10.1371/journal.pbio.1002195
9. A Galaxy Instance
User 1: datasets A1, A2
User 2: datasets B1, B2
Admin: storage resources (■ Local ■ NAS ■ Cloud) hold A1, B1, B2, and A2; usage now at 100% and 100%
10. Two challenges with this model (continued)
2. Joint data analysis is difficult when data are scattered across disconnected storage.
13. A Galaxy Instance
Storage resources
User 1
40%
User-owned cloud-based storage
Upload: data from a Galaxy history to the cloud
Download: data from the cloud to a Galaxy history
15. Back-end from 10km
Galaxy API:
Upload payload: History ID, Provider, Bucket, Credentials, Dataset IDs
Download payload: History ID, Provider, Bucket, Credentials, Object
CloudBridge, connecting to: Azure BLOB, AWS S3, OpenStack Swift
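The two payloads above can be written down as plain data types. This is a minimal sketch under stated assumptions: the field names, the provider strings `"aws"`, `"azure"`, and `"openstack"`, and the `validate` helper are hypothetical and are not Galaxy's actual API schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical provider names for the three back-ends on the slide.
SUPPORTED_PROVIDERS = {"aws", "azure", "openstack"}  # S3, BLOB, Swift


@dataclass
class UploadPayload:
    history_id: str
    provider: str
    bucket: str
    credentials: Dict[str, str]
    # An empty list means "upload every dataset in the history".
    dataset_ids: List[str] = field(default_factory=list)


@dataclass
class DownloadPayload:
    history_id: str
    provider: str
    bucket: str
    credentials: Dict[str, str]
    object_key: str  # the slide's "Object": key of the object to fetch


def validate(payload: Union[UploadPayload, DownloadPayload]) -> None:
    """Raise ValueError if the provider is unknown or a required field is empty."""
    if payload.provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {payload.provider!r}")
    required = ["history_id", "bucket"]
    if isinstance(payload, DownloadPayload):
        required.append("object_key")
    for name in required:
        if not getattr(payload, name):
            raise ValueError(f"missing required field: {name}")
```

The only structural difference between the two requests is the last field: an upload names datasets already in Galaxy, while a download names an object already in the bucket.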
16. Back-end from 5km: Download
Galaxy API, Download payload: History ID, Provider, Bucket, Credentials, Object
1. Validate payload
2. Establish a connection to the specified provider (via CloudBridge)
3. Cache the object
4. Persist the object
5. Create a dataset for the downloaded object
6. Add the dataset to the history
7. Delete the cached object
Response: the info of the created dataset in JSON
Providers: Azure BLOB, AWS S3, OpenStack Swift
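The seven download steps can be traced in a small, self-contained Python sketch. It is not Galaxy's implementation: `InMemoryProvider` is a hypothetical stand-in for a real provider connection, a plain directory stands in for Galaxy's object store, and histories are modeled as lists of dictionaries.

```python
import json
import os
import shutil
import tempfile
from typing import Callable, Dict, List


class InMemoryProvider:
    """Hypothetical stand-in for a provider connection (the CloudBridge layer on the slide)."""

    def __init__(self, buckets: Dict[str, Dict[str, bytes]]):
        self._buckets = buckets  # bucket name -> {object key -> content}

    def read(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


def download_to_history(
    payload: dict,
    connect: Callable[[str, dict], InMemoryProvider],
    histories: Dict[str, List[dict]],
    storage_dir: str,
) -> str:
    # 1. Validate payload
    required = ("history_id", "provider", "bucket", "credentials", "object")
    missing = [k for k in required if k not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if payload["history_id"] not in histories:
        raise ValueError(f"unknown history: {payload['history_id']}")
    # 2. Establish a connection to the specified provider
    conn = connect(payload["provider"], payload["credentials"])
    # 3. Cache the object in a temporary file
    fd, cache_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as cache:
            cache.write(conn.read(payload["bucket"], payload["object"]))
        # 4. Persist the object into Galaxy-side storage (a plain directory here)
        persisted = os.path.join(storage_dir, os.path.basename(payload["object"]))
        shutil.copyfile(cache_path, persisted)
        # 5. Create a dataset record for the downloaded object
        history = histories[payload["history_id"]]
        dataset = {
            "id": f"{payload['history_id']}-{len(history) + 1}",
            "name": payload["object"],
            "path": persisted,
            "size": os.path.getsize(persisted),
        }
        # 6. Add the dataset to the history
        history.append(dataset)
    finally:
        # 7. Delete the cached object, even if a step above failed
        os.remove(cache_path)
    # Return the created dataset's info as JSON
    return json.dumps(dataset)
```

Usage: `download_to_history({"history_id": "h1", "provider": "aws", "bucket": "my-bucket", "credentials": {}, "object": "reads/sample.fastq"}, lambda name, creds: provider, {"h1": []}, some_dir)` returns the new dataset as a JSON string. Putting step 7 in a `finally` clause keeps the cache from leaking when persistence fails.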
17. Back-end from 5km: Upload
Galaxy API, Upload payload: History ID, Provider, Bucket, Credentials, Dataset IDs
1. Validate payload
2. Establish a connection to the specified provider (via CloudBridge)
3. Any dataset IDs given?
Yes: upload the specified datasets
No: upload all the datasets in the specified history
Response: a message of successful upload
Providers: Azure BLOB, AWS S3, OpenStack Swift
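The "Any dataset IDs given?" branch is the heart of the upload flow. The following minimal Python sketch shows only that decision; the history and the bucket are plain dictionaries (a hypothetical stand-in for Galaxy datasets and a provider connection), and using the dataset ID as the object key is an assumption made for illustration.

```python
from typing import Dict, List, MutableMapping, Optional


def upload_history_datasets(
    history: Dict[str, bytes],
    bucket: MutableMapping[str, bytes],
    dataset_ids: Optional[List[str]] = None,
) -> str:
    """Copy datasets from a history into a bucket: if dataset IDs are given,
    upload only those; otherwise upload every dataset in the history."""
    if dataset_ids:
        unknown = [d for d in dataset_ids if d not in history]
        if unknown:
            raise KeyError(f"datasets not in history: {unknown}")
        selected = list(dataset_ids)
    else:
        selected = list(history)
    for dataset_id in selected:
        # Object key = dataset ID (an assumption for this sketch).
        bucket[dataset_id] = history[dataset_id]
    return f"Successfully uploaded {len(selected)} dataset(s)."


# Usage: upload one dataset, then the whole history.
history = {"d1": b"a", "d2": b"bb", "d3": b"ccc"}
print(upload_history_datasets(history, {}, ["d2"]))
print(upload_history_datasets(history, {}))
```

Checking every requested ID before copying anything means a bad request fails without leaving a partial upload in the bucket.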
18. Back-end from 10km
Galaxy API:
Upload payload: History ID, Provider, Bucket, Credentials, Dataset IDs
Download payload: History ID, Provider, Bucket, Credentials, Object
CloudBridge, connecting to: Azure BLOB, AWS S3, OpenStack Swift
Do NOT share your credentials!
&
We will NOT ask for your credentials!
20. Back-end from 10km
Galaxy API:
Upload payload: History ID, Provider, Bucket, Credentials, Dataset IDs
Download payload: History ID, Provider, Bucket, Credentials, Object
CloudAuthz (OpenID Connect): OIDC ID token → Galaxy → cloud access tokens
CloudBridge, connecting to: Azure BLOB, AWS S3, OpenStack Swift
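The CloudAuthz idea is that Galaxy holds the user's OIDC ID token and trades it for short-lived cloud access tokens, so users never hand over long-term keys. On AWS this kind of exchange is typically done with STS `AssumeRoleWithWebIdentity`; the minimal Python sketch below replaces every real provider call with a pluggable function. `CloudAuthzSketch`, `CloudCredentials`, the `role_arn` config key, and the one-minute refresh margin are all hypothetical and are not the CloudAuthz library's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class CloudCredentials:
    access_key: str
    secret_key: str
    expires_at: float  # UNIX timestamp after which the credentials are invalid


# An exchanger turns (per-user provider config, OIDC ID token) into temporary credentials.
Exchanger = Callable[[Dict[str, str], str], CloudCredentials]


class CloudAuthzSketch:
    """Hypothetical broker: a user registers an authorization config once
    (e.g., a role Galaxy may assume), and Galaxy exchanges the user's
    OIDC ID token for temporary credentials when a request needs them."""

    def __init__(self, exchangers: Dict[str, Exchanger], clock: Callable[[], float] = time.time):
        self._exchangers = exchangers
        self._clock = clock
        self._cache: Dict[Tuple[str, str], CloudCredentials] = {}

    def get_credentials(
        self, user_id: str, provider: str, config: Dict[str, str], id_token: str
    ) -> CloudCredentials:
        key = (user_id, provider)
        cached = self._cache.get(key)
        # Reuse cached credentials unless they expire within the next minute.
        if cached is not None and cached.expires_at > self._clock() + 60:
            return cached
        creds = self._exchangers[provider](config, id_token)
        self._cache[key] = creds
        return creds
```

With a broker like this, an upload or download request only needs to identify the user and provider; the credentials are obtained server-side and expire on their own.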
22. Conclusion
Galaxy Main, Private Servers, Public Servers, Cloud Servers, Galaxy Appliance
User: ∞ storage on each
Feature:
Upload and download your Galaxy datasets to and from cloud-based storage without sharing your credentials.
Azure BLOB, AWS S3, OpenStack Swift
Bonus:
23. Conclusion
Applications:
Theoretically unlimited storage.
Simplified joint data analysis.
Simplified data sharing across different Galaxy instances and third-party applications.
24. Future work
A Galaxy Instance: storage resources and user-owned cloud-based storage (S3, BLOB), with Upload and Download
Step #2: Plug-your-own-media (User-Based ObjectStore)
[WIP] Open PR #4840
26. Thanks
Vahid Jalili
Enis Afgan
Nuwan Goonasekera
Dannon Baker
Jeremy Goecks
The “Core” Galaxy team and the community
Supported by the NHGRI (HG005542, HG004909, HG005133, HG006620), NSF (DBI-0850103, DBI-1661497),
Penn State University, Johns Hopkins University, Oregon Health and Science University, and the
Pennsylvania Department of Public Health