Automating Research Data Management at Scale
Vas Vasiliadis, Greg Nawrocki
SGCI Webinar, September 9, 2020
Globus is …
a non-profit service
developed and operated by
Our mission is to…
increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software
Fast, reliable file transfer …from any to any system
1. User-initiated or automated transfer request
2. Globus transfers files reliably, securely (instrument, lab server, compute facility)
3. Optional notifications
Globally accessible multi-tenant service
• Fire-and-forget transfers
• Optimized speed
• Assured reliability
• Unified view of storage
• Browser, REST API, CLI
Secure data sharing …from any storage
1. Select files to share, select user or group, and set access permissions
2. Collaborator logs into Globus and accesses shared files; no local account required; download via Globus
Storage: on-prem or public cloud storage; laptop, server, compute facility
Globally accessible multi-tenant service
Globus controls access to shared files on existing storage
• Fine-grained access control “overlay” on storage system
• Share with any identity, email, group
• No need to stage data just for sharing
Globus Connectors
ActiveScale
Object
Storage
Coming soon…
Globus Developed
Community Developed
Use(r)-appropriate interfaces
GET /endpoint/go%23ep1
PUT /endpoint/demodoc#my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
Globus
service
Web
CLI
Platform
(RESTful APIs)
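The first request path above carries an endpoint name of the form `go#ep1`. Inside a URL path the `#` has to be percent-encoded as `%23`, otherwise everything after it is treated as a fragment. A minimal sketch of that encoding step, using only the Python standard library; the helper name `endpoint_path` is hypothetical and not part of any Globus SDK:

```python
from urllib.parse import quote, unquote


def endpoint_path(endpoint_name: str) -> str:
    """Build a request path for a 'user#name' style endpoint identifier.

    Hypothetical helper for illustration; only the percent-encoding matters here.
    """
    # safe="" forces reserved characters such as '#' and '/' to be encoded.
    return "/endpoint/" + quote(endpoint_name, safe="")


path = endpoint_path("go#ep1")
print(path)           # the encoded path
print(unquote(path))  # round-trips back to the original name
```

The same helper applied to `demodoc#my_endpt` yields a path that a client can send safely, whereas the raw `#` form would be truncated at the fragment marker by most HTTP libraries.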
Globus Command Line Interface (CLI)
• Native application: docs.globus.org/cli
• Open source, uses Python SDK
• globus login – get access and refresh tokens
– Tokens stored locally in ~/.globus.cfg
• Service (transfer/auth) invocation uses tokens
• globus logout – delete tokens
docs.globus.org/cli/examples
Globus CLI Examples
Automated instrument data egress
Instruments: Cryo EM, Lightsheet, Sequencer, ALS/APS, ….
Destinations: local system download; remote analysis, visualization
Local policy store
Folders: --/cohort045, --/cohort096, --/cohort127
• Reliable, near-real-time data access
• Automatically set policy-based permissions
• Self-service access control, management
• Federated login for frictionless data access
Repository data distribution
1. Search, request data of interest
2. Bulk data transfer or browser-based download
Globally accessible multi-tenant service
• Gateway/data portal/app enables faceted search
• Enforces fine-grained authorization
• HTTPS download for “small” data
• Asynchronous transfer for larger data sets
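The split between HTTPS download for “small” data and asynchronous transfer for larger data sets can be pictured as a simple size check in the portal. This is a sketch under stated assumptions: the slide gives no threshold, so the 100 MiB cutoff and the names `SMALL_DATA_LIMIT` and `delivery_method` are invented for illustration.

```python
# Hypothetical cutoff; the slides do not say what counts as "small".
SMALL_DATA_LIMIT = 100 * 1024**2  # 100 MiB


def delivery_method(total_bytes: int, limit: int = SMALL_DATA_LIMIT) -> str:
    """Pick how a portal might hand a requested data set to the user."""
    if total_bytes < 0:
        raise ValueError("size must be non-negative")
    # Small requests are served directly in the browser; larger ones are
    # submitted as a background transfer that the user does not wait on.
    return "https_download" if total_bytes <= limit else "async_transfer"


print(delivery_method(5 * 1024**2))    # a few megabytes
print(delivery_method(250 * 1024**3))  # hundreds of gigabytes
```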
Data staging for compute
1. Input data upload: --/run123/input (rw)
2. Output data staged with access control: --/run123/output (r)
3. User accesses, downloads results
Compute service
• User securely uploads data for analysis
• Results available with fine-grained permissions
• Automated setup/tear down of folders, permissions
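The automated setup and tear-down of per-run folders and permissions can be sketched as a context manager: grant read-write on the input folder and read-only on the output folder when a run starts, and revoke both when it ends. This is an in-memory illustration only; the `acl` dictionary, `staged_run`, and the user identity are hypothetical stand-ins for whatever access-control service a real deployment would call.

```python
from contextlib import contextmanager


@contextmanager
def staged_run(acl: dict, run_id: str, user: str):
    """Grant per-run permissions on entry and revoke them on exit."""
    rules = {
        f"/{run_id}/input/": "rw",   # user uploads input data here
        f"/{run_id}/output/": "r",   # user may only read results here
    }
    for path, perms in rules.items():
        acl[(user, path)] = perms
    try:
        yield rules
    finally:
        # Tear down even if the compute step raised an exception.
        for path in rules:
            acl.pop((user, path), None)


acl = {}
with staged_run(acl, "run123", "alice@example.org"):
    during = dict(acl)  # snapshot of permissions while the run is active
after = dict(acl)       # snapshot once the run has been torn down

print(during)
print(after)
```

Using `try`/`finally` keeps the clean-up step from being skipped when the analysis fails, which is the main reason to automate it rather than rely on a manual checklist.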
Automation Examples
• Syncing a directory
– bash script; calls the Globus CLI
– Python module; run as script or import as module
• Staging data for distribution
– bash and Python variants
• Removing directories after files are transferred
– Python script
github.com/globus/automation-examples
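As a rough idea of what a directory-sync script does, the sketch below assembles a `globus transfer` command line in Python without running it. The `--recursive`, `--sync-level`, and `--label` options are given as I understand the CLI; confirm them with `globus transfer --help` for your installed version. The endpoint IDs and paths are placeholders, and `build_sync_command` is a hypothetical helper, not code from the automation-examples repository.

```python
import shlex


def build_sync_command(src_ep, src_path, dst_ep, dst_path, label="nightly sync"):
    """Return the argv for a recursive, checksum-based `globus transfer`."""
    return [
        "globus", "transfer",
        "--recursive",                 # copy the whole directory tree
        "--sync-level", "checksum",    # only re-send files whose contents differ
        "--label", label,
        f"{src_ep}:{src_path}",
        f"{dst_ep}:{dst_path}",
    ]


cmd = build_sync_command(
    "SRC-ENDPOINT-UUID", "/data/run1/",
    "DST-ENDPOINT-UUID", "/archive/run1/",
)
print(shlex.join(cmd))
# To submit for real (after `globus login`), pass cmd to subprocess.run(cmd, check=True).
```

Building the argument list separately from executing it makes the script easy to dry-run and to schedule from cron.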
…but scripting only
gets us so far…
Globus platform services
• Identity and Access Management (IAM): Auth, Groups
• Data Services: Connect, Transfer, Manifest*
• Search
• Identifiers (collaboration with DataCite)
• Flows*
Globus automation platform
Globus Automate
A platform service for defining, applying, and sharing
distributed research automation flows
• Triggers start flows based on subscribed events
• Flows call Action Providers to perform tasks
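The trigger idea, where flows are started by the events they subscribe to, can be illustrated with a toy in-process registry. None of this is the Globus Automate API: `TriggerRegistry`, the event names, and `egress_flow` are hypothetical and exist only to show the subscribe/publish relationship.

```python
class TriggerRegistry:
    """Toy event bus: maps event types to the flows subscribed to them."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, event_type, flow):
        self._subscriptions.setdefault(event_type, []).append(flow)

    def publish(self, event_type, payload):
        # Start every flow subscribed to this event type; unknown events start nothing.
        return [flow(payload) for flow in self._subscriptions.get(event_type, [])]


def egress_flow(event):
    # Stand-in for a real flow; it only reports what it would do.
    return f"transfer {event['path']}"


registry = TriggerRegistry()
registry.subscribe("file_created", egress_flow)

started = registry.publish("file_created", {"path": "/cohort045/scan_001.tif"})
ignored = registry.publish("file_deleted", {"path": "/cohort045/scan_001.tif"})
print(started, ignored)
```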
Globus automation architecture
• Built on AWS Step Functions
– JSON-based state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating
custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
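To make the state-machine idea concrete, here is a deliberately simplified sketch of a two-step flow and a tiny interpreter that propagates state from one action to the next. The keys `StartAt`, `States`, `Next`, and `End` follow Amazon States Language conventions; the `Provider` key, the `$.` reference syntax as implemented here, and the local `transfer` and `ingest` functions are simplifications invented for this example and do not reflect the actual Globus flow schema.

```python
import json


# Toy action providers: local functions standing in for remote services.
def transfer(params):
    return {"moved": params["source"] + " -> " + params["destination"]}


def ingest(params):
    return {"indexed": params["subject"]}


PROVIDERS = {"transfer": transfer, "ingest": ingest}

FLOW = {
    "StartAt": "MoveData",
    "States": {
        "MoveData": {
            "Type": "Action",
            "Provider": "transfer",
            "Parameters": {"source": "$.input.src", "destination": "$.input.dst"},
            "ResultPath": "MoveResult",
            "Next": "IndexData",
        },
        "IndexData": {
            "Type": "Action",
            "Provider": "ingest",
            "Parameters": {"subject": "$.input.dst"},
            "ResultPath": "IndexResult",
            "End": True,
        },
    },
}


def resolve(value, state):
    """Resolve '$.a.b' references against the flow state; return literals unchanged."""
    if isinstance(value, str) and value.startswith("$."):
        node = state
        for key in value[2:].split("."):
            node = node[key]
        return node
    return value


def run_flow(flow, flow_input):
    state = {"input": flow_input}
    name = flow["StartAt"]
    while True:
        step = flow["States"][name]
        params = {k: resolve(v, state) for k, v in step["Parameters"].items()}
        # Each action's result is stored in the shared state for later steps.
        state[step["ResultPath"]] = PROVIDERS[step["Provider"]](params)
        if step.get("End"):
            return state
        name = step["Next"]


result = run_flow(FLOW, {"src": "/instrument/scan1", "dst": "/project/scan1"})
print(json.dumps(result, indent=2))
```

Because the flow is plain data, it can be stored, shared, and validated independently of the services that execute each step, which is the appeal of a JSON-based definition.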
Automation Action Providers
Delete ACLs
Search
DLHub
User Form Notification
Expression Evaluation
Describe
Web Form
Identifier
Transfer
Ingest
Xtract
funcX
Globus action
providers
Custom action
providers
funcX action provider
funcX: FaaS platform for HPC
• funcX endpoints deployed at resources
• Service routes requests to endpoints
• Parsl acquires resources
• Singularity containers run functions
• Globus Auth secures communication
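The register-then-route pattern behind a function-as-a-service platform can be shown with a toy, single-process model. This is not the funcX SDK: `ToyFunctionService` and its methods are hypothetical, and the function runs locally, whereas in the system described above execution happens on a remote endpoint with resources acquired by Parsl and functions run in containers.

```python
import uuid


class ToyFunctionService:
    """Single-process stand-in for a service that routes calls to endpoints."""

    def __init__(self):
        self._functions = {}
        self._endpoints = {}

    def register_function(self, fn):
        function_id = str(uuid.uuid4())
        self._functions[function_id] = fn
        return function_id

    def register_endpoint(self, name):
        self._endpoints[name] = []  # per-endpoint log of executed function ids

    def run(self, function_id, endpoint, *args):
        if endpoint not in self._endpoints:
            raise KeyError(f"unknown endpoint: {endpoint}")
        self._endpoints[endpoint].append(function_id)
        # A real service would ship the call to the endpoint; here it runs in place.
        return self._functions[function_id](*args)

    def history(self, endpoint):
        return list(self._endpoints[endpoint])


def double(x):
    return 2 * x


service = ToyFunctionService()
service.register_endpoint("hpc-cluster")
fid = service.register_function(double)
result = service.run(fid, "hpc-cluster", 21)
print(result)
```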
Xtract action provider
Xtract: Derive rich metadata on-demand
Automation Examples
UChicago Kasthuri Lab: Brain aging and disease
• Construct connectomes—mapping of neuron connections
• Use APS synchrotron to rapidly image brains
– Beam time available once every few months
– ~20GB/minute for large (cm) unsectioned brains
• Generate segmented datasets/visualizations for the community
• Perform semi-standard reconstruction on all data across HPC resources
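To put the ~20GB/minute figure in network terms, the arithmetic below converts it to a sustained rate. It assumes decimal gigabytes (10^9 bytes), since the slide does not say whether GB or GiB is meant, and it treats the rate as constant.

```python
GB = 10**9  # decimal gigabyte (assumption; the slide does not specify)

rate_bytes_per_s = 20 * GB / 60               # bytes per second
rate_gbit_per_s = rate_bytes_per_s * 8 / 1e9  # gigabits per second
tb_per_hour = 20 * 60 / 1000                  # terabytes per hour of beam time

print(f"{rate_bytes_per_s / 1e6:.0f} MB/s")
print(f"{rate_gbit_per_s:.2f} Gbit/s")
print(f"{tb_per_hour:.1f} TB/hour")
```

That is roughly 333 MB/s, or about 2.7 Gbit/s sustained, and over a terabyte per hour, which is why moving data off the lab servers has to be automated rather than done by hand.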
Neuroanatomy reconstruction pipeline (neurocartography)
Sites: UChicago, APS, Argonne JLSE, Argonne Leadership Computing Facility
Systems: Lab Server 1, Lab Server 2
1. Imaging
2. Acquisition
3. Pre-processing
4. Preview/Center
5. User validation
6. Reconstruction
7. Publication
8. Visualization
9. Science!
Automating neurocartography
Automate flow (action provider: task)
• Web form: User input
• Search: Ingest
• Share: Set policy
• Identifier: Mint DOI
• Auth: Get credentials
• funcX: Run job
• Describe: Get metadata
• Transfer: Transfer data
• funcX: Run job
• Transfer: Transfer data
XPCS: X-ray Photon Correlation Spectroscopy
Sites: Advanced Photon Source, Argonne Leadership Computing Facility
Systems: Lab Server 1, Data Portal, Storage
1. Imaging
2. Acquisition
3. XPCS-Eigen
4. Plot results
5. Publication
6. Science!
Automating XPCS
Automate flow (action provider: task)
• Search: Ingest
• Auth: Get credentials
• funcX: Plot Results
• Transfer: Transfer HDF5
• Transfer: Transfer IMM
• funcX: Run Corr
• ACLs: Set policy
• Transfer: Return Results
Materials Data Facility
> 35 TB of data, > 320 authors, > 400 datasets
• Accept data from many locations with flexible interfaces
• Index dataset contents in science-aware ways
• Dispatch data to the community
• Using Automate to simplify building composable flows of services
MDF Data Publication Automation
Automate flow (action provider: task)
• Ingest: Bulk Ingest
• Auth: Get Credentials
• Transfer: Transfer Dataset
• Xtract: Extract Metadata
• Share: Set permissions
• Transfer: Move metadata
• Transfer: Transfer Dataset
• Transfer: Transfer Dataset
• Identifier: Mint DOI
• Web form: Metadata
• Notify: Notify Curator
• Web form: Curation
• Notify: Notify user
Petrel data services
• Data service providing simple, self-managed project-based
data and sharing capabilities (https://petreldata.net)
• Flexible user-managed search index and discovery portal
PaaS: develop custom action providers
• Directly use the platform to build and run
extensible flows
• Develop action providers
– Fit for purpose
– Developed and deployed by the project
– Plugged into their flows
• Action Provider Development toolkit
SaaS: instrument data management
• Templated solution
• Configurable…
– Set transfer triggers
– Select destination(s)
– Define metadata
• Extensible…
– Add/remove actions
– Change action providers
• No development required
Instruments: Cryo EM, Lightsheet, Sequencer, ….
Automated egress from device: --/cohort045, --/cohort096, --/cohort127
Indexing for search; image reconstruction, analysis, visualization
Action providers: Transfer, funcX, Xtract
SaaS: Data Management Plans
• “Turnkey” DMP enablement
• Select dataset (collection)…
• …add metadata for indexing
• …generate persistent ID
(DOI, ARK, etc.)
Action providers: Transfer, Identifier, Ingest
“Point & Click” to findable and accessible data
Thank you, funders...
U.S. Department of Energy
Thank you, subscribers!
