SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Introduction to Globus:
Research Data Management
Software at the ALCF
Rick Wagner
rick@globus.org
rpwagner@uchicago.edu
rwagner@anl.gov
Research data management today
How do we...
...move?
...share?
...discover?
...reproduce?
Index?
3
Globus delivers…
Fast and reliable big data transfer,
sharing, and platform services…
…directly from your own storage
systems…
...via software-as-a-service using
existing identities with the overarching
goal of...
4
Research Computing HPC
Desktop Workstations
Mass Storage Instruments
Personal Resources
Public Cloud
National Resources
…unifying access to data across tiers
IBM Spectrum Scale
Current Planned
Storage Connectors - globus.org/connectors
Public / private cloud stores
External
campus
storage
EC2
Project
repositories,
replication stores
Public repositories
Share with collaborators/community
Analysis
store
Next-Gen Sequencer
MRI
Advanced Light Source
Personal system
Remote visualization
Light Sheet Microscope
High-durability,
low-cost store
Manage data from instruments
Cryo-EM
Use(r)-appropriate interfaces
8
GET /endpoint/go%23ep1
PUT /endpoint/vas#my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
Globus service
Web
CLI
Rest
API
Globus SaaS / PaaS: Research data lifecycle
Researcher initiates
transfer request; or
requested automatically
by script, science
gateway
1
Instrument
Compute Facility
Globus transfers files
reliably, securely
2
Globus controls
access to shared
files on existing
storage; no need
to move files to
cloud storage!
4
Researcher
selects files to
share, selects
user or group,
and sets access
permissions
3
Collaborator logs in to
Globus and accesses
shared files; no local
account required;
download via Globus
5
Automating research
workflows and
ensuring those that
need access to the
data have it.
8
Personal Computer
Transfer
Share
• Use a Web browser or
platform services
• Access any storage
• Use an existing identity
Build
The Globus
Command Line
Interface, API sets,
and Python SDK
provide a platform…
6
… for building
science gateways,
portals and
publication services.
7
Conceptual architecture: Hybrid SaaS
DATA
Channel
CONTROL
Channel
Source
Endpoint
Destination
Endpoint
Subscriber owned
and administered
storage system
Globus
“client” software
No data relay or
staging via GlobusSubscriber
Control
Domain
Globus
Control
Domain
Single, globally accessible
multi-tenant service
Conceptual architecture: Sharing
Managed
Endpoint
Subscriber
Control
Domain
Globus
Control
Domain Globus managed
”overlay” permissions
Shared
Endpoint
DATA
Channel
CONTROL
Channel Subscriber managed
filesystem permissions
External User
Control
Domain
…makes your
storage system a
Globus endpoint
Endpoints (Collections)
• Storage abstraction
– All transfers happen between two endpoints
– Globus Connect instantiates endpoints
• Collection ~= Endpoint
• Test / Demo Endpoints
– Globus Tutorial Endpoint 1
– Globus Tutorial Endpoint 2
– ESnet Test Endpoints
o Contain file samples of various sizes
• Globus Connect Personal
– Now your laptop is an endpoint
– https://www.globus.org/globus-connect-personal
14
Globus Connect Personal
• Installers do not require admin access
• Zero configuration; auto updating
• Handles NATs
• Installs in seconds – easy to delete
The Globus Web App - Accounts
• A Globus Account is
– A Primary Identity
– Possible Linked Identities
• Linking Identities
• Managing Identities
• Consents
Demonstration
Identities
File Transfer
File Sharing
Activity Monitoring
• Recent / History / Filter
• Drilling Down
– File transfer statistics
– Overview
– Event Log
– Cancelling an active task
Groups
• What can they be used for?
– Sharing: Access permissions for more than one person
– Roles: Endpoint management and monitoring
• Groups
– Creating groups and setting the visibility
– Members (invitations), Subgroups, Settings
– Settings
o Policies / Membership Fields / Terms & Conditions
– Roles
o Giving others authority over your groups
Endpoint Sharing and Roles
• Sharing
– Select the directory and create the “share”
– A ”share” is another type of endpoint
– Share with: Users / Groups / All Globus Users
• Roles
– Giving others (or groups of others) control or monitoring rights
for your endpoints
Bookmarks
• Just like browser bookmarks – frequently used, or
maybe not used frequently enough!
• Creating a bookmark
• Using a bookmark
• Sorting and Filtering
• Editing and Deleting
Globus Command Line Interface
Open source, uses
Python SDK
docs.globus.org/cli
github.com/globus/
globus-cli
The Globus CLI
• Installation
– docs.globus.org/cli/installation
– Prerequisites
• Logging On (remember the consents?)
– globus login / logout
• Getting help / list of commands
– globus –help
– globus list-commands
• Doing something
– It all about the UUIDs
– Don’t forget the file paths!
The Globus CLI – Let’s do a few things…
• Find endpoints
– globus endpoint search Midway
– globus endpoint search ESNet
– globus endpoint search --filter-scope=recently-used
• Find endpoint contents
– globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec
– globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec:RMACC2018
• Transfer a file
– From ESnet Read-Only Test DTN at CERN to Midway
– Note the specific paths
– globus transfer d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/~/data1/1M.dat af7bda53-6d04-11e5-
ba46-22000b92c6ec:/~/1M.dat
• Transfer a directory
– From Globus Tutorial Endpoint 2 to Midway (create directory and contents)
– globus transfer --recursive ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/sync-demo af7bda53-
6d04-11e5-ba46-22000b92c6ec:/~/syncDemo
• https://docs.globus.org/cli/examples/
Globus CLI
• Easy install and updates
• It’s a native application distributed by Globus
– https://docs.globus.org/cli/
– https://github.com/globus/globus-cli
• Command globus login gets access tokens and refresh tokens
– Stores the token locally (~/.globus.cfg )
– The CLI ”acts as” the logged in user
• All interactions with the service use the tokens
– Tokens for Globus Auth and Transfer services
• Command globus logout deletes those
• https://docs.globus.org/cli/examples/
• https://github.com/globus/automation-examples
Demonstration
Globus CLI
cloud4scieng.org
Industry software builds on platform services
28
Globus delivers… with applications and as a
platform…
Fast and reliable data transfer, sharing, and file
management…
…directly from your own storage systems…
...via software-as-a-service using existing
identities.
How can I integrate
Globus into my research
workflows?
29
Globus serves as…
A platform for building science
gateways, portals and other
web applications in support of
research and education.
30
Globus Auth API
(Group Management)
…
GlobusTransferAPI
GlobusConnect
Data Discovery
File Sharing
File Transfer & Replication
Globus Platform-as-a-Service
Use existing institutional
ID systems in external
web applications
Integrate file transfer and sharing
capabilities into scientific web
apps, portals, gateways, etc...
Example web apps that leverage Globus
32
Globus Transfer API Set
• Doc
– https://docs.globus.org/api/transfer/
• Sample data portal
– https://docs.globus.org/modern-research-
data-portal/
• Jupyter notebook
– https://github.com/globus/globus-jupyter-
notebooks
33
Globus Auth API Set
• Doc
– https://docs.globus.org/api/auth/
• Sample data portal
– https://docs.globus.org/modern-research-
data-portal/
• Native app examples
– https://github.com/globus/native-app-
examples
34
35
Petrel online store
https://petrel.alcf.anl.gov
94 Gbit/s Petrel—Blue Waters
Petrel:
A Programmatically Accessible
Research Data Service
3.2 petabytes
100 Gbps
X
X
Ceph
3.2
A bit of Globus history
U . S . D E P A R T M E N T O F
ENERGY
Globus sustainability model
• Standard Subscription
– Shared endpoints
– Management console
– Usage reporting
– Priority support
– Application integration
– HTTPS support (coming soon)
• Branded Web Site
• Premium Storage Connectors
• Alternate Identity Provider (InCommon is standard)
37
The path to sustainability
8,000
active shared
endpoints
100+
subscribers
635+ PB
transferred
18,000
active GCP
endpoints
73 billion
files processed
1,700
active GCS
endpoints
3 months
longest running transfer
1 PB
largest single
transfer to date
99.9%
availability
500+
identity providers
1,042
most shared
endpoints
at a single
institution 100,000+
users
Globus by the numbers
ALCF Globus Resources
• Documentation
– https://www.alcf.anl.gov/user-guides/data-transfer
– https://www.alcf.anl.gov/user-guides/using-globus
• Endpoints
– Theta: alcf#dtn_theta
– Mira: alcf#dtn_mira
– Cetus: alcf#dtn_mira
– Cooley: alcf#dtn_mira
– Vesta: alcf#dtn_vesta
– HPSS: alcf#dtn_hpss
Globus support resources
• Globus documentation: docs.globus.org
• Helpdesk and issue escalation: support@globus.org
• Mailing Lists
– https://www.globus.org/mailing-lists
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios

Contenu connexe

Tendances

Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109
Chengjen Lee
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012
Hortonworks
 
Nutch as a Web data mining platform
Nutch as a Web data mining platformNutch as a Web data mining platform
Nutch as a Web data mining platform
abial
 

Tendances (20)

Large scale crawling with Apache Nutch
Large scale crawling with Apache NutchLarge scale crawling with Apache Nutch
Large scale crawling with Apache Nutch
 
Polyglot metadata for Hadoop
Polyglot metadata for HadoopPolyglot metadata for Hadoop
Polyglot metadata for Hadoop
 
Openstack Swift - Lots of small files
Openstack Swift - Lots of small filesOpenstack Swift - Lots of small files
Openstack Swift - Lots of small files
 
Harnessing the power of Nutch with Scala
Harnessing the power of Nutch with ScalaHarnessing the power of Nutch with Scala
Harnessing the power of Nutch with Scala
 
An introduction to Storm Crawler
An introduction to Storm CrawlerAn introduction to Storm Crawler
An introduction to Storm Crawler
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
 
Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109Ckan tutorial odw2013 131109
Ckan tutorial odw2013 131109
 
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
Introduction to Globus for System Administrators (GlobusWorld Tour - UMich)
 
On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN On-premise Spark as a Service with YARN
On-premise Spark as a Service with YARN
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 TutorialGlobus Endpoint Setup and Configuration - XSEDE14 Tutorial
Globus Endpoint Setup and Configuration - XSEDE14 Tutorial
 
Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012Web Services Hadoop Summit 2012
Web Services Hadoop Summit 2012
 
dNFS for DBA's
dNFS for DBA'sdNFS for DBA's
dNFS for DBA's
 
Large Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and FriendsLarge Scale Crawling with Apache Nutch and Friends
Large Scale Crawling with Apache Nutch and Friends
 
Web Crawling and Data Gathering with Apache Nutch
Web Crawling and Data Gathering with Apache NutchWeb Crawling and Data Gathering with Apache Nutch
Web Crawling and Data Gathering with Apache Nutch
 
StormCrawler at Bristech
StormCrawler at BristechStormCrawler at Bristech
StormCrawler at Bristech
 
DSD-INT 2015 - Delft3D 4 open source workshop - Adri Mourits
DSD-INT 2015 - Delft3D 4 open source workshop - Adri MouritsDSD-INT 2015 - Delft3D 4 open source workshop - Adri Mourits
DSD-INT 2015 - Delft3D 4 open source workshop - Adri Mourits
 
2014 hadoop wrocław jug
2014 hadoop   wrocław jug2014 hadoop   wrocław jug
2014 hadoop wrocław jug
 
Nutch as a Web data mining platform
Nutch as a Web data mining platformNutch as a Web data mining platform
Nutch as a Web data mining platform
 
Elk presentation 2#3
Elk presentation 2#3Elk presentation 2#3
Elk presentation 2#3
 

Similaire à Introduction to Globus: Research Data Management Software at the ALCF

Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
Globus
 

Similaire à Introduction to Globus: Research Data Management Software at the ALCF (20)

Jupyter + Globus: The Foundation for Interactive Data Science
Jupyter + Globus: The Foundation for Interactive Data ScienceJupyter + Globus: The Foundation for Interactive Data Science
Jupyter + Globus: The Foundation for Interactive Data Science
 
Globus: Beyond File Transfer
Globus: Beyond File TransferGlobus: Beyond File Transfer
Globus: Beyond File Transfer
 
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)
 
Tutorial: What's New with Globus
Tutorial: What's New with GlobusTutorial: What's New with Globus
Tutorial: What's New with Globus
 
Tutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research ApplicationsTutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research Applications
 
Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)
 
Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)Globus Command Line Interface (APS Workshop)
Globus Command Line Interface (APS Workshop)
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Building Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with GlobusBuilding Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with Globus
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
 
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
Leveraging the Globus Platform in Web Applications (CHPC 2019 - South Africa)
 
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDKGlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
GlobusWorld 2021 Tutorial: The Globus CLI, Platform and SDK
 
Globus presentation
Globus presentationGlobus presentation
Globus presentation
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
 
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
 
Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
 
Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)Globus for System Administrators (CHPC 2019 - South Africa)
Globus for System Administrators (CHPC 2019 - South Africa)
 
GlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to GlobusGlobusWorld 2021 Tutorial: Introduction to Globus
GlobusWorld 2021 Tutorial: Introduction to Globus
 

Plus de Globus

Plus de Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Introduction to Globus: Research Data Management Software at the ALCF

  • 1. Introduction to Globus: Research Data Management Software at the ALCF Rick Wagner rick@globus.org rpwagner@uchicago.edu rwagner@anl.gov
  • 2. Research data management today How do we... ...move? ...share? ...discover? ...reproduce? Index?
  • 3. 3 Globus delivers… Fast and reliable big data transfer, sharing, and platform services… …directly from your own storage systems… ...via software-as-a-service using existing identities with the overarching goal of...
  • 4. 4 Research Computing HPC Desktop Workstations Mass Storage Instruments Personal Resources Public Cloud National Resources …unifying access to data across tiers
  • 5. IBM Spectrum Scale Current Planned Storage Connectors - globus.org/connectors
  • 6. Public / private cloud stores External campus storage EC2 Project repositories, replication stores Public repositories Share with collaborators/community
  • 7. Analysis store Next-Gen Sequencer MRI Advanced Light Source Personal system Remote visualization Light Sheet Microscope High-durability, low-cost store Manage data from instruments Cryo-EM
  • 8. Use(r)-appropriate interfaces 8 GET /endpoint/go%23ep1 PUT /endpoint/vas#my_endpt 200 OK X-Transfer-API-Version: 0.10 Content-Type: application/json … Globus service Web CLI Rest API
  • 9. Globus SaaS / PaaS: Research data lifecycle Researcher initiates transfer request; or requested automatically by script, science gateway 1 Instrument Compute Facility Globus transfers files reliably, securely 2 Globus controls access to shared files on existing storage; no need to move files to cloud storage! 4 Researcher selects files to share, selects user or group, and sets access permissions 3 Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus 5 Automating research workflows and ensuring those that need access to the data have it. 8 Personal Computer Transfer Share • Use a Web browser or platform services • Access any storage • Use an existing identity Build The Globus Command Line Interface, API sets, and Python SDK provide a platform… 6 … for building science gateways, portals and publication services. 7
  • 10. Conceptual architecture: Hybrid SaaS DATA Channel CONTROL Channel Source Endpoint Destination Endpoint Subscriber owned and administered storage system Globus “client” software No data relay or staging via GlobusSubscriber Control Domain Globus Control Domain Single, globally accessible multi-tenant service
  • 11. Conceptual architecture: Sharing Managed Endpoint Subscriber Control Domain Globus Control Domain Globus managed ”overlay” permissions Shared Endpoint DATA Channel CONTROL Channel Subscriber managed filesystem permissions External User Control Domain
  • 12. …makes your storage system a Globus endpoint
  • 13. Endpoints (Collections) • Storage abstraction – All transfers happen between two endpoints – Globus Connect instantiates endpoints • Collection ~= Endpoint • Test / Demo Endpoints – Globus Tutorial Endpoint 1 – Globus Tutorial Endpoint 2 – ESnet Test Endpoints o Contain file samples of various sizes • Globus Connect Personal – Now your laptop is an endpoint – https://www.globus.org/globus-connect-personal 14
  • 14. Globus Connect Personal • Installers do not require admin access • Zero configuration; auto updating • Handles NATs • Installs in seconds – easy to delete
  • 15. The Globus Web App - Accounts • A Globus Account is – A Primary Identity – Possible Linked Identities • Linking Identities • Managing Identities • Consents
  • 17. Activity Monitoring • Recent / History / Filter • Drilling Down – File transfer statistics – Overview – Event Log – Cancelling an active task
  • 18. Groups • What can they be used for? – Sharing: Access permissions for more than one person – Roles: Endpoint management and monitoring • Groups – Creating groups and setting the visibility – Members (invitations), Subgroups, Settings – Settings o Policies / Membership Fields / Terms & Conditions – Roles o Giving others authority over your groups
  • 19. Endpoint Sharing and Roles • Sharing – Select the directory and create the “share” – A ”share” is another type of endpoint – Share with: Users / Groups / All Globus Users • Roles – Giving others (or groups of others) control or monitoring rights for your endpoints
  • 20. Bookmarks • Just like browser bookmarks – frequently used, or maybe not used frequently enough! • Creating a bookmark • Using a bookmark • Sorting and Filtering • Editing and Deleting
  • 21. Globus Command Line Interface Open source, uses Python SDK docs.globus.org/cli github.com/globus/ globus-cli
  • 22. The Globus CLI • Installation – docs.globus.org/cli/installation – Prerequisites • Logging On (remember the consents?) – globus login / logout • Getting help / list of commands – globus –help – globus list-commands • Doing something – It all about the UUIDs – Don’t forget the file paths!
  • 23. The Globus CLI – Let’s do a few things… • Find endpoints – globus endpoint search Midway – globus endpoint search ESNet – globus endpoint search --filter-scope=recently-used • Find endpoint contents – globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec – globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec:RMACC2018 • Transfer a file – From ESnet Read-Only Test DTN at CERN to Midway – Note the specific paths – globus transfer d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/~/data1/1M.dat af7bda53-6d04-11e5- ba46-22000b92c6ec:/~/1M.dat • Transfer a directory – From Globus Tutorial Endpoint 2 to Midway (create directory and contents) – globus transfer --recursive ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/sync-demo af7bda53- 6d04-11e5-ba46-22000b92c6ec:/~/syncDemo • https://docs.globus.org/cli/examples/
  • 24. Globus CLI • Easy install and updates • It’s a native application distributed by Globus – https://docs.globus.org/cli/ – https://github.com/globus/globus-cli • Command globus login gets access tokens and refresh tokens – Stores the token locally (~/.globus.cfg ) – The CLI ”acts as” the logged in user • All interactions with the service use the tokens – Tokens for Globus Auth and Transfer services • Command globus logout deletes those • https://docs.globus.org/cli/examples/ • https://github.com/globus/automation-examples
  • 27. 28 Globus delivers… with applications and as a platform… Fast and reliable data transfer, sharing, and file management… …directly from your own storage systems… ...via software-as-a-service using existing identities.
  • 28. How can I integrate Globus into my research workflows? 29
  • 29. Globus serves as… A platform for building science gateways, portals and other web applications in support of research and education. 30
  • 30. Globus Auth API (Group Management) … GlobusTransferAPI GlobusConnect Data Discovery File Sharing File Transfer & Replication Globus Platform-as-a-Service Use existing institutional ID systems in external web applications Integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc...
  • 31. Example web apps that leverage Globus 32
  • 32. Globus Transfer API Set • Doc – https://docs.globus.org/api/transfer/ • Sample data portal – https://docs.globus.org/modern-research- data-portal/ • Jupyter notebook – https://github.com/globus/globus-jupyter- notebooks 33
  • 33. Globus Auth API Set • Doc – https://docs.globus.org/api/auth/ • Sample data portal – https://docs.globus.org/modern-research- data-portal/ • Native app examples – https://github.com/globus/native-app- examples 34
  • 34. 35 Petrel online store https://petrel.alcf.anl.gov 94 Gbit/s Petrel—Blue Waters Petrel: A Programmatically Accessible Research Data Service 3.2 petabytes 100 Gbps X X Ceph 3.2
  • 35. A bit of Globus history U . S . D E P A R T M E N T O F ENERGY
  • 36. Globus sustainability model • Standard Subscription – Shared endpoints – Management console – Usage reporting – Priority support – Application integration – HTTPS support (coming soon) • Branded Web Site • Premium Storage Connectors • Alternate Identity Provider (InCommon is standard) 37
  • 37. The path to sustainability
  • 38. 8,000 active shared endpoints 100+ subscribers 635+ PB transferred 18,000 active GCP endpoints 73 billion files processed 1,700 active GCS endpoints 3 months longest running transfer 1 PB largest single transfer to date 99.9% availability 500+ identity providers 1,042 most shared endpoints at a single institution 100,000+ users Globus by the numbers
  • 39. ALCF Globus Resources • Documentation – https://www.alcf.anl.gov/user-guides/data-transfer – https://www.alcf.anl.gov/user-guides/using-globus • Endpoints – Theta: alcf#dtn_theta – Mira: alcf#dtn_mira – Cetus: alcf#dtn_mira – Cooley: alcf#dtn_mira – Vesta: alcf#dtn_vesta – HPSS: alcf#dtn_hpss
  • 40. Globus support resources • Globus documentation: docs.globus.org • Helpdesk and issue escalation: support@globus.org • Mailing Lists – https://www.globus.org/mailing-lists • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios