This tutorial from the Gateways 2018 conference in Austin, TX explored the capabilities provided by Globus for assembling, describing, publishing, identifying, searching, and discovering datasets.
3. 3
Research Computing HPC
Desktop Workstations
Mass Storage Instruments
Personal Resources
Public Cloud
National Resources
Unify access to data across tiers
8. 8,000
active shared
endpoints
90
subscribers
425 PB
transferred
18,000
active GCP
endpoints
70 billion
files processed
1,700
active GCS
endpoints
3 months
longest running transfer
1 PB
largest single
transfer to date
99.9%
availability
500
identity providers
1,042
most shared
endpoints
at a single
institution 100,000
users
Globus by the numbers
9. Conceptual architecture: Hybrid SaaS
DATA
Channel
CONTROL
Channel
Source
Endpoint
Destination
Endpoint
Subscriber owned
and administered
storage system
Globus
“connector”
software
No data relay or
staging via Globus
cloud service
Subscriber
Control
Domain
Globus
Control
Domain
Single, globally accessible
multi-tenant service
11. Endpoints (Collections)
• Storage abstraction
– All transfers happen between two endpoints
– Globus Connect instantiates endpoints
• Collection ~= Endpoint
• Test / Demo Endpoints
– Globus Tutorial Endpoint 1
– Globus Tutorial Endpoint 2
– ESnet Test Endpoints
o Contain file samples of various sizes
• Globus Connect Personal
– Now your laptop is an endpoint
– https://www.globus.org/globus-connect-personal
11
12. Globus Connect Personal
• Installers do not require admin access
• Zero configuration; auto updating
• Handles NATs
• Installs in seconds – easy to delete - I’ll prove it!
13. The Globus Web App - Accounts
• A Globus Account is
– A Primary Identity
– Possible Linked Identities
• Linking Identities
• Managing Identities
• Consents
14. The Globus Web App - Hidden in Plain Sight
• The Hamburger Menu
– Not always the same – based on the type of endpoint and the storage behind it
– A great place to get the link to a share
• Transfer Settings
– label – when using the activity monitor it’s nice to see a recognizable name
– sync - only transfer new or changed files
– delete files on destination that do not exist on source
– preserve source file modification times
– verify file integrity after transfer
– encrypt transfer
• Search
– Look for the magnifying glass
– Search for: Endpoints / Users / Groups
15. Activity Monitoring
• Recent / History / Filter
• Drilling Down
– File transfer statistics
– Overview
– Event Log
– Cancelling an active task
16. Groups
• What can they be used for?
– Sharing: Access permissions for more than one person
– Roles: Endpoint management and monitoring
• Groups
– Creating groups and setting the visibility
– Members (invitations), Subgroups, Settings
– Settings
o Policies / Membership Fields / Terms & Conditions
– Roles
o Giving others authority over your groups
17. Endpoint Sharing and Roles
• Sharing
– Select the directory and create the “share”
– A ”share” is another type of endpoint
– Share with: Users / Groups / All Globus Users
• Roles
– Give others (users or groups) rights on your endpoint to:
o Monitor activity on the endpoint
o Manage activity on the endpoint (e.g., cancel, pause)
o Manage access control permissions on the endpoint
18. Bookmarks
• Just like browser bookmarks – frequently used, or
maybe not used frequently enough!
• Creating a bookmark
• Using a bookmark
• Sorting and Filtering
• Editing and Deleting
19. Globus Command Line Interface
Open source, uses
Python SDK
docs.globus.org/cli
github.com/globus/
globus-cli
20. The Globus CLI
• Installation
– docs.globus.org/cli/installation
– Prerequisites
• Logging On (remember the consents?)
– globus login / logout
• Getting help / list of commands
– globus –help
– globus list-commands
• Doing something
– It all about the UUIDs
– Don’t forget the file paths!
21. The Globus CLI – Let’s do a few things…
• Find endpoints
– globus endpoint search Midway
– globus endpoint search ESNet
– globus endpoint search --filter-scope=recently-used
• Find endpoint contents
– globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec
– globus ls af7bda53-6d04-11e5-ba46-22000b92c6ec:RMACC2018
• Transfer a file
– From ESnet Read-Only Test DTN at CERN to Midway
– Note the specific paths
– globus transfer d8eb36b6-6d04-11e5-ba46-22000b92c6ec:/~/data1/1M.dat af7bda53-6d04-11e5-
ba46-22000b92c6ec:/~/1M.dat
• Transfer a directory
– From Globus Tutorial Endpoint 2 to Midway (create directory and contents)
– globus transfer --recursive ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/sync-demo af7bda53-
6d04-11e5-ba46-22000b92c6ec:/~/syncDemo
• https://docs.globus.org/cli/examples/
25. Globus Auth
(and Group Management)
…
GlobusAPIs
GlobusConnect
Data Automation, Publication
& Search
File Sharing
File Transfer & Replication
Globus Platform-as-a-Service
Use existing institutional
ID systems in external
web applications
Integrate file transfer and sharing
capabilities into scientific web
apps, portals, gateways, etc...
26. PaaS Security Challenges – Globus Auth
• How to provide:
– Login to apps
o Web apps (Jupyter Notebook, Portals), Mobile, Desktop, Command line
– Protect all REST API communications
o App à Globus service (Jupyter Notebook, MRDP)
o App à non-Globus service (MRDP)
o Service à service (MRDP)
• While:
– Not introducing even more identities
o Providing a platform to consolidate those identities
– Providing least privileges security model (consents)
– Being agnostic to programming language and framework
– Being web friendly
– Making it easy for users and developers
26
27. Client
(Web Portal,
Application,
Jupyter)
Globus Transfer
(Resource Server)
Globus Auth
(Authorization
Server)
5. Authenticate using client id
and secret, send authorization
code
Authorization Code Grant
Browser (User)
1. Access
portal
2.
Redirects
user
3. User authenticates and
consents
4. Authorization
code
6. Access token(s)
7. Authenticate with access
token(s) to give the client
the authority invoke the
transfer service
Identity
Provider
29. Globus Transfer API
• Globus Web App consumes public Transfer API
• REST API: Resources/actions named by URL
– Query params allow refinement (e.g., filter)
• Globus APIs use JSON for documents and resource
representations
• Requests authorized via Globus Auth issued OAuth2
access token
– Authorization: Bearer asdflkqhafsdafeawk
docs.globus.org/api/transfer
29
30. Globus Python SDK
• Python client library for the Globus Auth and Transfer
REST APIs
• globus_sdk.TransferClient class handles
connection management, security, framing,
marshaling
from globus_sdk import TransferClient
tc = TransferClient()
globus.github.io/globus-sdk-python
30
31. TransferClient low-level calls
• Thin wrapper around REST API
– post(), get(), update(), delete()
get(path, params=None, headers=None, auth=None,
response_class=None)
o path – path for the request, with or without leading slash
o params – dict to be encoded as a query string
o headers – dict of HTTP headers to add to the request
o response_class – class response object, overrides the client’s
default_response_class
o Returns: GlobusHTTPResponse object
31
32. TransferClient higher-level calls
• One method for each API resource and HTTP verb
• Largely direct mapping to REST API
endpoint_search(filter_fulltext=None,
filter_scope=None,
num_results=25,
**params)
32
33. Walkthrough Jupyter Notebook with our Hub
• https://jupyter.demo.globus.org
– Sign in with Globus
– Verify the consents
– Start My Server (this will take about a minute)
– Open folder: globus-jupyter-notebooks
– Open folder: globusWorld2018
– Run SDK_autoAuth.ipynb
• If you mess it up and want to “go back to the beginning”
– Back down to the root folder
– Run NotebookPuller.ipynb
• If you want to use the notebook outside of our hub
– https://github.com/globus/globus-jupyter-notebooks
– Autentication is a manual cut and paste of exchanging the authorization
code for an access token
33
34. Endpoint Search
• Plain text search for endpoint
– Searches owner, display name, keywords, description,
organization, department
– Full word and prefix match
• Limit search to pre-defined scopes
– all, my-endpoints, recently-used, in-use, shared-
by-me, shared-with-me
• Returns: List of endpoint documents
34
36. Endpoint Activation
• Activating endpoint means binding a credential to an
endpoint for login
• Globus Connect Server endpoint that have MyProxy
or MyProxy OAuth identity provider require login via
web
• Auto-activate
– Globus Connect Personal and Shared endpoints use Globus-
provided credential
– Must auto-activate before any API calls to endpoints
36
37. File operations
• List directory contents (ls)
• Make directory (mkdir)
• Rename
• Note:
– Path encoding & UTF gotchas
– Don’t forget to auto-activate first
37
38. Task submission
• Asynchronous operations
– Transfer
o Sync level option
– Delete
• Get submission_id, followed by submit
– Once and only once submission
38
39. Task management
• Get task by id
• Get task_list
• Update task by id (label, deadline)
• Cancel task by id
• Get event list for task
• Get task pause info
39
40. Bookmarks
• Get list of bookmarks
• Create bookmark
• Get bookmark by id
• Update bookmark
• Delete bookmark by id
• Cannot perform other operations directly on bookmarks
– Requires client-side resolution
40
41. Shared endpoints and access rules (ACL)
• Shared Endpoint – create / delete / get info / get list
• Administrator role required to delegate access managers
• Access manager role required to manage
permissions/ACL
• Operations:
– Get list of access rules
– Get access rule by id
– Create access rule
– Update access rule
– Delete access rule
41
42. Management API
• Allow endpoint administrators to monitor and manage
all tasks with endpoint
– Task API is essentially the same as for users
– Information limited to what they could see locally
• Cancel tasks
• Pause rules
42
43. Globus Helper Pages
• Globus pages designed for use by your web apps
– Browse Endpoint
– Activate Endpoint
– Select Group
– Manage Identities
– Manage Consents
– Logout
docs.globus.org/api/helper-pages
43
48. Data replication
• For backup: initiated by user or system back up
• Automated transfer of data from science instrument
48
Recurring transfers
with sync option
Copy /ingest
Daily @ 3:30am
49. Staging data with compute jobs
• Stage data in or out as part of the job
• Transfer task is submitted when the job is run
– Endpoint may not be currently activated
• Alternative approaches
1. User adds directives to job submission script
2. Application manages data staging on user’s behalf
50. Application driven automation
• Application (e.g. portal, science gateway) submits a
transfer of compute results as the user
• Application monitors transfer, and initiates additional
processing and/or backup of data
52. Globus Auth: Native apps
• Client that cannot keep a secret, e.g…
– Command line, desktop apps
– Mobile apps
– Jupyter notebooks
• Native app is registered with Globus Auth
– Not a confidential client
• Native App Grant is used
– Variation on the Authorization Code Grant
• Globus SDK:
– To get tokens: NativeAppAuthClient
– To use tokens: AccessTokenAuthorizer
52
53. Browser
Native App grant
53
Native App
(Client)
1. Run
application
2. URL to
authenticate
3. Authenticate and
consent
4. Auth code
5. Register
auth code
6. Exchange
code
7. Access tokens
8. Authenticate with access
tokens to invoke transfer
service as user App/Service
(Resource Server)
Globus Auth
(Authorization Server)
54. Refresh tokens
• Common use cases
– Portal checking transfer status when user is not logged in
– Running command line app from script
• Refresh tokens issued to client, in particular scope
• Client uses refresh token to get access token
– Confidential client: client_id and client_secret required
– Native app: client_secret not required
• Refresh token good for 6 months after last use
• Consent rescindment revokes resource token
54
55. Refresh tokens
55
Native App
(Client)
App/Service
(Resource Server)
Globus Auth
(Authorization Server)
1. Run
application
2. URL to
authenticate
Browser
3. Authenticate and consent
4. Auth code
5. Register
auth code
6. Exchange code,
request refresh tokens
7. Access
tokens and refresh tokens
9. Exchange refresh token
for new access tokens
8. Store refresh tokens
10. Access tokens
11. Authenticate with access
tokens to invoke service as user
56. Native App/Refresh Tokens Sample Code
github.com/globus/native-app-examples
• ./example_copy_paste.py
– User copies and pastes code to the app
• ./example_copy_paste_refresh_token.py
– Stores refresh token locally, uses it to get new access tokens
• See README for installation
56
On your EC2 instance in ~/native-app-examples
58. Globus Command Line Interface (CLI)
• Native application, with easy installation/update
• globus login – get access and refresh tokens
– Tokens stored locally in ~/.globus.cfg
• Service (transfer/auth) invocation uses tokens
• globus logout – delete tokens
docs.globus.org/cli/examples
Also available on your EC2 instance
59. UUIDs everywhere
• UUIDs for endpoint, task, user identity, groups…
• Use search/list options
• get-identities for identity username to UUID
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities vas@globus.org bfc122a3-
af43-43e1-8a41-d36f28a2bc0a
60. Batch Transfers
• Transfer tasks have one source/destination, but can have
any number of files
• Provide input source-dest pairs via local file
• e.g. move files listed in files.txt from $ep1 to $ep2
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/ --
batch --label 'CLI Batch' < files.txt
61. Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --
format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --
jmespath 'DATA[].[id, display_name]'
62. Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
63. Permission management
• Set and manage permissions on shared endpoint
• Requires access manager role
$ share=<shared_endpoint_UUID>
$ globus endpoint permission create --permissions r --
identity tuecke@globus.org $share:/NCARTest/
$ globus endpoint permission list $share
$ globus endpoint permission delete $share <perm_UUID>
64. Automation with CLI
• Script transfers data repeatedly via task manager/cron
– Interactions are as user: both for data access and to Globus
services
• CLI commands used in job submission script
– CLI is installed on head node
– User runs globus login; tokens stored in user’s home directory
– Tokens accessible when job runs and submits stage in/out tasks
– Use --skip-activation-check to submit task even if endpoint is
not activated at submit time
65. Automation with portals
• Portal needs to act as the user
• User grants “offline” access to the portal
– Portal gets and stores refresh tokens for each user
– Uses client id/secret + refresh tokens to get new access tokens
– Portal maintains state about transfers being managed (task id)
66. Useful submission options
• Safe resubmissions
– Applies to all tasks (transfer and delete)
– Get a task UUID and use it in submission
– $ globus task generate-submission-id
– Use --submission-id option in transfer command
• Task wait
– Useful for scripting conditional on transfer task status
67. Walkthrough
• Sample script that uses sync option to transfer files
github.com/globus/automation-
examples/blob/master/cli-sync.sh
• Same task, via an app that uses Python SDK
github.com/globus/automation-
examples/blob/master/globus_folder_sync.py
69. A simplistic data flow…
• Adequate for ad hoc sharing (implicit knowledge)
• Broader access, reuse requires “formalization”
• Leverage Globus data publication services
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir’
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
70. Globus Data Publication V1
SaaS publication
BYO Storage & in-place
publication
User-managed collections
Arbitrary metadata (with
pre-defined schema)
Handle, DOI PIDs
Adoption since 2015:
>1800 users, >600 datasets
71. Publication V2: A platform for automation
• Decompose data publication v1 into platform services
• Facilitate flexible re-composition, adaptation by
customers
• Enable extension and enhancement
• Initial services
– Search, identifiers (and data management)
• Future services
– Description (metadata), flows
72. Globus Search
• Scalable service à billions of entries
• Schema agnostic: use standard (e.g. DataCite) or custom
metadata
• Fine grained access control: only returns results that are
visible to user
• Plain text search: ranked results
• Faceted search: facilitates data discovery
• Rich query language: ranges, expressions, regex, etc.
72
docs.globus.org/api/search
73. Globus Identifiers
• Service for issuing persistent identifiers
– DOI, ARK, Handle, Globus
– e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468
• Within a namespace, e.g. your DataCite namespace
– Control which identities/groups can create identifiers
• Each identifier has…
– Link to data: one or more https URLs, to file, folder or manifest
– Landing page: provided by service, or by user
– Visibility: identities, groups that can see identifier
– Checksum: of the file or manifest
– Metadata: as required by identifier (e.g., DataCite), extensible
– Replaces/replaced-by: for versioning
73
74. SearchIdentifierDescribeTransferAuth
Extending the automation flow
• How can we automate a data publication flow using
Globus platform services?
Create
folder
Transfer
data
Get
metadata
Mint
persistent
identifier
Catalog
Get
credentials
Set ACL
75. Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
76. Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline