SlideShare une entreprise Scribd logo
1  sur  12
Simplifying Science Gateway Data
Management with Globus
Part IV – Automated Data Ingests
October 2020, Gateways 2020
Phase 1 - Gather data
Gathering datasets from research partners
• Your project is gathering datasets
from partners. Each dataset is
several TBs and takes ~a day to
transfer over the network.
• For the data to be useful, it needs
descriptive metadata.
• Ultimately, the team needs to find
datasets that match specific
criteria.
What are the dataset ingest challenges?
• Getting very large datasets transferred from gateway
users’ systems to the central repository
– (This is Scenario I - large-scale data transfer.)
• Generating persistent identifiers for the data in the
central repository so we can link metadata to data
• Storing the metadata
• Indexing the metadata to enable searching
Demonstration
Data ingests in a
web application
https://petraldata.net/
What needs to be in place for it to work?
• Data storage
– Globus Connect Server on Petrel
• Persistent identifiers
– FAIR Research Identifier Service
– Hosted by https://fair-research.org/
• Metadata storage, indexing, search
– Globus Search API
– Hosted by Globus
Globus Connect Server on Petrel
• Configured for self-service projects
– Researchers do not receive local (Linux) accounts!
– Uses Globus for authorization & management
• Guest collections and groups
– Project PIs request access by applying to join the “Petrel
Project Owners” group (using the Globus web app)
– Admin creates Globus group, makes PI a group manager
– Admin creates guest collection, makes PI an access manager
– Admin sets a quota of 100TB for the guest collection
• RESTful web service, written in Python, that
stores identifier metadata
• Mints (creates) identifiers from external
service providers using a unified service
provider interface (SPI)
• Different identifiers supported through
namespaces
• Client requests served as HTML landing
pages or other machine-readable formats
(e.g., JSON, JSON-LD)
FAIR Research Identifiers
AWS-RDS
AWS-EC2
Postgres
Registration SPI
(Python)
Web Server - REST API
(Apache, Flask, Python)
RDBMS ORM
(SQLAlchemy)
AuthN/AuthZ
(Globus Auth, Globus Groups)
Web
Browser
Client
APIs
HTML JSON, JSON-LD, other
extensible renderings
DataCite
(DOI)
EZID
(ARK)
Minid
(Handle)
https://minid.readthedocs.io/en/develop/
• REST API provides a simple CRUD
interface
• Has other capabilities, like finding
identifiers by checksum
• JSON is used for request and
response
• Namespaces may also have their own
handlers, landing pages, and other
customizations.
FAIR Research Identifiers
Globus Search API
• RESTful API for indexing & search
– Hosted by Globus (including the metadata &
index storage!)
– Each project gets an “index” object (private
tenancy)
– REST API, Python client package, Python CLI
• https://docs.globus.org/api/search/
Globus Search API features
• Scalable: to billions of entries
• Schema agnostic: can use standard
(e.g., DataCite) or custom metadata
• Fine-grain access control: only returns
results that are visible to user
• Plain text search: ranked results
• Faceted search: for data discovery
• Rich query language: ranges,
expressions, regex, fuzzy, stemming, etc.
Key ingredients
1. UUID and base path for the guest collection where
data is gathered
2. Minid Python client
3. UUID for Globus Search index
4. Your choice of appropriate metadata schema for
your project’s datasets
Code
Data ingest in a
web application

Contenu connexe

Tendances

Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)Globus
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraGlobus
 
Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Globus
 
Recent Upgrades to ARM Data Transfer and Delivery Using Globus
Recent Upgrades to ARM Data Transfer and Delivery Using GlobusRecent Upgrades to ARM Data Transfer and Delivery Using Globus
Recent Upgrades to ARM Data Transfer and Delivery Using GlobusGlobus
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018Globus
 
Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Globus
 
Globus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus
 
GlobusWorld 2021 Tutorial: Globus for System Administrators
GlobusWorld 2021 Tutorial: Globus for System AdministratorsGlobusWorld 2021 Tutorial: Globus for System Administrators
GlobusWorld 2021 Tutorial: Globus for System AdministratorsGlobus
 
Globus: Beyond File Transfer
Globus: Beyond File TransferGlobus: Beyond File Transfer
Globus: Beyond File TransferGlobus
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data StagingHenning Bergmeyer
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Globus
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Mary Bass
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plansIan Foster
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshotsIan Foster
 
Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)Globus
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elkRushika Shah
 
Globus Platform Overview
Globus Platform OverviewGlobus Platform Overview
Globus Platform OverviewGlobus
 

Tendances (20)

Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)Introduction to the Globus Platform (APS Workshop)
Introduction to the Globus Platform (APS Workshop)
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtra
 
Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)
 
Recent Upgrades to ARM Data Transfer and Delivery Using Globus
Recent Upgrades to ARM Data Transfer and Delivery Using GlobusRecent Upgrades to ARM Data Transfer and Delivery Using Globus
Recent Upgrades to ARM Data Transfer and Delivery Using Globus
 
"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018"What's New With Globus" Webinar: Spring 2018
"What's New With Globus" Webinar: Spring 2018
 
Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)Introduction to Globus (APS Workshop)
Introduction to Globus (APS Workshop)
 
Globus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus: Enabling the Open Storage Network
Globus: Enabling the Open Storage Network
 
GlobusWorld 2021 Tutorial: Globus for System Administrators
GlobusWorld 2021 Tutorial: Globus for System AdministratorsGlobusWorld 2021 Tutorial: Globus for System Administrators
GlobusWorld 2021 Tutorial: Globus for System Administrators
 
Globus: Beyond File Transfer
Globus: Beyond File TransferGlobus: Beyond File Transfer
Globus: Beyond File Transfer
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data Staging
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)Introduction to the Globus Platform (GlobusWorld Tour - UMich)
Introduction to the Globus Platform (GlobusWorld Tour - UMich)
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17
 
Globus status and publication plans
Globus status and publication plansGlobus status and publication plans
Globus status and publication plans
 
Globus publication demo screenshots
Globus publication demo screenshotsGlobus publication demo screenshots
Globus publication demo screenshots
 
Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)Automating Research Data Flows with the Globus Command Line Interface (CLI)
Automating Research Data Flows with the Globus Command Line Interface (CLI)
 
Log analysis using elk
Log analysis using elkLog analysis using elk
Log analysis using elk
 
Globus Platform Overview
Globus Platform OverviewGlobus Platform Overview
Globus Platform Overview
 

Similaire à Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus

Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Advanced Computing Meets Data FAIRness
Advanced Computing Meets Data FAIRnessAdvanced Computing Meets Data FAIRness
Advanced Computing Meets Data FAIRnessGlobus
 
ASP.NET Mvc 4 web api
ASP.NET Mvc 4 web apiASP.NET Mvc 4 web api
ASP.NET Mvc 4 web apiTiago Knoch
 
Building Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with GlobusBuilding Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with GlobusGlobus
 
Building RESTfull Data Services with WebAPI
Building RESTfull Data Services with WebAPIBuilding RESTfull Data Services with WebAPI
Building RESTfull Data Services with WebAPIGert Drapers
 
Building Software Backend (Web API)
Building Software Backend (Web API)Building Software Backend (Web API)
Building Software Backend (Web API)Alexander Goida
 
SOLID Programming with Portable Class Libraries
SOLID Programming with Portable Class LibrariesSOLID Programming with Portable Class Libraries
SOLID Programming with Portable Class LibrariesVagif Abilov
 
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...Thomas W. Fry
 
Denodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Tutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research ApplicationsTutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research ApplicationsGlobus
 
ArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudMicrosoft ArcReady
 
Evolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecurityEvolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecuritySanjeev Verma, PhD
 
Debugging the Web with Fiddler
Debugging the Web with FiddlerDebugging the Web with Fiddler
Debugging the Web with FiddlerIdo Flatow
 
Working with Globus Platform Services
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform ServicesGlobus
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 

Similaire à Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus (20)

Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Advanced Computing Meets Data FAIRness
Advanced Computing Meets Data FAIRnessAdvanced Computing Meets Data FAIRness
Advanced Computing Meets Data FAIRness
 
ASP.NET Mvc 4 web api
ASP.NET Mvc 4 web apiASP.NET Mvc 4 web api
ASP.NET Mvc 4 web api
 
Building Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with GlobusBuilding Data Portals and Science Gateways with Globus
Building Data Portals and Science Gateways with Globus
 
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
Databasecentricapisonthecloudusingplsqlandnodejscon3153oow2016 160922021655
 
Building RESTfull Data Services with WebAPI
Building RESTfull Data Services with WebAPIBuilding RESTfull Data Services with WebAPI
Building RESTfull Data Services with WebAPI
 
Building Software Backend (Web API)
Building Software Backend (Web API)Building Software Backend (Web API)
Building Software Backend (Web API)
 
SOLID Programming with Portable Class Libraries
SOLID Programming with Portable Class LibrariesSOLID Programming with Portable Class Libraries
SOLID Programming with Portable Class Libraries
 
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
 
Denodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me AnythingDenodo Partner Connect: Technical Webinar - Ask Me Anything
Denodo Partner Connect: Technical Webinar - Ask Me Anything
 
Basics of the Web Platform
Basics of the Web PlatformBasics of the Web Platform
Basics of the Web Platform
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Tutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research ApplicationsTutorial: Leveraging Globus in your Research Applications
Tutorial: Leveraging Globus in your Research Applications
 
ArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudArcReady - Architecting For The Cloud
ArcReady - Architecting For The Cloud
 
Evolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser SecurityEvolution Of The Web Platform & Browser Security
Evolution Of The Web Platform & Browser Security
 
Debugging the Web with Fiddler
Debugging the Web with FiddlerDebugging the Web with Fiddler
Debugging the Web with Fiddler
 
Working with Globus Platform Services
Working with Globus Platform ServicesWorking with Globus Platform Services
Working with Globus Platform Services
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 

Plus de Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 

Plus de Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 

Dernier

Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 

Dernier (20)

Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 

Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus

  • 1. Simplifying Science Gateway Data Management with Globus Part IV – Automated Data Ingests October 2020, Gateways 2020
  • 2. Phase 1 - Gather data Gathering datasets from research partners • Your project is gathering datasets from partners. Each dataset is several TBs and takes ~a day to transfer over the network. • For the data to be useful, it needs descriptive metadata. • Ultimately, the team needs to find datasets that match specific criteria.
  • 3. What are the dataset ingest challenges? • Getting very large datasets transferred from gateway users’ systems to the central repository – (This is Scenario I - large-scale data transfer.) • Generating persistent identifiers for the data in the central repository so we can link metadata to data • Storing the metadata • Indexing the metadata to enable searching
  • 4. Demonstration Data ingests in a web application https://petraldata.net/
  • 5. What needs to be in place for it to work? • Data storage – Globus Connect Server on Petrel • Persistent identifiers – FAIR Research Identifier Service – Hosted by https://fair-research.org/ • Metadata storage, indexing, search – Globus Search API – Hosted by Globus
  • 6. Globus Connect Server on Petrel • Configured for self-service projects – Researchers do not receive local (Linux) accounts! – Uses Globus for authorization & management • Guest collections and groups – Project PIs request access by applying to join the “Petrel Project Owners” group (using the Globus web app) – Admin creates Globus group, makes PI a group manager – Admin creates guest collection, makes PI an access manager – Admin sets a quota of 100TB for the guest collection
  • 7. • RESTful web service, written in Python, that stores identifier metadata • Mints (creates) identifiers from external service providers using a unified service provider interface (SPI) • Different identifiers supported through namespaces • Client requests served as HTML landing pages or other machine-readable formats (e.g., JSON, JSON-LD) FAIR Research Identifiers AWS-RDS AWS-EC2 Postgres Registration SPI (Python) Web Server - REST API (Apache, Flask, Python) RDBMS ORM (SQLAlchemy) AuthN/AuthZ (Globus Auth, Globus Groups) Web Browser Client APIs HTML JSON, JSON-LD, other extensible renderings DataCite (DOI) EZID (ARK) Minid (Handle) https://minid.readthedocs.io/en/develop/
  • 8. • REST API provides a simple CRUD interface • Has other capabilities, like finding identifiers by checksum • JSON is used for request and response • Namespaces may also have their own handlers, landing pages, and other customizations. FAIR Research Identifiers
  • 9. Globus Search API • RESTful API for indexing & search – Hosted by Globus (including the metadata & index storage!) – Each project gets an “index” object (private tenancy) – REST API, Python client package, Python CLI • https://docs.globus.org/api/search/
  • 10. Globus Search API features • Scalable: to billions of entries • Schema agnostic: can use standard (e.g., DataCite) or custom metadata • Fine-grain access control: only returns results that are visible to user • Plain text search: ranked results • Faceted search: for data discovery • Rich query language: ranges, expressions, regex, fuzzy, stemming, etc.
  • 11. Key ingredients 1. UUID and base path for the guest collection where data is gathered 2. Minid Python client 3. UUID for Globus Search index 4. Your choice of appropriate metadata schema for your project’s datasets
  • 12. Code Data ingest in a web application

Notes de l'éditeur

  1. You’re working on a project with partners at other institutions, each of whom is analyzing unique samples and generating big datasets from them. You need to gather 100s of TBs of data on your campus’s HPC storage system. How can you make it easy for your partners to get the data from their labs to your server? And once it’s there, how are the partners going to understand each others’ datasets? First, they need to be able to see, in general, what’s been uploaded. Then, they need to find datasets that have specific features. NOTE: We’re presenting this as a single project, but at Globus, we see this happening for dozens-to-hundreds of research projects on a continuous basis. Our end goal is to enable research teams to do this routinely, without special planning or extraordinary measures by individual projects.
  2. Examples of “analysis on a community dataset”: Examples of ”analyze user’s data”: Examples of “download simulation results”: Examples of “submit data to a repository”:
  3. Petrel Data: https://petreldata.net/ Data storage is provided by the Advanced Leadership Computing Facility at Argonne National Laboratory. Petrel offers 100TB allocations to approved projects, with a total of 3PB of storage. Goal is to enable projects to self-manage themselves, including ingest, metadata management, index & search, and sharing permissions.
  4. PIs request access by applying to join a Globus group Petrel admin creates a project group for the PI and makes the PI a group manager Petrel admin creates a Globus guest collection with access managed by the PI Petrel admin also sets a quota of 100TB for the guest collection’s directory.
  5. https://github.com/globus/globus-jupyter-notebooks/blob/master/Data_Publication_Flow.ipynb