Working with Instrument Data
Ryan Chard
rchard@anl.gov
Overview
• Data management challenges
• Managing instrument data with Globus
• Use cases and lessons learned
• Notebook demo
Data management challenges
• Event Horizon Telescope
– 12 telescopes
– Generate 900 TB per 5-day run
– Data written to ~1000 HDDs
– Transported to MIT & Max Planck via airplane
– Aggregated and analyzed
• Global resources, long timescales
• Too much data for manual processing
• Data loss due to HDD failure
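A back-of-the-envelope estimate suggests why the disks travel by airplane. The calculation below assumes a dedicated 10 Gb/s network path running flat out, using decimal units. That link speed is our assumption; the slide gives no link speed.

$$
t = \frac{900\ \text{TB} \times 8\ \text{bit/B}}{10\ \text{Gb/s}}
  = \frac{7.2 \times 10^{15}\ \text{bit}}{10^{10}\ \text{bit/s}}
  = 7.2 \times 10^{5}\ \text{s} \approx 8.3\ \text{days}
$$

That is longer than the five-day run that produced the data, and it does not yet include protocol overhead or competing traffic.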
Research data management challenges
• Data acquired at various locations/times
• Analyses executed on distributed resources
• Catalogs of descriptive metadata and provenance
• Dynamic collaborations around data and analysis
[Figure: Raw data store, Catalog, DOE Lab, Campus, Community Archive, NIH]
Exacerbated by large-scale science
• Best practices overlooked, useful data forgotten, errors propagate
• Researchers allocated short periods of instrument time
• Inefficiencies -> less science
• Errors -> long delays, missed opportunity …forever!
Scientific Data Lifecycle
data.library.virginia.edu/data-management/lifecycle/
Goal
Automate data manipulation tasks from acquisition, transfer, and sharing, to publication, indexing, analysis, and inference
Automation and Globus
• Globus provides a rich data management ecosystem for both admins and researchers
• Compose multiple services into reliable, secure data management pipelines
• Execute on behalf of users
• Create data-aware automations that respond to data events
Globus Services
• Transfer: Move data, set ACLs, create shares
• Search: Find data, catalog metadata
• Identifiers: Mint persistent IDs for datasets
• Auth: Glue that ties everything together
Globus Auth
• Programmatic and secure access to both Globus services and any third-party services that support it
• Grant permission to apps to act on your behalf
– Dependent tokens
• Refresh tokens let users authenticate once, after which long-running pipelines can keep acting on their behalf
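Refresh tokens are what make "authenticate once" work. The pipeline holds a long-lived refresh token and exchanges it for a fresh, short-lived access token whenever the current one is about to expire. The Globus SDK packages this pattern as `globus_sdk.RefreshTokenAuthorizer`. The stdlib sketch below only illustrates the idea. `fetch_new_token` is a hypothetical stand-in for the call to the authorization server, and the 60-second safety margin is an arbitrary choice.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

# Hypothetical signature: refresh token -> (access_token, expires_at in epoch seconds)
RefreshFn = Callable[[str], Tuple[str, float]]


@dataclass
class RefreshingToken:
    """Keep a valid access token on hand, refreshing it shortly before expiry."""
    refresh_token: str
    fetch_new_token: RefreshFn
    margin_s: float = 60.0            # refresh this long before expiry (arbitrary)
    clock: Callable[[], float] = time.time
    _access_token: Optional[str] = field(default=None, init=False)
    _expires_at: float = field(default=0.0, init=False)

    def get(self) -> str:
        # Fetch a new token if we have none yet or the current one is about to expire.
        if self._access_token is None or self.clock() >= self._expires_at - self.margin_s:
            self._access_token, self._expires_at = self.fetch_new_token(self.refresh_token)
        return self._access_token

    def auth_header(self) -> dict:
        # Bearer header for HTTP requests to an OAuth2-protected service.
        return {"Authorization": f"Bearer {self.get()}"}
```

Because the clock is injectable, the expiry logic can be tested without waiting an hour.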
Automation via Globus
Glue services together
• Globus SDK (docs.globus.org)
• Scripting with the Globus CLI
– globus task wait
• Automate
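In a shell script, `globus task wait` blocks until a transfer task finishes, so the next command runs only once the data has landed. The same wait-then-act pattern can be sketched in Python. Here `get_status` is a hypothetical callable standing in for a task status lookup, such as a Transfer task query through the SDK. The terminal states SUCCEEDED and FAILED mirror the statuses Globus Transfer reports for tasks. The timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL = {"SUCCEEDED", "FAILED"}


def wait_for_task(get_status: Callable[[], str],
                  timeout_s: float = 3600.0,
                  poll_s: float = 10.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the task reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still {status} after {timeout_s} s")
        sleep(poll_s)
```

Returning FAILED rather than raising leaves the caller free to decide whether to retry or to alert someone.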
Example Use Cases
• Advanced Photon Source
– Connectomics
– Time series spectroscopy
• Scanning Electron Microscope
UChicago Kasthuri Lab: Brain aging and disease
• Construct connectomes—mapping of neuron connections
• Use APS synchrotron to rapidly image brains
– Beam time available once every few months
– ~20 GB/minute for large (cm) unsectioned brains
• Generate segmented datasets/visualizations for the community
• Perform semi-standard reconstruction on all data across HPC resources
Original Approach
• Collect data—20 mins
• Move to a local machine
• Generate previews for a couple of images—5 mins
• Collect more data
• Initiate local reconstruction—1 hour
• Batch process the rest after beamtime
[Figure: Advanced Photon Source and Argonne Leadership Computing Facility, annotated “1 km” and “5 μs”]
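The two numbers on this slide fit a simple propagation-delay estimate. This assumes that 5 μs is the one-way delay over 1 km of optical fiber, which is our reading and is not stated on the slide. Light in fiber travels at roughly two-thirds of its vacuum speed, about 2×10^8 m/s:

$$
t = \frac{d}{v} \approx \frac{1\,000\ \text{m}}{2 \times 10^{8}\ \text{m/s}} = 5 \times 10^{-6}\ \text{s} = 5\ \mu\text{s}
$$

At that scale, network latency between the beamline and the supercomputer is negligible next to minutes-long preview and reconstruction steps.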
Requirements
• Accommodate many different beamline users with different skill sets
– Automatically apply a “base” reconstruction to data
• Leverage HPC due to computational requirements
• Unobtrusive to the user
Ripple: A Trigger-Action platform for data
• A provided set of triggers and actions from which users create rules
• Ripple processes data triggers and reliably executes actions
• Usable by non-experts
• Daisy-chain rules for complex flows
Not a product!
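The trigger-action idea, including daisy-chaining, fits in a few lines. The sketch below is not Ripple's implementation or rule format. `Event`, `Rule`, and the glob-based path matching are assumptions for illustration. An action may return new events, which are fed back through the rules, so one rule's output can trigger the next.

```python
import fnmatch
from collections import deque
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "created", "deleted", "moved"
    path: str


@dataclass
class Rule:
    kind: str                               # event kind the rule reacts to
    pattern: str                            # glob matched against the event path
    action: Callable[[Event], List[Event]]  # returns follow-on events (may be empty)

    def matches(self, event: Event) -> bool:
        return event.kind == self.kind and fnmatch.fnmatch(event.path, self.pattern)


def process(rules: List[Rule], initial: Event, max_steps: int = 100) -> List[Event]:
    """Apply every matching rule; events emitted by actions are processed in turn."""
    queue = deque([initial])
    handled: List[Event] = []
    while queue:
        if len(handled) >= max_steps:  # stop rules that keep re-triggering each other
            raise RuntimeError("rule chain exceeded max_steps")
        event = queue.popleft()
        handled.append(event)
        for rule in rules:
            if rule.matches(event):
                queue.extend(rule.action(event))
    return handled
```

The `max_steps` guard matters in practice. Without it, two rules that trigger each other would loop forever.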
Data-driven automation
• Filesystem-specific tools monitor and report events
– inotify (Linux)
– FSWatch (macOS)
• Capture local data events
– Create, delete, move
Watchdog: github.com/gorakhargosh/watchdog
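Native notifications are not always available. For example, inotify does not see changes that other hosts make on a network filesystem. In those cases polling is the usual fallback, and Watchdog ships a `PollingObserver` for this. The sketch below is a stdlib-only polling approach, not Watchdog's API. It diffs two directory snapshots, and because it sees only before and after states, a move shows up as a deletion plus a creation.

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # path -> (size in bytes, mtime in ns)


def snapshot(root: str) -> Snapshot:
    """Record size and modification time for every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:  # file vanished between listing and stat
                continue
            snap[path] = (st.st_size, st.st_mtime_ns)
    return snap


def diff(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Return sorted (event_kind, path) pairs; a move appears as deleted + created."""
    events = [("created", p) for p in new.keys() - old.keys()]
    events += [("deleted", p) for p in old.keys() - new.keys()]
    events += [("modified", p) for p in new.keys() & old.keys() if new[p] != old[p]]
    return sorted(events)
```

A watcher built on this would call `snapshot` every few seconds, hand each `diff` result to the automation, and keep the new snapshot for the next round.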
[Figure: Building the connectome, a neuroanatomy reconstruction pipeline spanning UChicago, Argonne JLSE, the APS, and the Argonne Leadership Computing Facility, with Lab Server 1 and Lab Server 2. Steps: 1 Imaging, 2 Acquisition, 3 Pre-processing, 4 Preview/Center, 5 User validation, 6 Reconstruction, 7 Publication, 8 Visualization, 9 Science!]
New Approach
• Detect data as they are collected
• Automatically move data to ALCF
• Initiate a preview and reconstruction
• Detect the preview and move it back to the APS
• Move results to shared storage
• Catalog data in Search for users to explore
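The New Approach is a chain in which each step starts only after the previous one succeeds. The lessons that follow stress that operators must be able to see where a run stopped. Below is a minimal runner sketch that logs every step and reports how far it got. The step functions are hypothetical placeholders, and this is not the code used at the APS.

```python
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("pipeline")

# (step name, function that takes and returns a context dict)
Step = Tuple[str, Callable[[dict], dict]]


def run_pipeline(steps: List[Step], context: dict) -> Tuple[bool, List[str]]:
    """Run steps in order, logging each one; stop at the first failure.

    Returns (succeeded, names of the steps that completed).
    """
    done: List[str] = []
    for name, fn in steps:
        log.info("starting %s", name)
        try:
            context = fn(context)
        except Exception:
            # Log the full traceback so an operator can see exactly where it stopped.
            log.exception("step %s failed; completed so far: %s", name, done)
            return False, done
        log.info("finished %s", name)
        done.append(name)
    return True, done
```

Catching the broad `Exception` is deliberate. The runner's job is to record where the chain stopped, not to decide which errors are recoverable.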
Lessons Learned
• Automate data capture where possible; it is far easier than convincing people to run things
• Transparency is critical: operators need the ability to debug at 3 am
• “Manual” automation is better than no automation
Scanning Electron Microscope
Rapidly process SEM images to flag bad data while samples are still in the machine
[Figure: examples of good and bad images]
SEM Focus
1. Slice the image into 6 random subsections
2. Apply Laplacian blob detection
3. Use a neural network (NN) to classify as in or out of focus
Credit: Aarthi Koripelly
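The slicing and Laplacian steps can be approximated without any ML library. The sketch below takes six random square crops and scores each one by the variance of its discrete Laplacian, a classic sharpness measure: blurred images have weak edges and therefore a low score. This is a swapped-in heuristic, not the blob-detection-plus-NN method described above. The crop size, seed, and threshold are assumptions that would need calibrating on real SEM images.

```python
import random
from statistics import median, pvariance
from typing import List

Image = List[List[float]]  # grayscale pixel values, row-major


def random_crops(img: Image, n: int = 6, size: int = 32, seed: int = 0) -> List[Image]:
    """Cut n random size-by-size subsections out of the image."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    crops = []
    for _ in range(n):
        r = rng.randrange(h - size + 1)
        c = rng.randrange(w - size + 1)
        crops.append([row[c:c + size] for row in img[r:r + size]])
    return crops


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour discrete Laplacian over interior pixels."""
    lap = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, len(img) - 1)
        for x in range(1, len(img[0]) - 1)
    ]
    return pvariance(lap)


def looks_in_focus(img: Image, threshold: float, **crop_args) -> bool:
    """Call the image in focus if the median crop score reaches the threshold."""
    scores = [laplacian_variance(c) for c in random_crops(img, **crop_args)]
    return median(scores) >= threshold
```

Scoring several random crops and taking the median keeps one blank or unusually textured region from deciding the result for the whole image.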
DLHub
• Collect, publish, and categorize models from many disciplines (materials science, physics, chemistry, genomics, etc.)
• Serve model inference on demand via an API to simplify sharing, consumption, and access
• Enable new science through reuse, real-time model-in-the-loop integration, and synthesis and ensembling of existing models
Using DLHub
1. Describe
2. Publish
3. Run
• Secured with Globus Auth to verify users
• Inference is performed at ALCF’s PetrelKube
• Use Globus-accessible data as inputs (HTTPS)
Processing SEM Data with DLHub
• Detect files placed in a “/process” directory
• Move data to Petrel
• Generate input for DLHub
• Invoke DLHub
• Put results in a Search index for users
• Append to a list in a “/results” folder
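The last two steps publish results where users will actually look. The sketch below builds a record in the `GMetaEntry` shape that Globus Search ingests (`subject`, `visible_to`, `content`). It then appends one JSON line per result to a file in the “/results” folder. The `content` field names, the `public` visibility, the file name, and the JSON Lines format are all assumptions for illustration. The ingest call itself, for example through the Globus SDK's `SearchClient`, is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def search_entry(subject: str, prediction: str, score: float) -> dict:
    """Build a Globus Search-style ingest document for one processed image."""
    return {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            "subject": subject,        # e.g. the image's location on Petrel
            "visible_to": ["public"],  # assumption: results are world-readable
            "content": {               # hypothetical field names
                "prediction": prediction,
                "score": score,
                "processed_at": datetime.now(timezone.utc).isoformat(),
            },
        },
    }


def append_result(results_dir: Path, entry: dict) -> Path:
    """Append the entry's subject and content as one JSON line to results.jsonl."""
    results_dir.mkdir(parents=True, exist_ok=True)
    out = results_dir / "results.jsonl"
    record = {"subject": entry["ingest_data"]["subject"], **entry["ingest_data"]["content"]}
    with out.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return out
```

Appending one line per result keeps the results file readable with ordinary tools while the Search index serves users who want to query.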
Example
Lessons Learned
• Perfect, complex hooks are sometimes unnecessary
• No value if the user can’t easily find and use the result
• Outsource and leverage special-purpose services
– You don’t need to do everything
X-ray Photon Correlation Spectroscopy
• APS Beamline 8-ID
• Generate a lot of data
– Images every ~10 seconds
• Apply XPCS-Eigen tool to HDF files containing many images
Current Approach
• Internal workflow engine is started in response to data
• Enormous bash scripts everywhere
• Restricted to local resources
• Any new tools must fit into their dashboard
Current Approach
• Plug in a new step to kick off Globus actions
• Fits into existing dashboards
• Easy for them to debug, either standalone or within the flow
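A step that is easy to debug both on its own and inside a workflow can be built in a common way. The work goes in an importable function, and a thin command-line wrapper sits around it. The flow calls the function, and an operator can rerun the same step by hand on a single file. This is a generic Python pattern, not the 8-ID team's code. `summarize` is a hypothetical placeholder for the real step.

```python
import argparse
import json
import os
import sys
from typing import List, Optional


def summarize(path: str) -> dict:
    """Hypothetical step: report basic facts about one input file."""
    return {"file": os.path.basename(path), "bytes": os.path.getsize(path)}


def main(argv: Optional[List[str]] = None) -> int:
    """Command-line wrapper so the same step can be run by hand."""
    parser = argparse.ArgumentParser(description="Run one pipeline step by hand.")
    parser.add_argument("files", nargs="*", help="input files to process")
    args = parser.parse_args(argv)
    status = 0
    for path in args.files:
        try:
            print(json.dumps(summarize(path)))
        except OSError as err:
            # Report and keep going so one bad file does not hide the others.
            print(f"error: {path}: {err}", file=sys.stderr)
            status = 1
    return status
```

Adding the usual `if __name__ == "__main__": sys.exit(main())` guard at the bottom turns the module into a command the operator can call directly. The flow imports `summarize` and never touches `main`.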
Lessons Learned
• Everyone loves automation
• Everyone has a “working” solution and doesn’t want to change
• Make results easy to find - no value without results
• HPC timeouts are still a pain
Thanks! Questions?
rchard@anl.gov

Working with Instrument Data (GlobusWorld Tour - UMich)
