SlideShare une entreprise Scribd logo
1  sur  34
Automating Real-time
Seismic Analysis
Through Streaming and High Throughput Workflows
Rafael Ferreira da Silva, Ph.D.
http://pegasus.isi.edu
Pegasus http://pegasus.isi.edu 2
Do we need seismic analysis?
Pegasus http://pegasus.isi.edu 3
USArray
A continental-scale Seismic Observatory
US Array TA (IRIS Service)
836 stations
394 stations have online data available
http://ds.iris.edu/ds/nodes/dmc/earthscope/usarray/_US-TA-operational/
Pegasus http://pegasus.isi.edu 4
The development of reliable risk assessment methods for these
hazards requires real-time analysis of seismic data
Pegasus http://pegasus.isi.edu 5
So, how to efficiently process these data?
Pegasus http://pegasus.isi.edu 6
Scientific Problem
Analytical Solution
Computational Scripts
Automation
Experiment Timeline
Monitoring
and Debug
Scientific Result
Models, Quality Control,
Image Analysis, etc.
Fault-tolerance, Provenance, etc.
Shell scripts, Python, Matlab, etc.
Earth Science, Astronomy,
Neuroinformatics,
Bioinformatics, etc.
Distributed
Computing
Workflows, MapReduce, etc.
Clusters, HPC,
Cloud, Grid, etc.
Pegasus http://pegasus.isi.edu 7
What is involved in an experiment execution?
Pegasus http://pegasus.isi.edu 8
Automate
Recover
Debug
Why Scientific Workflows?
Automates complex, multi-stage processing pipelines
Enables parallel, distributed computations
Automatically executes data transfers
Reusable, aids reproducibility
Records how data was produced (provenance)
Handles failures with to provide reliability
Keeps track of data and files
Pegasus http://pegasus.isi.edu 9
Taking a closer look into a workflow…
job
dependency
Usually data dependencies
split
merge
pipeline
Command-line programs
DAGdirected-acyclic graphs
abstract workflow
executable workflow
storage constraints
optimizations
Pegasus http://pegasus.isi.edu 10
From the abstraction to
execution!
stage-in job
stage-out job
registration job
Transfers the workflow input data
Transfers the workflow output data
Registers the workflow output data
abstract workflow
executable workflow
storage constraints
optimizations
Pegasus http://pegasus.isi.edu 11
Optimizing storage usage…
cleanup job
Removes unused data
abstract workflow
executable workflow
storage constraints
optimizations
Pegasus http://pegasus.isi.edu 12
Workflow systems provide tools to
generate the abstract workflow
dax = ADAG("test_dax")
firstJob = Job(name="first_job")
firstInputFile = File("input.txt")
firstOutputFile = File("tmp.txt")
firstJob.addArgument("input=input.txt", "output=tmp.txt")
firstJob.uses(firstInputFile, link=Link.INPUT)
firstJob.uses(firstOutputFile, link=Link.OUTPUT)
dax.addJob(firstJob)
for i in range(0, 5):
simulJob = Job(id="%s" % (i+1), name="simul_job”)
simulInputFile = File("tmp.txt”)
simulOutputFile = File("output.%d.dat" % i)
simulJob.addArgument("parameter=%d" % i, "input=tmp.txt”,
output=%s" % simulOutputFile.getName())
simulJob.uses(simulInputFile, link=Link.INPUT)
simulJob.uses(simulOutputFile, line=Link.OUTPUT)
dax.addJob(simulJob)
dax.depends(parent=firstJob, child=simulJob)
fp = open("test.dax", "w”)
dax.writeXML(fp)
fp.close()
abstract workflow
W h i c h W o r k f l o w
Management System?
Pegasus http://pegasus.isi.edu 13
PegasusNimrod
Pegasus http://pegasus.isi.edu 14
…and which model to use?
task-oriented stream-based
files
parallelism
no concurrency
data transfers via files
heterogeneous execution
tasks can run in heterogeneous
resources
streams
concurrency
tasks run concurrently
data transfers via
memory or message
homogeneous execution
tasks should run in homogeneous
resources
Pegasus http://pegasus.isi.edu 15
What does Pegasus provide?
Automation
Automates pipeline executions
Parallel, distributed computations
Automatically executes data transfers
Heterogeneous resources
Task-oriented model
Application is seen as a black box
Debug
Workflow execution and job performance metrics
Set of debugging tools to unveil issues
Real-time monitoring, graphs, provenance
Optimization
Job clustering
Data cleanup
Recovery
Job failure detection
Checkpoint Files
Job Retry
Rescue DAGs
Pegasus http://pegasus.isi.edu 16
…and dispel4py?
Automation
Automates pipeline executions
Concurrent, distributed computations
Stream-based model
Workflow Composition
Python Library
Grouping (all-to-all, all-to-one, one-to-all)
Optimization
Multiple streams (in/out)
Avoids I/O (shared memory
or message passing)
Mapping
Sequential
Multiprocessing (shared memory)
Distributed memory, message passing (MPI)
Distributed Real-time (Apache Storm)
Apache Spark (Prototype)
Pegasus http://pegasus.isi.edu 17
Asterism greatly simplifies the
effort required to develop
data-intensive applications that
run across multiple
heterogeneous resources
distributed in the wide area
Pegasus
sub-workflow
dispel4py workflow
ASTERISM
Pegasus http://pegasus.isi.edu 18
Where to run scientific workflows?
High Performance
Computing
http://pegasus.isi.edu 19Pegasus
shared
filesystem
submit host
(e.g., user’s laptop)
There are several possible configurations…
typically most HPC sites
Workflow
Engine
Cloud Computing
http://pegasus.isi.edu 20Pegasus
object
storage
submit host
(e.g., user’s laptop)
High-scalable object storages
Typical cloud computing deployment (Amazon S3,
Google Storage)
Workflow
Engine
http://pegasus.isi.edu 21Pegasus
submit host
(e.g., user’s laptop)
local data management
Typical OSG sites
Open Science Grid
Workflow
Engine
Grid Computing
http://pegasus.isi.edu 22Pegasus
shared
filesystem
submit host
(e.g., user’s laptop)
And yes… you can mix everything!
Compute site B
Compute site A
object storage
Workflow
Engine
Pegasus http://pegasus.isi.edu 23
How do we use Asterism
to automate seismic analysis?
WORKFLOW
Pegasus http://pegasus.isi.edu 24
Phase 1 (pre-process)
Phase 2 (cross-correlation)
Seismic Ambient Noise Cross-Correlation
Preprocesses and cross-correlates traces (sequences of measurements of acceleration in three
dimensions) from multiple seismic stations (IRIS database)
Phase 1: data preparation using statistics for
extracting information from the noise
Phase 2: compute correlation, identifying the time
for signals to travel between stations. Infers
properties of the intervening rock
Seismic Ambient Noise Cross-Correlation
Pegasus http://pegasus.isi.edu 25
Distributed computation framework for event stream processing
Designed for massive scalability, supports fault-tolerance with a “fail fast, auto restart” approach to
processes
Rich array of available spouts specialized for receiving data from all types of sources
Hadoop of real-time processing, very scalable
spout
bolt
http://pegasus.isi.edu 26
Seismic workflow execution
Pegasus
IRIS database
(stations)
data transfers between sites
performed by Pegasus
Compute site B
(Apache Storm)
Compute site A
(MPI-based)
input data (~150MB)
submit host
(e.g., user’s laptop)
output data (~40GB)
Phase 1
Phase 2
WORKFLOW
Pegasus http://pegasus.isi.edu 27
Southern California Earthquake Center’s
CyberShake
Builders ask seismologists: What will the peak ground motion be at my new building in the next 50 years?
Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA)
286 sites, 4 models
each workflow has 420,000 tasks
Pegasus http://pegasus.isi.edu 28
A few more workflow features…
Pegasus http://pegasus.isi.edu 29
Performance, why not improve it?
clustered job
Groups small jobs together
to improve performance
task
small granularity
workflow restructuring
workflow reduction
pegasus-mpi-cluster
hierarchical workflows
Pegasus http://pegasus.isi.edu 30
What about data reuse?
data already
available
Jobs which output data is
already available are pruned
from the DAG
data reuse
workflow restructuring
workflow reduction
pegasus-mpi-cluster
hierarchical workflows
workflow
reduction
data also
available
data reuse
http://pegasus.isi.edu 31
Handling large-scale workflows
pegasus-mpi-cluster
recursion ends
when workflow with
only compute jobs
is encountered
sub-workflow
sub-workflow
workflow restructuring
workflow reduction
hierarchical workflows
Pegasus http://pegasus.isi.edu 32
Running fine-grained workflows on HPC systems…
pegasus-mpi-cluster
HPC System
submit host
(e.g., user’s laptop)
workflow wrapped as an MPI job
Allows sub-graphs of a Pegasus workflow to be
submitted as monolithic jobs to remote resources
workflow restructuring
workflow reduction
hierarchical workflows
Pegasus
Automate, recover, and debug scientific computations.
Get Started
Pegasus Website
http://pegasus.isi.edu
Users Mailing List
pegasus-users@isi.edu
Support
pegasus-support@isi.edu
HipChat
Thank You
Questions?
Rafael Ferreira da Silva, Ph.D.
rafsilva@isi.edu Karan Vahi
Rafael Ferreira da Silva
Rajiv Mayani
Mats Rynge
Ewa Deelman
Automating Real-time
Seismic Analysis
Through Streaming and High Throughput Workflows

Contenu connexe

Tendances

A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanDatabricks
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveJoydeep Sen Sarma
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangDatabricks
 
Apache REEF - stdlib for big data
Apache REEF - stdlib for big dataApache REEF - stdlib for big data
Apache REEF - stdlib for big dataSergiy Matusevych
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14Sri Ambati
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceinside-BigData.com
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Toward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebToward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebJean-Paul Calbimonte
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesDataWorks Summit
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Databricks
 
Deep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDeep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDataWorks Summit/Hadoop Summit
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesDatabricks
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentDatabricks
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Databricks
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformAleksandar Bradic
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High AvailabilityCloudera, Inc.
 
How Machine Learning and AI Can Support the Fight Against COVID-19
How Machine Learning and AI Can Support the Fight Against COVID-19How Machine Learning and AI Can Support the Fight Against COVID-19
How Machine Learning and AI Can Support the Fight Against COVID-19Databricks
 

Tendances (20)

A Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen FanA Developer’s View into Spark's Memory Model with Wenchen Fan
A Developer’s View into Spark's Memory Model with Wenchen Fan
 
Hadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspectiveHadoop Scheduling - a 7 year perspective
Hadoop Scheduling - a 7 year perspective
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 
Apache REEF - stdlib for big data
Apache REEF - stdlib for big dataApache REEF - stdlib for big data
Apache REEF - stdlib for big data
 
Sparkling Water 5 28-14
Sparkling Water 5 28-14Sparkling Water 5 28-14
Sparkling Water 5 28-14
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Toward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebToward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the Web
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
 
MAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodianMAVRL Workshop 2014 - pymatgen-db & custodian
MAVRL Workshop 2014 - pymatgen-db & custodian
 
Deep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in SparkDeep recurrent neutral networks for Sequence Learning in Spark
Deep recurrent neutral networks for Sequence Learning in Spark
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
Apache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and PresentApache Spark Performance: Past, Future and Present
Apache Spark Performance: Past, Future and Present
 
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
Deep Learning Pipelines for High Energy Physics using Apache Spark with Distr...
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
S4: Distributed Stream Computing Platform
S4: Distributed Stream Computing PlatformS4: Distributed Stream Computing Platform
S4: Distributed Stream Computing Platform
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
 
How Machine Learning and AI Can Support the Fight Against COVID-19
How Machine Learning and AI Can Support the Fight Against COVID-19How Machine Learning and AI Can Support the Fight Against COVID-19
How Machine Learning and AI Can Support the Fight Against COVID-19
 

En vedette

Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Rafael Ferreira da Silva
 
Delph v29 2 - DELPH Seismic
Delph v29   2 - DELPH SeismicDelph v29   2 - DELPH Seismic
Delph v29 2 - DELPH SeismicIXSEA-DELPH
 
Extended seismic processing sequence lecture 24
Extended seismic processing sequence lecture 24Extended seismic processing sequence lecture 24
Extended seismic processing sequence lecture 24Amin khalil
 
Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Amin khalil
 
Seismic data processing 14, stacking&migration2
Seismic data processing 14, stacking&migration2Seismic data processing 14, stacking&migration2
Seismic data processing 14, stacking&migration2Amin khalil
 
Extended seismic data processing lec25, fk filtering
Extended seismic data processing lec25, fk filteringExtended seismic data processing lec25, fk filtering
Extended seismic data processing lec25, fk filteringAmin khalil
 
Seismic data processing 12
Seismic data processing 12Seismic data processing 12
Seismic data processing 12Amin khalil
 
Lecture 19, marine seismic surveying
Lecture 19, marine seismic surveyingLecture 19, marine seismic surveying
Lecture 19, marine seismic surveyingAmin khalil
 
Seismic data processing 15, kirchhof migration
Seismic data processing 15, kirchhof migrationSeismic data processing 15, kirchhof migration
Seismic data processing 15, kirchhof migrationAmin khalil
 
Lecture 20, marine surveying 2
Lecture 20, marine surveying 2Lecture 20, marine surveying 2
Lecture 20, marine surveying 2Amin khalil
 
WesternGeco presentation - Seismic Data Processing
WesternGeco presentation - Seismic Data ProcessingWesternGeco presentation - Seismic Data Processing
WesternGeco presentation - Seismic Data ProcessingHatem Radwan
 
Seismic data processing 10
Seismic data processing 10Seismic data processing 10
Seismic data processing 10Amin khalil
 
Seismic data processing 16, migration&land seismic survey
Seismic data processing 16, migration&land seismic surveySeismic data processing 16, migration&land seismic survey
Seismic data processing 16, migration&land seismic surveyAmin khalil
 
Seismic data processing lecture 3
Seismic data processing lecture 3Seismic data processing lecture 3
Seismic data processing lecture 3Amin khalil
 
Application of integrated geophysical technique for the mapping
Application of integrated geophysical technique for the mappingApplication of integrated geophysical technique for the mapping
Application of integrated geophysical technique for the mappingAmin khalil
 
introductory seismology course for undergraduate students
introductory seismology course for undergraduate studentsintroductory seismology course for undergraduate students
introductory seismology course for undergraduate studentsAmin khalil
 
Seismic refraction method lec22
Seismic refraction method lec22Seismic refraction method lec22
Seismic refraction method lec22Amin khalil
 
Seismic data processing lecture 4
Seismic data processing lecture 4Seismic data processing lecture 4
Seismic data processing lecture 4Amin khalil
 
Seismic data processing introductory lecture
Seismic data processing introductory lectureSeismic data processing introductory lecture
Seismic data processing introductory lectureAmin khalil
 
Seismic refraction method lecture 21
Seismic refraction method lecture 21Seismic refraction method lecture 21
Seismic refraction method lecture 21Amin khalil
 

En vedette (20)

Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Wor...
 
Delph v29 2 - DELPH Seismic
Delph v29   2 - DELPH SeismicDelph v29   2 - DELPH Seismic
Delph v29 2 - DELPH Seismic
 
Extended seismic processing sequence lecture 24
Extended seismic processing sequence lecture 24Extended seismic processing sequence lecture 24
Extended seismic processing sequence lecture 24
 
Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)
 
Seismic data processing 14, stacking&migration2
Seismic data processing 14, stacking&migration2Seismic data processing 14, stacking&migration2
Seismic data processing 14, stacking&migration2
 
Extended seismic data processing lec25, fk filtering
Extended seismic data processing lec25, fk filteringExtended seismic data processing lec25, fk filtering
Extended seismic data processing lec25, fk filtering
 
Seismic data processing 12
Seismic data processing 12Seismic data processing 12
Seismic data processing 12
 
Lecture 19, marine seismic surveying
Lecture 19, marine seismic surveyingLecture 19, marine seismic surveying
Lecture 19, marine seismic surveying
 
Seismic data processing 15, kirchhof migration
Seismic data processing 15, kirchhof migrationSeismic data processing 15, kirchhof migration
Seismic data processing 15, kirchhof migration
 
Lecture 20, marine surveying 2
Lecture 20, marine surveying 2Lecture 20, marine surveying 2
Lecture 20, marine surveying 2
 
WesternGeco presentation - Seismic Data Processing
WesternGeco presentation - Seismic Data ProcessingWesternGeco presentation - Seismic Data Processing
WesternGeco presentation - Seismic Data Processing
 
Seismic data processing 10
Seismic data processing 10Seismic data processing 10
Seismic data processing 10
 
Seismic data processing 16, migration&land seismic survey
Seismic data processing 16, migration&land seismic surveySeismic data processing 16, migration&land seismic survey
Seismic data processing 16, migration&land seismic survey
 
Seismic data processing lecture 3
Seismic data processing lecture 3Seismic data processing lecture 3
Seismic data processing lecture 3
 
Application of integrated geophysical technique for the mapping
Application of integrated geophysical technique for the mappingApplication of integrated geophysical technique for the mapping
Application of integrated geophysical technique for the mapping
 
introductory seismology course for undergraduate students
introductory seismology course for undergraduate studentsintroductory seismology course for undergraduate students
introductory seismology course for undergraduate students
 
Seismic refraction method lec22
Seismic refraction method lec22Seismic refraction method lec22
Seismic refraction method lec22
 
Seismic data processing lecture 4
Seismic data processing lecture 4Seismic data processing lecture 4
Seismic data processing lecture 4
 
Seismic data processing introductory lecture
Seismic data processing introductory lectureSeismic data processing introductory lecture
Seismic data processing introductory lecture
 
Seismic refraction method lecture 21
Seismic refraction method lecture 21Seismic refraction method lecture 21
Seismic refraction method lecture 21
 

Similaire à Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows

Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
Scientific
Scientific Scientific
Scientific marpierc
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesIan Foster
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...Rafael Ferreira da Silva
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayQAware GmbH
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilitiesIan Foster
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAnubhav Jain
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAmazon Web Services
 
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...Ocean Project
 
Maria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data streamMaria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data streamPyData
 
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A CloudScalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud Paolo Missier
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...University of California, San Diego
 

Similaire à Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows (20)

Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
Scientific
Scientific Scientific
Scientific
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...WorkflowHub: Community Framework for Enabling  Scientific Workflow Research a...
WorkflowHub: Community Framework for Enabling Scientific Workflow Research a...
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
Big data at experimental facilities
Big data at experimental facilitiesBig data at experimental facilities
Big data at experimental facilities
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache Storm
 
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...
Hardware- and Network-Enhanced Software Systems for Cloud Computing, OW2 Open...
 
Maria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data streamMaria Patterson - Building a community fountain around your data stream
Maria Patterson - Building a community fountain around your data stream
 
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A CloudScalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
Scalable Whole-Exome Sequence Data Processing Using Workflow On A Cloud
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
 

Plus de Rafael Ferreira da Silva

Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Rafael Ferreira da Silva
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Rafael Ferreira da Silva
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Rafael Ferreira da Silva
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringRafael Ferreira da Silva
 
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsAccurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsRafael Ferreira da Silva
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Rafael Ferreira da Silva
 
WRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation WorkbenchWRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation WorkbenchRafael Ferreira da Silva
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsRafael Ferreira da Silva
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsRafael Ferreira da Silva
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCAnalysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCRafael Ferreira da Silva
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresRafael Ferreira da Silva
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...Rafael Ferreira da Silva
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsLeveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsRafael Ferreira da Silva
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Rafael Ferreira da Silva
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...Rafael Ferreira da Silva
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Rafael Ferreira da Silva
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceRafael Ferreira da Silva
 
A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...Rafael Ferreira da Silva
 
A science-gateway workload archive to study pilot jobs, user activity, bag of...
A science-gateway workload archive to study pilot jobs, user activity, bag of...A science-gateway workload archive to study pilot jobs, user activity, bag of...
A science-gateway workload archive to study pilot jobs, user activity, bag of...Rafael Ferreira da Silva
 
Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Rafael Ferreira da Silva
 

Plus de Rafael Ferreira da Silva (20)

Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...Towards an Infrastructure for Enabling Systematic Development and Research of...
Towards an Infrastructure for Enabling Systematic Development and Research of...
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
 
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
Good Practices for Developing Scientific Software Frameworks: The WRENCH fram...
 
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
 
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific WorkflowsAccurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
 
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
Running Accurate, Scalable, and Reproducible Simulations of Distributed Syste...
 
WRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation WorkbenchWRENCH: Workflow Management System Simulation Workbench
WRENCH: Workflow Management System Simulation Workbench
 
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific WorkflowsOn the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
On the Use of Burst Buffers for Accelerating Data-Intensive Scientific Workflows
 
Automating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific WorkflowsAutomating Environmental Computing Applications with Scientific Workflows
Automating Environmental Computing Applications with Scientific Workflows
 
Analysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTCAnalysis of User Submission Behavior on HPC and HTC
Analysis of User Submission Behavior on HPC and HTC
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
 
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
A Unified Approach for Modeling and Optimization of Energy, Makespan and Reli...
 
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific WorkflowsLeveraging Semantics to Improve Reproducibility in Scientific Workflows
Leveraging Semantics to Improve Reproducibility in Scientific Workflows
 
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
Toward Fine-Grained Online Task Characteristics Estimation in Scientific Work...
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...
 
VIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution serviceVIP: design and implementation of the portal and execution service
VIP: design and implementation of the portal and execution service
 
A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...A science-gateway workload archive application to the self-healing of workflo...
A science-gateway workload archive application to the self-healing of workflo...
 
A science-gateway workload archive to study pilot jobs, user activity, bag of...
A science-gateway workload archive to study pilot jobs, user activity, bag of...A science-gateway workload archive to study pilot jobs, user activity, bag of...
A science-gateway workload archive to study pilot jobs, user activity, bag of...
 
Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...Multi-infrastructure workflow execution for medical simulation in the Virtual...
Multi-infrastructure workflow execution for medical simulation in the Virtual...
 

Dernier

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Dernier (20)

Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows

  • 1. Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows Rafael Ferreira da Silva, Ph.D. http://pegasus.isi.edu
  • 2. Pegasus http://pegasus.isi.edu 2 Do we need seismic analysis?
  • 3. Pegasus http://pegasus.isi.edu 3 USArray A continental-scale Seismic Observatory US Array TA (IRIS Service) 836 stations 394 stations have online data available http://ds.iris.edu/ds/nodes/dmc/earthscope/usarray/_US-TA-operational/
  • 4. Pegasus http://pegasus.isi.edu 4 The development of reliable risk assessment methods for these hazards requires real-time analysis of seismic data
  • 5. Pegasus http://pegasus.isi.edu 5 So, how to efficiently process these data?
  • 6. Pegasus http://pegasus.isi.edu 6 Scientific Problem Analytical Solution Computational Scripts Automation Experiment Timeline Monitoring and Debug Scientific Result Models, Quality Control, Image Analysis, etc. Fault-tolerance, Provenance, etc. Shell scripts, Python, Matlab, etc. Earth Science, Astronomy, Neuroinformatics, Bioinformatics, etc. Distributed Computing Workflows, MapReduce, etc. Clusters, HPC, Cloud, Grid, etc.
  • 7. Pegasus http://pegasus.isi.edu 7 What is involved in an experiment execution?
  • 8. Pegasus http://pegasus.isi.edu 8 Automate Recover Debug Why Scientific Workflows? Automates complex, multi-stage processing pipelines Enables parallel, distributed computations Automatically executes data transfers Reusable, aids reproducibility Records how data was produced (provenance) Handles failures with to provide reliability Keeps track of data and files
  • 9. Pegasus http://pegasus.isi.edu 9 Taking a closer look into a workflow… job dependency Usually data dependencies split merge pipeline Command-line programs DAGdirected-acyclic graphs abstract workflow executable workflow storage constraints optimizations
  • 10. Pegasus http://pegasus.isi.edu 10 From the abstraction to execution! stage-in job stage-out job registration job Transfers the workflow input data Transfers the workflow output data Registers the workflow output data abstract workflow executable workflow storage constraints optimizations
  • 11. Pegasus http://pegasus.isi.edu 11 Optimizing storage usage… cleanup job Removes unused data abstract workflow executable workflow storage constraints optimizations
  • 12. Pegasus http://pegasus.isi.edu 12 Workflow systems provide tools to generate the abstract workflow dax = ADAG("test_dax") firstJob = Job(name="first_job") firstInputFile = File("input.txt") firstOutputFile = File("tmp.txt") firstJob.addArgument("input=input.txt", "output=tmp.txt") firstJob.uses(firstInputFile, link=Link.INPUT) firstJob.uses(firstOutputFile, link=Link.OUTPUT) dax.addJob(firstJob) for i in range(0, 5): simulJob = Job(id="%s" % (i+1), name="simul_job”) simulInputFile = File("tmp.txt”) simulOutputFile = File("output.%d.dat" % i) simulJob.addArgument("parameter=%d" % i, "input=tmp.txt”, output=%s" % simulOutputFile.getName()) simulJob.uses(simulInputFile, link=Link.INPUT) simulJob.uses(simulOutputFile, line=Link.OUTPUT) dax.addJob(simulJob) dax.depends(parent=firstJob, child=simulJob) fp = open("test.dax", "w”) dax.writeXML(fp) fp.close() abstract workflow
  • 13. W h i c h W o r k f l o w Management System? Pegasus http://pegasus.isi.edu 13 PegasusNimrod
  • 14. Pegasus http://pegasus.isi.edu 14 …and which model to use? task-oriented stream-based files parallelism no concurrency data transfers via files heterogeneous execution tasks can run in heterogeneous resources streams concurrency tasks run concurrently data transfers via memory or message homogeneous execution tasks should run in homogeneous resources
  • 15. Pegasus http://pegasus.isi.edu 15 What does Pegasus provide? Automation Automates pipeline executions Parallel, distributed computations Automatically executes data transfers Heterogeneous resources Task-oriented model Application is seen as a black box Debug Workflow execution and job performance metrics Set of debugging tools to unveil issues Real-time monitoring, graphs, provenance Optimization Job clustering Data cleanup Recovery Job failure detection Checkpoint Files Job Retry Rescue DAGs
  • 16. Pegasus http://pegasus.isi.edu 16 …and dispel4py? Automation Automates pipeline executions Concurrent, distributed computations Stream-based model Workflow Composition Python Library Grouping (all-to-all, all-to-one, one-to-all) Optimization Multiple streams (in/out) Avoids I/O (shared memory or message passing) Mapping Sequential Multiprocessing (shared memory) Distributed memory, message passing (MPI) Distributed Real-time (Apache Storm) Apache Spark (Prototype)
  • 17. Pegasus http://pegasus.isi.edu 17 Asterism greatly simplifies the effort required to develop data-intensive applications that run across multiple heterogeneous resources distributed in the wide area Pegasus sub-workflow dispel4py workflow ASTERISM
  • 18. Pegasus http://pegasus.isi.edu 18 Where to run scientific workflows?
  • 19. High Performance Computing http://pegasus.isi.edu 19Pegasus shared filesystem submit host (e.g., user’s laptop) There are several possible configurations… typically most HPC sites Workflow Engine
  • 20. Cloud Computing http://pegasus.isi.edu 20Pegasus object storage submit host (e.g., user’s laptop) High-scalable object storages Typical cloud computing deployment (Amazon S3, Google Storage) Workflow Engine
  • 21. http://pegasus.isi.edu 21Pegasus submit host (e.g., user’s laptop) local data management Typical OSG sites Open Science Grid Workflow Engine Grid Computing
  • 22. http://pegasus.isi.edu 22Pegasus shared filesystem submit host (e.g., user’s laptop) And yes… you can mix everything! Compute site B Compute site A object storage Workflow Engine
  • 23. Pegasus http://pegasus.isi.edu 23 How do we use Asterism to automate seismic analysis?
  • 24. WORKFLOW Pegasus http://pegasus.isi.edu 24 Phase 1 (pre-process) Phase 2 (cross-correlation) Seismic Ambient Noise Cross-Correlation Preprocesses and cross-correlates traces (sequences of measurements of acceleration in three dimensions) from multiple seismic stations (IRIS database) Phase 1: data preparation using statistics for extracting information from the noise Phase 2: compute correlation, identifying the time for signals to travel between stations. Infers properties of the intervening rock
  • 25. Seismic Ambient Noise Cross-Correlation Pegasus http://pegasus.isi.edu 25 Distributed computation framework for event stream processing Designed for massive scalability, supports fault-tolerance with a “fail fast, auto restart” approach to processes Rich array of available spouts specialized for receiving data from all types of sources Hadoop of real-time processing, very scalable spout bolt
  • 26. http://pegasus.isi.edu 26 Seismic workflow execution Pegasus IRIS database (stations) data transfers between sites performed by Pegasus Compute site B (Apache Storm) Compute site A (MPI-based) input data (~150MB) submit host (e.g., user’s laptop) output data (~40GB) Phase 1 Phase 2
  • 27. WORKFLOW Pegasus http://pegasus.isi.edu 27 Southern California Earthquake Center’s CyberShake Builders ask seismologists: What will the peak ground motion be at my new building in the next 50 years? Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA) 286 sites, 4 models each workflow has 420,000 tasks
  • 28. Pegasus http://pegasus.isi.edu 28 A few more workflow features…
  • 29. Pegasus http://pegasus.isi.edu 29 Performance, why not improve it? clustered job Groups small jobs together to improve performance task small granularity workflow restructuring workflow reduction pegasus-mpi-cluster hierarchical workflows
  • 30. Pegasus http://pegasus.isi.edu 30 What about data reuse? data already available Jobs which output data is already available are pruned from the DAG data reuse workflow restructuring workflow reduction pegasus-mpi-cluster hierarchical workflows workflow reduction data also available data reuse
  • 31. http://pegasus.isi.edu 31 Handling large-scale workflows pegasus-mpi-cluster recursion ends when workflow with only compute jobs is encountered sub-workflow sub-workflow workflow restructuring workflow reduction hierarchical workflows
  • 32. Pegasus http://pegasus.isi.edu 32 Running fine-grained workflows on HPC systems… pegasus-mpi-cluster HPC System submit host (e.g., user’s laptop) workflow wrapped as an MPI job Allows sub-graphs of a Pegasus workflow to be submitted as monolithic jobs to remote resources workflow restructuring workflow reduction hierarchical workflows
  • 33. Pegasus Automate, recover, and debug scientific computations. Get Started Pegasus Website http://pegasus.isi.edu Users Mailing List pegasus-users@isi.edu Support pegasus-support@isi.edu HipChat
  • 34. Thank You Questions? Rafael Ferreira da Silva, Ph.D. rafsilva@isi.edu Karan Vahi Rafael Ferreira da Silva Rajiv Mayani Mats Rynge Ewa Deelman Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows

Notes de l'éditeur

  1. Handle failures with to provide reliability
  2. Handle failures with to provide reliability
  3. Handle failures with to provide reliability
  4. Handle failures with to provide reliability