Catherine Wise, Nicholas J Car, Ryan Fraser and Geoff Squire
Data61 and LAND & WATER
Standard Provenance Reporting and
Scientific Software Management in
Virtual Labs
Outline
What are VLs?
What is VHIRL?
What is provenance?
How does VHIRL manage provenance (or not)?
How do we represent VHIRL’s actions using standardised provenance?
What work, other than representation, is needed for provenance?
What benefits do we get from this work?
What are VLs?
From https://nectar.org.au/virtual-laboratories-1, they combine
data repositories and computational tools, streamlining research workflows.
What is VHIRL?
• The Virtual Hazards Impact & Risk Laboratory (VHIRL) is a scientific
workflow portal
• Gives researchers access to cloud computing for natural
hazards research
• Draws on data from a variety of sources
• Uses cloud computing resources
• Currently has tools for earthquakes, tsunamis & tropical
cyclones in the Asia-Pacific region
Figure: Components of the Virtual Lab: Virtual Hazard Impact & Risk Laboratory (VHIRL). Layers: Virtual Laboratories/Apps, Data Services, Processing Services, Compute Services and Enablers. Components shown: Coastal Inundation, Tsunami Inundation Scenario, Cyclone Wind Path Calculation, Data Analytics, Magnetics, Gravity, DEM, Landsat, Bathymetry, eScript, ANUGA, Cyclone Wind Model, Surface Wave Propagation (earthquake), TCRM, NCI Petascale, NCI Cloud, NeCTAR Cloud, Amazon Cloud, Desktop, Service Orchestration, Provenance, Metadata, Auth.
Connectivity via Provenance | Melanie Ayre | eResearch Australasia 2015, Brisbane
What is provenance?
From http://en.wikipedia.org/wiki/Provenance#Computer_Science:
“Computer science uses the term provenance to mean the
lineage of data or processes, as per data provenance.
However there is a field of informatics research within
computer science called provenance that studies how
provenance of data and processes should be characterised,
stored and used. Semantic web standards bodies, such as the
World Wide Web Consortium, ratified a standard for
provenance representation in 2014, known as PROV.”
How do we represent VLs using standardised provenance?
VHIRL’s own data management
• Natively tracks ‘everything’ used for scenario (re)runs
• Is not a data store, software repository or records management system
• Externalises as much information management as possible
• Code is managed by the SSSC
Scientific Solutions Software Centre (SSSC)
• The SSSC is a web-based system to
manage code & dependencies
• Contains Problems &
Solutions that define a
workflow
• Each Solution consists of a Toolbox
• Toolboxes are code wrapped
in a Python script, plus a
description of the required
inputs
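A minimal sketch of how the SSSC concepts above relate, assuming a simple in-memory model. The class and field names (`wrapper_script`, `required_inputs`, `missing_inputs`) and the example values are hypothetical, not the SSSC's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the SSSC concepts; not the real SSSC schema.

@dataclass
class Toolbox:
    """Scientific code wrapped in a Python script, plus its required inputs."""
    name: str
    wrapper_script: str
    required_inputs: List[str] = field(default_factory=list)

@dataclass
class Solution:
    """One way of solving a Problem, built on a single Toolbox."""
    name: str
    toolbox: Toolbox

@dataclass
class Problem:
    """A scientific question addressed by one or more Solutions."""
    name: str
    solutions: List[Solution] = field(default_factory=list)

    def missing_inputs(self, solution_index: int, supplied: dict) -> List[str]:
        """Inputs the chosen Solution's Toolbox requires but that were not supplied."""
        toolbox = self.solutions[solution_index].toolbox
        return [name for name in toolbox.required_inputs if name not in supplied]

anuga = Toolbox("ANUGA", "run_anuga.py", ["elevation_grid", "boundary_conditions"])
problem = Problem("Tsunami inundation", [Solution("Inundation scenario", anuga)])
print(problem.missing_inputs(0, {"elevation_grid": "grid.nc"}))  # ['boundary_conditions']
```

Keeping the input description on the Toolbox lets a portal check a workflow before launching it on cloud resources.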
Figure: Class diagram for the SSSC
Scientific Solutions Software Centre (SSSC)
• Beautiful, RESTful API, for example:
http://vhirl-dev.csiro.au/scm/toolbox/2
• Solution → prov:Plan
• No RDF metadata, yet!
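Since the SSSC has no RDF metadata yet, here is a sketch of what typing a Solution as a prov:Plan could look like in Turtle. The `/scm/solution/1` URI and the title are hypothetical placeholders; only the PROV-O and Dublin Core namespaces are real.

```python
# Sketch: describe an SSSC Solution as a prov:Plan (a subclass of prov:Entity).
# The solution URI used below is a hypothetical placeholder.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
)

def solution_as_plan(solution_uri: str, title: str) -> str:
    """Return a Turtle snippet typing the Solution as a prov:Plan with a title."""
    # Escape backslashes and double quotes so the title is a valid Turtle literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return PREFIXES + f'\n<{solution_uri}> a prov:Plan ;\n    dct:title "{escaped}" .\n'

ttl = solution_as_plan("http://vhirl-dev.csiro.au/scm/solution/1", "Tsunami inundation scenario")
print(ttl)
```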
Mapping VHIRL to PROV 1
Input Data → Process → Output Data
Mapping VHIRL to PROV 2
Code, Config, Input Data → Process → Output Data
“Ontology Design Pattern”
Mapping VHIRL to PROV 3
Code, Config, Input Data → Process → Output Data
Who/which system: the Agent associated with the Process (who used the inputs)
Key: Entity, Activity, Agent
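The third pattern can be sketched as plain (subject, predicate, object) triples using PROV-O terms (prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith). The `ex:` identifiers are hypothetical stand-ins for URIs VHIRL would mint; a production version would more likely use an RDF library.

```python
# Sketch of the "Mapping VHIRL to PROV 3" pattern as PROV-O triples.
# All ex: identifiers are hypothetical placeholders.

def run_triples(run, code, config, inputs, outputs, agent):
    """Return (subject, predicate, object) triples for one VHIRL run."""
    triples = [(run, "rdf:type", "prov:Activity"),
               (agent, "rdf:type", "prov:Agent"),
               (run, "prov:wasAssociatedWith", agent)]
    # Code, config and input data are all Entities used by the Activity.
    for entity in [code, config, *inputs]:
        triples.append((entity, "rdf:type", "prov:Entity"))
        triples.append((run, "prov:used", entity))
    # Output data are Entities generated by the Activity.
    for entity in outputs:
        triples.append((entity, "rdf:type", "prov:Entity"))
        triples.append((entity, "prov:wasGeneratedBy", run))
    return triples

triples = run_triples("ex:run456", "ex:anuga-script", "ex:config1",
                      ["ex:gridX"], ["ex:inundation-map"], "ex:user42")
for s, p, o in triples:
    print(s, p, o, ".")  # one triple per line
```

Treating code and config as ordinary used Entities is what makes the pattern reusable: a rerun differs only in which Entities it points to.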
Mapping VHIRL to PROMS
Figure: Reporting System X sends Report N, which describes Entities, Activities and Agents.
Mapping VHIRL to PROMS: VHIRL provenance into PROMS Server
Figure: Reporting Systems X and Y send many Reports (Report N, Report M, ...), which are reported and stored in an Organisational Provenance Store.
Modelling VHIRL’s data types
Figure: a VL Run, performed by a user within The VL and described by Report N, uses managed data, web service data, user-supplied data, managed code and user-supplied code, and produces output data.
Reports come either via the PROMS Reporting Toolkits or from VHIRL’s native PROV output (an RDF file).
What work, other than
representation, is needed for
provenance?
Provenance effort (step) pyramid: Data Management, Establishing Reporting, Continued Reporting
Data Management
• managed data, web service data, user-supplied data, managed code, user-supplied code and output data: all Entities need to be ID’d (via URI) and persisted
• VL Run: each VL run is reported as an Activity within a Report
• The VL: each VL instance has/needs an ID and is modelled as a Reporting System
• user: each VL user is known by their login (account) details; modelled as a Reporter
• Report N: each VL Report is ID’d and persisted in the VL Provenance Store
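A sketch of the identification rules above, assuming a hypothetical base URI (`http://example.org/vl/`) and a dict standing in for the VL Provenance Store. UUID-based minting is one possible choice, not necessarily what VHIRL does.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

BASE = "http://example.org/vl/"   # hypothetical namespace, not a real VHIRL one

def mint_uri(kind: str) -> str:
    """Mint a new, globally unique URI for an Entity, Activity or Report."""
    return f"{BASE}{kind}/{uuid.uuid4()}"

@dataclass
class Report:
    reporting_system: str   # ID of the VL instance (the Reporting System)
    reporter: str           # VL user's login (account) ID (the Reporter)
    activities: List[str] = field(default_factory=list)   # URIs of VL runs
    uri: str = field(default_factory=lambda: mint_uri("report"))

provenance_store: Dict[str, Report] = {}   # stand-in for the VL Provenance Store

def persist(report: Report) -> str:
    """Store the report under its URI and return that URI."""
    provenance_store[report.uri] = report
    return report.uri

run_uri = mint_uri("activity")   # the VL run, reported as an Activity
report_uri = persist(Report(reporting_system=BASE + "instance/vhirl-1",
                            reporter="jdoe", activities=[run_uri]))
```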
Data Management
• managed data: VL ID’d and persisted
• web service data: cited using PROMS-O format
• user-supplied data: soon to be VL ID’d and persisted, with minimal metadata recorded too
• managed code: SSSC ID’d and persisted
• user-supplied code: perhaps SSSC ID’d and persisted, perhaps VL managed
• output data: soon to be VL ID’d and persisted, if required, perhaps with time limits
Virtual Labs Service Citation Example
[{ref}] {service title}
{service endpoint URI}
{query}
{time queried}
{cached copy ID}
[1] “Subset of elevation”
http://pid.csiro.au/service/anuga-thredds
“bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride”
“2014-12-15T13:15:11”
http://pid.csiro.au/dataset/abcd1234
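The citation template can be rendered mechanically. This sketch assumes one field per line, as in the example above; the class, field and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ServiceCitation:
    """Hypothetical container for the service citation template fields."""
    ref: int
    title: str
    endpoint: str       # service endpoint URI
    query: str          # exact query sent to the service
    queried_at: str     # ISO 8601 time the service was queried
    cached_copy: str    # ID of the persisted copy of the response

    def render(self) -> str:
        """Render the citation one field per line."""
        return "\n".join([
            f'[{self.ref}] "{self.title}"',
            self.endpoint,
            f'"{self.query}"',
            f'"{self.queried_at}"',
            self.cached_copy,
        ])

citation = ServiceCitation(
    ref=1,
    title="Subset of elevation",
    endpoint="http://pid.csiro.au/service/anuga-thredds",
    query=("bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679"
           "&south=-33.551573283840156&west=114.84967874597227"
           "&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride"),
    queried_at="2014-12-15T13:15:11",
    cached_copy="http://pid.csiro.au/dataset/abcd1234",
)
print(citation.render())
```

Recording the query time and a cached copy is what makes a live web service citation reproducible after the service's data change.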
Establishing Reporting
Figure: the VL sends Reports to the Organisational Provenance Store (which supports querying & redelivery) via a Provenance Reporting Toolkit, available for C#, Java and Python.
Establishing Reporting - Reporting Toolkits
Figure: VL Run “Run 456” uses managed data (“Grid X”) and web service data (“Service Y”); it involves Agent N and is described by Report N (“Report for Run 456”).
# Entity is a class from the PROMS Python reporting toolkit (import not shown on the slide)
e1 = Entity(title='Grid X',
            description='netCDF grid of property X',
            uri='http://eg-vl.org.au/dataset/123',
            downloadURL='http://eg-vl.org.au/dataset/123?_view=dl',
            wasAttributedTo='http://data.ga.gov.au/id/person/john.doe')
e2 = ServiceEntity(
    title='Subset of elevation',
    description='5km solar radiation interpolated raster service',
    serviceBaseUri='http://siss2.anu.edu.au/anuga/busselton.nc',
    # adjacent string literals are joined, keeping the query on one logical line
    query=('var=elevation&spatial=bb&north=-33.06495205&south=-33.551573283'
           '&west=114.84967874&east=115.70661233&temporal=all'
           '&time_start=&time_end=&horizStride'),
    queriedAtTime='2014-12-15T13:15:11',
    cachedCopy='http://bom.gov.au/dataset/678')
a0 = Activity(
    title='Run 456',
    description='Upper bound run, full Grid X use',
    # wasAssociatedWith, startedAtTime, endedAtTime and generatedEntities
    # are added automatically by the VL
    usedEntities=[e1, e2])
r0 = Report(
    title='Report for Run 456',
    description='Upper bound run, full Grid X use',
    # startingActivity and endingActivity are added automatically by the VL
)
rs0 = ReportSender('http://provstore.vl.org.au/report/')
rs0.send(r0)
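The slide does not show how `ReportSender.send` talks to the store. Purely as an assumption, not checked against the PROMS API, here is a standard-library sketch that prepares an HTTP POST of a report serialised as Turtle to the endpoint above, without sending it. `build_report_request` and the `ex:` namespace are hypothetical.

```python
import urllib.request

def build_report_request(endpoint: str, report_turtle: str) -> urllib.request.Request:
    """Prepare, but do not send, an HTTP POST carrying a report as Turtle.
    Assumption: the store accepts POSTed Turtle; this is not the documented PROMS API."""
    return urllib.request.Request(
        endpoint,
        data=report_turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )

report_ttl = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/> .\n"
    "ex:run456 a prov:Activity .\n"
)
req = build_report_request("http://provstore.vl.org.au/report/", report_ttl)
# urllib stores header names capitalised, hence "Content-type" here.
print(req.get_method(), req.full_url, req.get_header("Content-type"))
# urllib.request.urlopen(req) would actually send the report.
```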
What do we get from this work?
• Graph power! Report N from Reporting System X becomes part of a larger provenance graph
• URI power! Report N from Reporting System X links, by URI, to a corporate staff DB, a temp repo, a public web service, a DAP-style repo and a PROMS instance
• Distributed graphs! A GA PROMS instance, a VL PROMS instance and a Uni Prov Store, with distributed querying via endpoint cache
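“Graph power” in practice: once runs are reported as PROV, lineage questions become graph traversals. This sketch walks prov:wasGeneratedBy and prov:used edges backwards, breadth-first, from an output. All identifiers, including the `ga:` ones standing in for another organisation’s PROMS instance, are hypothetical.

```python
from collections import deque

def upstream(entity, triples):
    """Return every Activity and Entity the given entity was derived from,
    following prov:wasGeneratedBy (entity -> activity) and prov:used
    (activity -> entity) edges backwards, breadth-first."""
    edges = {}
    for s, p, o in triples:
        if p in ("prov:wasGeneratedBy", "prov:used"):
            edges.setdefault(s, []).append(o)
    seen, queue = set(), deque([entity])
    while queue:
        node = queue.popleft()
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

triples = [
    ("ex:inundation-map", "prov:wasGeneratedBy", "ex:run456"),
    ("ex:run456", "prov:used", "ex:gridX"),
    ("ex:run456", "prov:used", "ex:serviceY"),
    ("ex:gridX", "prov:wasGeneratedBy", "ga:survey12"),   # reported elsewhere
    ("ga:survey12", "prov:used", "ga:rawData"),
]
print(sorted(upstream("ex:inundation-map", triples)))
# ['ex:gridX', 'ex:run456', 'ex:serviceY', 'ga:rawData', 'ga:survey12']
```

The traversal only crosses from the VL’s reports into another organisation’s because both use the same URI for Grid X, which is the point of “URI power”.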

 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

Standard Provenance Reporting and Scientific Software Management in Virtual Laboratories

  • 1. Catherine Wise, Nicholas J Car, Ryan Fraser and Geoff Squire, Data61 and LAND & WATER: Standard Provenance Reporting and Scientific Software Management in Virtual Labs
  • 2. What are VLs? What is VHIRL? What is provenance? How does VHIRL manage provenance (or not)? How do we represent VHIRL’s actions to standardised provenance? What work, other than representation, is needed for provenance? What benefits do we get from this work? Outline
  • 4. From https://nectar.org.au/virtual-laboratories-1, they are: data repositories and computational tools that streamline research workflows What are VLs?
  • 6. • Virtual Hazards Impact & Risk Laboratory (VHIRL) is a scientific workflow portal • Gives researchers access to cloud computing for natural hazards research • draws on data from a variety of sources • uses cloud computing resources • currently has tools for earthquakes, tsunamis & tropical cyclones in the Asia-Pacific region What is VHIRL?
  • 7. Components of the Virtual Lab: Virtual Hazard Impact & Risk Laboratory (VHIRL). [Diagram: layers for Data Services, Processing Services, Compute Services, Enablers and Virtual Laboratories/Apps; components include Data Analytics, Magnetics, Gravity, DEM, eScript, ANUGA, NCI Petascale, NCI Cloud, NeCTAR Cloud, Amazon Cloud, Desktop, Service Orchestration, Provenance, Metadata, Auth., Coastal Inundation, Tsunami Inundation Scenario, Cyclone Wind Path Calculation, Landsat, Bathymetry, Cyclone Wind Model, Surface Wave Propagation (earthquake) and TCRM.] Source: Connectivity via Provenance | Melanie Ayre | eResearch Australasia 2015, Brisbane
  • 12. From http://en.wikipedia.org/wiki/Provenance#Computer_Science: What is provenance? “Computer science uses the term provenance to mean the lineage of data or processes, as per data provenance. However there is a field of informatics research within computer science called provenance that studies how provenance of data and processes should be characterised, stored and used. Semantic web standards bodies, such as the World Wide Web Consortium, ratified a standard for provenance representation in 2014, known as PROV.”
  • 13. How do we represent VLs using standardised provenance?
  • 14. • Natively tracks ‘everything’ used for scenario (re)runs • Is not a: Data store, Software repo, Records mgt system • Externalises as much information mgt as possible • Code managed by the SSSC VHIRL’s own data management
  • 15. • SSSC is a web-based system to manage code & dependencies • Contains Problems & Solutions that define a workflow • Each Solution consists of a Toolbox • Toolboxes are code wrapped in a Python script, plus a description of the required inputs Scientific Solutions Software Centre (SSSC) Class diagram for the SSSC
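The SSSC's Problem / Solution / Toolbox structure can be sketched as plain data classes. This is a minimal illustration that assumes simple one-to-many links. The class and field names (`wrapper_script`, `required_inputs`, `inputs_needed`) are hypothetical and are not taken from the SSSC class diagram.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical data model for the SSSC concepts named on the slide.
# Class and field names are illustrative assumptions, not the SSSC schema.

@dataclass
class Toolbox:
    name: str
    wrapper_script: str          # Python script that wraps the scientific code
    required_inputs: List[str]   # description of the inputs the code needs


@dataclass
class Solution:
    name: str
    toolbox: Toolbox             # each Solution consists of a Toolbox


@dataclass
class Problem:
    name: str
    solutions: List[Solution] = field(default_factory=list)

    def inputs_needed(self) -> List[str]:
        """Collect the required inputs of every Solution, without duplicates."""
        needed: List[str] = []
        for solution in self.solutions:
            for item in solution.toolbox.required_inputs:
                if item not in needed:
                    needed.append(item)
        return needed


# Illustrative values only.
anuga = Toolbox("ANUGA", "run_anuga.py", ["elevation grid", "tsunami source"])
problem = Problem("Tsunami inundation", [Solution("Inundation scenario", anuga)])
print(problem.inputs_needed())  # ['elevation grid', 'tsunami source']
```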
  • 16. Scientific Solutions Software Centre (SSSC) • Beautiful, RESTful API, for example: http://vhirl-dev.csiro.au/scm/toolbox/2 • Solution → prov:Plan • No RDF metadata, yet!
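Mapping a Solution to `prov:Plan` could look roughly like the following sketch, which emits Turtle from a dictionary using only the standard library. The dictionary keys and the placeholder URI are assumptions. They stand in for whatever the SSSC REST API actually returns. The terms `prov:Plan`, `rdfs:label` and `dcterms:description` are standard W3C PROV, RDFS and Dublin Core vocabulary.

```python
# Minimal sketch: describe an SSSC Solution as a prov:Plan in Turtle.
# The dict keys ("uri", "name", "description") are assumptions about the
# SSSC's JSON, not its documented schema; the slide notes there is no RDF yet.

PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string literal (escapes backslashes, quotes, newlines)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def solution_to_plan(solution: dict) -> str:
    """Return a Turtle document declaring the Solution to be a prov:Plan."""
    body = (
        f"<{solution['uri']}> a prov:Plan ;\n"
        f"    rdfs:label {turtle_literal(solution['name'])} ;\n"
        f"    dcterms:description {turtle_literal(solution['description'])} .\n"
    )
    return PREFIXES + "\n" + body


example = {
    "uri": "http://example.org/sssc/solution/1",   # placeholder URI
    "name": "Tsunami inundation",
    "description": 'Runs the "ANUGA" toolbox',
}
print(solution_to_plan(example))
```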
  • 17. Mapping VHIRL to PROV 1 Input Data Process Output Data
  • 18. Mapping VHIRL to PROV 2 Code Process Output Data Config Input Data “Ontology Design Pattern”
  • 19. Mapping VHIRL to PROV 3 Code Process Output Data Config Input Data Who/ which system Who used Entity Activity Agent
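The third mapping corresponds to PROV's Entity, Activity and Agent classes. In it, a process uses code, config and input data, generates output data, and is associated with an agent. These links are expressed with `prov:used`, `prov:wasGeneratedBy` and `prov:wasAssociatedWith`. Below is a minimal sketch, assuming every node is identified by a URI. The `example.org` URIs and the function names are hypothetical.

```python
from typing import List, Tuple

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
Triple = Tuple[str, str, str]


def run_to_prov(run: str, agent: str, used: List[str], generated: List[str]) -> List[Triple]:
    """Map one VL run onto PROV's Entity / Activity / Agent pattern.

    `used` holds the code, config and input-data URIs; `generated` the outputs.
    """
    triples: List[Triple] = [
        (run, RDF_TYPE, PROV + "Activity"),
        (agent, RDF_TYPE, PROV + "Agent"),
        (run, PROV + "wasAssociatedWith", agent),   # who / which system ran it
    ]
    for entity in used:
        triples.append((entity, RDF_TYPE, PROV + "Entity"))
        triples.append((run, PROV + "used", entity))
    for entity in generated:
        triples.append((entity, RDF_TYPE, PROV + "Entity"))
        triples.append((entity, PROV + "wasGeneratedBy", run))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise URI-only triples as N-Triples."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


ex = "http://example.org/vl/"   # placeholder namespace
triples = run_to_prov(
    run=ex + "run/456",
    agent=ex + "user/u1",
    used=[ex + "code/anuga", ex + "config/run456", ex + "data/grid-x"],
    generated=[ex + "data/inundation-456"],
)
print(to_ntriples(triples))
```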
  • 20. Mapping VHIRL to PROMS Report N Entity Activity Agent Reporting System X R.S. Report
  • 22. VHIRL provenance into PROMS Server Report N Entity Activity Agent Reporting System X R.S. Report Report N Report N Report M Report N Reporting System Y Report N Report N Report N Organisational Provenance Store reported and stored
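A toy, in-memory stand-in for the organisational provenance store illustrates how Reports from several Reporting Systems flow into one store. The class and its methods are assumptions made for illustration. So is the rule that only registered Reporting Systems may submit. None of this describes actual PROMS Server behaviour.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Set


@dataclass(frozen=True)
class Report:
    uri: str
    reporting_system: str   # URI of the Reporting System, e.g. a VL instance
    title: str


class OrganisationalProvenanceStore:
    """Toy in-memory provenance store (hypothetical API, not PROMS Server)."""

    def __init__(self) -> None:
        self._systems: Set[str] = set()
        self._reports: Dict[str, Report] = {}
        self._by_system: DefaultDict[str, List[str]] = defaultdict(list)

    def register_system(self, system_uri: str) -> None:
        self._systems.add(system_uri)

    def submit(self, report: Report) -> None:
        """Store a Report once; reject unknown senders and duplicate URIs."""
        if report.reporting_system not in self._systems:
            raise ValueError(f"unregistered Reporting System: {report.reporting_system}")
        if report.uri in self._reports:
            raise ValueError(f"report already stored: {report.uri}")
        self._reports[report.uri] = report
        self._by_system[report.reporting_system].append(report.uri)

    def reports_from(self, system_uri: str) -> List[str]:
        return list(self._by_system.get(system_uri, []))


store = OrganisationalProvenanceStore()
for system in ("http://example.org/rs/X", "http://example.org/rs/Y"):
    store.register_system(system)
store.submit(Report("http://example.org/report/N", "http://example.org/rs/X", "Report N"))
store.submit(Report("http://example.org/report/M", "http://example.org/rs/Y", "Report M"))
print(store.reports_from("http://example.org/rs/X"))  # ['http://example.org/report/N']
```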
  • 23. Modelling VHIRL’s data types VL Run output data user The VL Report N managed data web service data user supplied data managed code user supplied code
  • 25. VHIRL’s native PROV output RDF file
  • 26. What work, other than representation, is needed for provenance?
  • 27. Provenance effort (step) pyramid Data Management Establishing Reporting Continued Reporting
  • 28. managed data web service data user supplied data managed code user supplied code Data Management output data all Entities need to be ID’d (via URI) and persisted VL Run each VL run is reported as an Activity within a Report each VL instance has/needs an ID and is modelled as a Reporting System user each VL user is known by their login (account) details. Modelled as a Reporter The VL Report N each VL Report is ID’d and persisted in the VL Provenance Store
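One way to meet the requirement that all Entities be ID'd by URI and persisted is to keep existing URIs (e.g. for managed data) and mint new ones for anything that lacks one. The namespace `http://example.org/vl/entity/` and both helper functions are hypothetical. UUIDs are one possible choice of opaque identifier, not necessarily the one the VL uses.

```python
import uuid
from typing import Dict, List

# Hypothetical namespace under which a VL would mint IDs for entities it persists.
VL_ENTITY_BASE = "http://example.org/vl/entity/"


def mint_missing_uris(entities: List[Dict[str, str]],
                      base: str = VL_ENTITY_BASE) -> List[Dict[str, str]]:
    """Return copies of the entities, giving any entity without a 'uri' a new one."""
    result = []
    for entity in entities:
        copy = dict(entity)
        if not copy.get("uri"):
            copy["uri"] = base + uuid.uuid4().hex   # opaque, collision-resistant ID
        result.append(copy)
    return result


def persist(entities: List[Dict[str, str]], store: Dict[str, Dict[str, str]]) -> None:
    """Store each entity under its URI so that reports can cite it later."""
    for entity in entities:
        store[entity["uri"]] = entity


store: Dict[str, Dict[str, str]] = {}
entities = mint_missing_uris([
    {"title": "Grid X", "uri": "http://eg-vl.org.au/dataset/123"},  # managed data keeps its ID
    {"title": "user supplied data"},                                 # needs a VL-minted ID
])
persist(entities, store)
```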
  • 29. managed data web service data user supplied data managed code user supplied code Data Management VL ID’d and persisted output data cited using PROMS-O format soon to be VL ID’d and persisted, with minimal metadata recorded too SSSC ID’d and persisted perhaps SSSC ID’d and persisted, perhaps VL managed soon to be VL ID’d and persisted, if required, perhaps with time limits
  • 30. managed data web service data user supplied data managed code user supplied code Data Management VL ID’d and persisted output data cited using PROMS-O format soon to be VL ID’d and persisted, with minimal metadata recorded too SSSC ID’d and persisted perhaps SSSC ID’d and persisted, perhaps VL managed soon to be VL ID’d and persisted, if required, perhaps with time limits Virtual Labs Service Citation Example: format [{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}; example [1] “Subset of elevation” http://pid.csiro.au/service/anuga-thredds “bussleton.nc?var=elevation&spatial=bb&north=-33.06495205829679&south=-33.551573283840156&west=114.84967874597227&east=115.70661233971667&temporal=all&time_start=&time_end=&horizStride” “2014-12-15T13:15:11” http://pid.csiro.au/dataset/abcd1234
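The service citation template can be produced mechanically from its six fields. The sketch below reuses the example values from the slide. The `ServiceCitation` class is hypothetical and is not part of PROMS-O.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ServiceCitation:
    """Fields follow the slide's template:
    [{ref}] {service title} {service endpoint URI} {query} {time queried} {cached copy ID}
    """
    ref: int
    title: str
    endpoint: str
    query: str
    queried_at: datetime
    cached_copy: str

    def format(self) -> str:
        return (f'[{self.ref}] "{self.title}" {self.endpoint} "{self.query}" '
                f'"{self.queried_at.isoformat()}" {self.cached_copy}')


citation = ServiceCitation(
    ref=1,
    title="Subset of elevation",
    endpoint="http://pid.csiro.au/service/anuga-thredds",
    query=("bussleton.nc?var=elevation&spatial=bb"
           "&north=-33.06495205829679&south=-33.551573283840156"
           "&west=114.84967874597227&east=115.70661233971667"
           "&temporal=all&time_start=&time_end=&horizStride"),
    queried_at=datetime(2014, 12, 15, 13, 15, 11),
    cached_copy="http://pid.csiro.au/dataset/abcd1234",
)
print(citation.format())
```

Storing the query time as a `datetime` rather than a string means the ISO 8601 form in the citation is always well-formed.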
  • 31. Establishing Reporting VL Report Organisational Provenance Store querying & redelivery Provenance Reporting Toolkit (C#, Java, Python)
  • 32. Establishing Reporting - Reporting Toolkits managed data web service data VL Run “Grid X” “Service Y” “Run 456” e1 = Entity(title='Grid X', description='netCDF grid of property X', uri='http://eg-vl.org.au/dataset/123', downloadURL='http://eg-vl.org.au/dataset/123?_view=dl', wasAttributedTo='http://data.ga.gov.au/id/person/john.doe') Agent N Report N Report for Run 456
  • 33. Establishing Reporting - Reporting Toolkits managed data web service data VL Run “Grid X” “Service Y” “Run 456” e1 = Entity(title='Grid X', description='netCDF grid of property X', uri='http://eg-vl.org.au/dataset/123', downloadURL='http://eg-vl.org.au/dataset/123?_view=dl', wasAttributedTo='http://data.ga.gov.au/id/person/john.doe') Agent N e2 = ServiceEntity(title='Subset of elevation', description='5km solar radiation interpolated raster service', serviceBaseUri='http://siss2.anu.edu.au/anuga/busselton.nc', query='var=elevation&spatial=bb&north=-33.06495205&south=-33.551573283&west=114.84967874&east=115.70661233&temporal=all&time_start=&time_end=&horizStride', queriedAtTime='2014-12-15T13:15:11', cachedCopy='http://bom.gov.au/dataset/678') Report N Report for Run 456
  • 34. Establishing Reporting - Reporting Toolkits managed data web service data VL Run “Grid X” “Service Y” “Run 456” Agent N a0 = Activity(title='Run 456', description='Upper bound run, full Grid X use', wasAssociatedWith={VL added automatically}, startedAtTime={VL added automatically}, endedAtTime={VL added automatically}, usedEntities=[e1, e2], generatedEntities={VL added automatically}) Report N Report for Run 456
  • 35. Establishing Reporting - Reporting Toolkits managed data web service data VL Run “Grid X” “Service Y” “Run 456” Agent N Report N Report for Run 456 r0 = Report(title='Report for Run 456', description='Upper bound run, full Grid X use', startingActivity={VL added automatically}, endingActivity={VL added automatically}) rs0 = ReportSender('http://provstore.vl.org.au/report/') rs0.send(r0)
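The slide code is not runnable as shown, because the `{VL added automatically}` values are placeholders. The following is a self-contained sketch of the same flow, assuming JSON over HTTP POST as the transport. The classes reuse the slide's names but are hypothetical re-creations, not the actual toolkit API. The abbreviated query and the user URI are placeholders.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical re-creation of the classes shown on the slides. Class and field
# names mirror the slide text; they are NOT the published toolkit API.


@dataclass
class Entity:
    title: str
    description: str = ""
    uri: Optional[str] = None
    downloadURL: Optional[str] = None
    wasAttributedTo: Optional[str] = None


@dataclass
class ServiceEntity(Entity):
    serviceBaseUri: str = ""
    query: str = ""
    queriedAtTime: str = ""
    cachedCopy: Optional[str] = None


@dataclass
class Activity:
    title: str
    description: str = ""
    usedEntities: List[Entity] = field(default_factory=list)
    # The fields below are the ones the slides say the VL adds automatically.
    wasAssociatedWith: Optional[str] = None
    startedAtTime: Optional[str] = None
    endedAtTime: Optional[str] = None
    generatedEntities: List[Entity] = field(default_factory=list)


@dataclass
class Report:
    title: str
    description: str
    startingActivity: Activity
    endingActivity: Activity


class ReportSender:
    """Posts a Report to a provenance store; JSON is an assumed wire format."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def build_request(self, report: Report) -> urllib.request.Request:
        body = json.dumps(asdict(report)).encode("utf-8")
        return urllib.request.Request(
            self.endpoint, data=body, method="POST",
            headers={"Content-Type": "application/json"},
        )

    def send(self, report: Report, timeout: float = 10.0) -> int:
        with urllib.request.urlopen(self.build_request(report), timeout=timeout) as resp:
            return resp.status


e1 = Entity(title="Grid X", description="netCDF grid of property X",
            uri="http://eg-vl.org.au/dataset/123",
            downloadURL="http://eg-vl.org.au/dataset/123?_view=dl",
            wasAttributedTo="http://data.ga.gov.au/id/person/john.doe")
e2 = ServiceEntity(title="Subset of elevation",
                   serviceBaseUri="http://siss2.anu.edu.au/anuga/busselton.nc",
                   query="var=elevation&spatial=bb",   # abbreviated from the slide
                   queriedAtTime="2014-12-15T13:15:11",
                   cachedCopy="http://bom.gov.au/dataset/678")
a0 = Activity(title="Run 456", description="Upper bound run, full Grid X use",
              usedEntities=[e1, e2])

# Stand-ins for the values the VL would fill in itself (placeholders).
a0.wasAssociatedWith = "http://example.org/vl/user/u1"
a0.startedAtTime = a0.endedAtTime = datetime.now(timezone.utc).isoformat()

r0 = Report(title="Report for Run 456", description="Upper bound run, full Grid X use",
            startingActivity=a0, endingActivity=a0)
request = ReportSender("http://provstore.vl.org.au/report/").build_request(r0)
# ReportSender(...).send(r0) would perform the actual HTTP POST.
```

Only `build_request` is exercised here, so no network traffic occurs. Calling `send` would POST the report to the given endpoint.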
  • 36. What do we get from this work?
  • 38. URI power! Report N Reporting System X corporate staff DB temp repo public web service DAP-style repo PROMS instance
  • 39. Distributed graphs! GA PROMS instance VL PROMS instance Uni Prov Store Distributed Querying via endpoint cache
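Distributed querying via an endpoint cache could work roughly as follows. A resolver asks each provenance store in turn about a URI and remembers the first answer, so repeat lookups do not touch the network. This is a hedged sketch. Real stores would be queried via SPARQL or HTTP rather than through the in-memory dictionaries assumed here, and this cache has no expiry.

```python
from typing import Callable, Dict, List, Optional

# An endpoint is modelled as any function that maps a URI to a description
# (or None if it has never heard of it). A real deployment would issue
# SPARQL queries or HTTP dereferences instead; that part is assumed away here.
Endpoint = Callable[[str], Optional[dict]]


class CachingResolver:
    """Try each provenance endpoint in turn and cache the first answer per URI."""

    def __init__(self, endpoints: List[Endpoint]):
        self.endpoints = endpoints
        self.cache: Dict[str, dict] = {}
        self.remote_calls = 0

    def resolve(self, uri: str) -> Optional[dict]:
        if uri in self.cache:                 # cache hit: no remote traffic
            return self.cache[uri]
        for endpoint in self.endpoints:
            self.remote_calls += 1
            answer = endpoint(uri)
            if answer is not None:
                self.cache[uri] = answer
                return answer
        return None                           # misses are not cached


# Two toy stores standing in for, e.g., a GA PROMS instance and a VL PROMS instance.
ga_store = {"http://example.org/ga/dataset/1": {"type": "Entity", "title": "Grid X"}}
vl_store = {"http://example.org/vl/run/456": {"type": "Activity", "title": "Run 456"}}

resolver = CachingResolver([ga_store.get, vl_store.get])
resolver.resolve("http://example.org/vl/run/456")   # asks GA (miss), then VL (hit)
resolver.resolve("http://example.org/vl/run/456")   # served from the cache
print(resolver.remote_calls)  # 2
```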