SHARP: Harmonizing Galaxy and Taverna workflow provenance
SeWeBMeDA’17 - Demonstration
Alban Gaignard¹, Khalid Belhajjame², Hala Skaf-Molli³
May 28, 2017
¹ Nantes Academic Hospital, France
² LAMSADE Paris-Dauphine University, France
³ LS2N - Nantes University, France
Multiple workflow engines
[Figure: a Taverna workflow (@research-lab) and a Galaxy workflow (@sequencing-facility). Step labels: Alignment (samples 1.a, 1.b and 2, each with reads R1 and R2), Sort, Merge, Variant calling, Exon filtering, Variant effect prediction. Data labels: GRCh37, VCF file, output.]
A. Gaignard, K. Belhajjame, H. Skaf-Molli – SeWeBMeDA’17
SHARP approach
[Figure: SHARP pipeline in three stages. PROV interlinking (PROV traces linked by inferred owl:sameAs), PROV harmonization, PROV summarization (nanopub).]
Demonstration scenario
(1) Provenance capture
(2) Provenance interlinking
(3) Provenance harmonization
(4) Provenance summarization (influence graphs, nanopublications)
• https://github.com/albangaignard/galaxy-PROV
• https://github.com/albangaignard/sharp-prov-toolbox
(1) Provenance capture
Taverna
Built-in when saving workflow execution results.
Galaxy
GALAXY-PROV tool + web interface:
• API key
• list Galaxy data processing histories
• generate PROV (turtle)
• visualize PROV (D3.js)
https://github.com/albangaignard/galaxy-PROV
Galaxy workflow provenance capture demo
(2) Provenance interlinking
1. SHA-512 fingerprint of files
2. annotating PROV entities with SHA-512 digest
3. producing owl:sameAs links with a SPARQL CONSTRUCT-WHERE query
Command line tool:
java -jar SharpProvToolbox/target/SHARP-1.0-SNAPSHOT-launcher.jar \
    -ri sample-data/control_mm9_chr15_Plekhh2-PigF_forward.fastq \
        sample-data/control_mm9_chr15_Plekhh2-PigF_reverse.fastq \
        sample-data/drugged_mm9_chr15_Plekhh2-PigF_forward.fastq \
        sample-data/drugged_mm9_chr15_Plekhh2-PigF_reverse.fastq \
        sample-data/unknown.fastq
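The three interlinking steps above can be sketched in plain Java. This is only an illustration under stated assumptions, not the toolbox's code: the class `Sha512Interlinker`, its method names and the example IRIs are hypothetical, content is passed in memory instead of being read from files, and the owl:sameAs statements are produced by grouping on the digest rather than by a SPARQL CONSTRUCT-WHERE query.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SHARP toolbox.
final class Sha512Interlinker {

    // Step 1: hex-encoded SHA-512 fingerprint of some content.
    static String sha512Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-512");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Not expected with the default JDK security providers.
            throw new IllegalStateException(e);
        }
    }

    // Steps 2 and 3: annotate each entity (IRI -> content) with its digest,
    // then link every entity to the first entity that shares the same digest.
    // A star of links is enough because owl:sameAs is symmetric and transitive.
    static List<String> sameAsLinks(Map<String, byte[]> entities) {
        Map<String, List<String>> byDigest = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : entities.entrySet()) {
            byDigest.computeIfAbsent(sha512Hex(e.getValue()), k -> new ArrayList<>())
                    .add(e.getKey());
        }
        List<String> links = new ArrayList<>();
        for (List<String> group : byDigest.values()) {
            for (int i = 1; i < group.size(); i++) {
                links.add("<" + group.get(0) + "> owl:sameAs <" + group.get(i) + "> .");
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Example IRIs are made up for the illustration.
        Map<String, byte[]> entities = new LinkedHashMap<>();
        entities.put("http://example.org/galaxy/vcf", "variants".getBytes(StandardCharsets.UTF_8));
        entities.put("http://example.org/taverna/input", "variants".getBytes(StandardCharsets.UTF_8));
        entities.put("http://example.org/taverna/other", "something else".getBytes(StandardCharsets.UTF_8));
        sameAsLinks(entities).forEach(System.out::println);
    }
}
```

Entities produced by one engine and consumed by the other end up linked only when their bytes are identical, which is the property the fingerprinting step relies on.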
(3) Provenance harmonization
1. OWL entailments, Jena API
   ReasonerRegistry.getOWLMiniReasoner()
2. PROV inferences (tuple-generating dependencies, TGD), Jena rule engine
   new GenericRuleReasoner(all PROV rules)
3. Blank node removal (equality-generating dependencies, EGD)
Command line tool:
java -jar SharpProvToolbox/target/SHARP-1.0-SNAPSHOT-launcher.jar \
    -i sample-data/taverna.prov.ttl \
       sample-data/galaxy.prov.ttl \
       sample-data/sameas.ttl
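To make the PROV inference step concrete, here is a minimal plain-Java sketch of forward chaining to a fixpoint. It stands in for the Jena rule engine and is not the toolbox's rule set: the class `ProvRuleSketch` and the `ex:` identifiers are hypothetical, and only two rules are applied. The first derives `prov:wasInfluencedBy` from generation, usage and communication (sub-properties of influence in PROV-O), and the second is the generation-use-communication inference from PROV-CONSTRAINTS.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the toolbox itself relies on the Jena rule engine.
final class ProvRuleSketch {

    // A triple is an immutable list: subject, predicate, object.
    static List<String> triple(String s, String p, String o) {
        return List.of(s, p, o);
    }

    // Applies the two rules repeatedly until no new triple is produced.
    static Set<List<String>> saturate(Set<List<String>> input) {
        Set<List<String>> graph = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<List<String>> inferred = new LinkedHashSet<>();
            for (List<String> t : graph) {
                String p = t.get(1);
                // Rule 1: generation, usage and communication are influences.
                if (p.equals("prov:wasGeneratedBy") || p.equals("prov:used")
                        || p.equals("prov:wasInformedBy")) {
                    inferred.add(triple(t.get(0), "prov:wasInfluencedBy", t.get(2)));
                }
                // Rule 2: e wasGeneratedBy a1 and a2 used e => a2 wasInformedBy a1.
                if (p.equals("prov:wasGeneratedBy")) {
                    for (List<String> u : graph) {
                        if (u.get(1).equals("prov:used") && u.get(2).equals(t.get(0))) {
                            inferred.add(triple(u.get(0), "prov:wasInformedBy", t.get(2)));
                        }
                    }
                }
            }
            // addAll returns true only if at least one triple was new.
            changed = graph.addAll(inferred);
        }
        return graph;
    }

    public static void main(String[] args) {
        Set<List<String>> input = new LinkedHashSet<>();
        input.add(triple("ex:vcf", "prov:wasGeneratedBy", "ex:variantCalling"));
        input.add(triple("ex:exonFiltering", "prov:used", "ex:vcf"));
        saturate(input).forEach(System.out::println);
    }
}
```

In the example, the activity that used the VCF file becomes informed by, and therefore influenced by, the activity that generated it, even though the two activities come from different traces.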
(4) Provenance summarization: influence graph
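PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Keep only the influence edges between labelled resources.
CONSTRUCT {
    ?x ?p ?y .
    ?x rdfs:label ?lx .
    ?y rdfs:label ?ly .
} WHERE {
    ?x ?p ?y .
    FILTER (?p IN (prov:wasInfluencedBy)) .
    ?x rdfs:label ?lx .
    ?y rdfs:label ?ly .
}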
+ HTML/D3.js code generation
Command line tool:
java -jar SharpProvToolbox/target/SHARP-1.0-SNAPSHOT-launcher.jar \
    -i sample-data/taverna.prov.ttl \
       sample-data/galaxy.prov.ttl \
       sample-data/sameas.ttl \
    -s
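The influence-graph summarization above combines a filter on `prov:wasInfluencedBy` edges with code generation for D3.js. The following sketch shows one way the filtered edges could be turned into the node/link JSON commonly fed to a D3 force layout. It is an assumption-laden illustration: the class `InfluenceGraphJson`, the JSON field names and the example identifiers are hypothetical, and the toolbox's actual generated HTML may look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "filter, then emit D3 data" idea.
final class InfluenceGraphJson {

    // Minimal JSON string quoting (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // triples: {subject, predicate, object}; labels: resource -> rdfs:label.
    static String toD3Json(List<String[]> triples, Map<String, String> labels) {
        Map<String, Integer> index = new LinkedHashMap<>();
        List<String> links = new ArrayList<>();
        for (String[] t : triples) {
            // Mirror the query: influence edges whose two ends both have a label.
            boolean influence = t[1].equals("prov:wasInfluencedBy");
            boolean labelled = labels.containsKey(t[0]) && labels.containsKey(t[2]);
            if (!influence || !labelled) {
                continue;
            }
            index.putIfAbsent(t[0], index.size());
            index.putIfAbsent(t[2], index.size());
            links.add("{\"source\":" + index.get(t[0])
                    + ",\"target\":" + index.get(t[2]) + "}");
        }
        List<String> nodes = new ArrayList<>();
        for (String iri : index.keySet()) {
            nodes.add("{\"id\":" + quote(iri) + ",\"label\":" + quote(labels.get(iri)) + "}");
        }
        return "{\"nodes\":[" + String.join(",", nodes)
                + "],\"links\":[" + String.join(",", links) + "]}";
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"ex:out", "prov:wasInfluencedBy", "ex:sample"});
        triples.add(new String[] {"ex:out", "prov:wasGeneratedBy", "ex:activity"});
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("ex:out", "output");
        labels.put("ex:sample", "sample 1");
        System.out.println(toD3Json(triples, labels));
    }
}
```

Links refer to nodes by position in the `nodes` array, which keeps the output independent of how long the resource identifiers are.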
(4) Provenance summarization: nanopublication
CONSTRUCT {
    GRAPH :assertion {
        ?ref_genome a sio:Genome .
        ?sample a sio:Sample ;
            sio:is-variant-of ?ref_genome ;
            sio:has-phenotype ?out .
        [...]
    }
} WHERE {
    [...] ?out ( prov:wasInfluencedBy )+ ?sample . [...]
}
Command line tool:
java -jar SharpProvToolbox/target/SHARP-1.0-SNAPSHOT-launcher.jar \
    -i sample-data/taverna.prov.ttl \
       sample-data/galaxy.prov.ttl \
       sample-data/sameas.ttl \
    -sq sample-data/nanopub.query
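The property path `( prov:wasInfluencedBy )+` in the query above matches chains of one or more influence edges, so an output can be related to a sample several steps upstream. A rough plain-Java equivalent is a breadth-first traversal over direct influence edges. The class `InfluencePath` and the `ex:` identifiers are hypothetical, used only to illustrate the one-or-more semantics.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the one-or-more property path semantics.
final class InfluencePath {

    // influencedBy maps a resource to the resources that directly influenced it.
    // Returns everything reachable from start through one or more edges.
    static Set<String> influencers(Map<String, List<String>> influencedBy, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue =
                new ArrayDeque<>(influencedBy.getOrDefault(start, List.of()));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            // Visit each resource once, so cycles cannot loop forever.
            if (seen.add(node)) {
                queue.addAll(influencedBy.getOrDefault(node, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> influencedBy = Map.of(
                "ex:out", List.of("ex:vcf"),
                "ex:vcf", List.of("ex:bam"),
                "ex:bam", List.of("ex:sample"));
        // ex:sample is found although it is three edges away from ex:out.
        System.out.println(influencers(influencedBy, "ex:out"));
    }
}
```

Because the path requires at least one edge, the starting resource appears in the result only if it lies on a cycle.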
Questions?
alban.gaignard@univ-nantes.fr
Acknowledgments
Backup slides
PROV-O ontology
https://www.w3.org/TR/prov-o