Date: 22/10/2014 
Workflow Reuse in Practice: 
A Study of Neuroimaging Pipeline Users 
Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ, Meredith N. Braskieⱡ, Derrek Hibarⱡ, Xue Huaⱡ, Neda Jahanshadⱡ, Paul Thompsonⱡ, and Arthur W. Togaⱡ 
* Universidad Politécnica de Madrid, 
Ŧ USC Information Sciences Institute, 
ⱡ USC Laboratory of Neuroimaging
Main Contributions 
•Highlight the benefits of workflows and workflow fragments 
reported by users in a neuroscience research lab 
•Survey of workflow users 
•Quantitative perspective on the identified benefits. 
IEEE eScience 2014. Guarujá, Brasil 
[Figure labels: create, collaborate; repository; reuse; repurpose]
Background 
•Workflows are software artifacts that capture computational experiments 
•Addition to paper publication 
•Provenance of results 
•Reuse 
•Existing repositories of workflows (Galaxy, myExperiment, the LONI Pipeline, CrowdLabs, etc.) 
•Sharing workflows 
•Exploring existing workflows 
•PROBLEMS to address: 
•How does workflow reuse happen in a research lab environment? 
•Are workflow fragments more useful than workflows? 
Use case: The LONI Pipeline 
Workflow system for neuroimaging analysis 
http://pipeline.loni.usc.edu/explore/library-navigator/ 
Why LONI Pipeline? 
•Need for reuse 
•Grouping Tools 
•Manual annotation of workflow fragments 
•Workflow Miner 
Approach 
Discussions with scientists 
User survey 
Collect responses from users 
21 responses 
Discuss results
Possible benefits of workflows and workflow fragments 
•Sharing workflows with collaborators 
•Time savings 
•Copy & paste fragments of workflows 
•Reuse existing workflows 
•Teaching 
•Reduce the learning curve of new students 
•Visualization 
•Simplify workflows 
•Design for modularity 
•Highlight the most relevant steps in a workflow 
Possible benefits of workflows and workflow fragments (2) 
•Design for understandability 
•Design for standardization 
•Debugging 
•Provenance exploration 
•Paper writing 
•Linking papers to pipelines 
•Reproducibility and inspectability 
Survey Analysis 
Writing and Sharing Code 
•Writing code is considered very important for this area of research. 
•Sharing code is not considered to be as important. 
Adopting a Workflow System 
The overwhelming majority of respondents found the workflow system useful. 
•Creation of workflows. 
Adopting a workflow system: workflow size 
•Scientists appear to prefer workflows of fewer than 10 steps 
[Figure: bar chart of responses by number of workflow components (bins: 1-5, 5-10, 10-20, >20)]
Reusing workflows 
•Respondents answered that creating workflows is very useful 
•Reuse of workflows was seen as less useful 
•Reuse is not the only reason why workflows are created 
•Reusing workflows from a user’s prior work is considered as useful as reusing workflows from others 
Reusing workflows (2) 
According to the respondents, the major benefits of workflows include: 
• Time savings 
•Organizing and storing code 
• Having a visualization of the overall analysis 
•Facilitating reproducibility 
Workflows save time: 13 
Easier to track and debug complex code: 9 
Convenient way to organize/store code: 11 
Help write more organized code: 6 
Help make code more modular/reusable: 4 
Help make methods more understandable: 8 
Visualization of overall analysis: 11 
Workflows facilitate reproducibility: 10
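As a rough illustration, the counts above can be expressed as shares of the survey sample. The sketch below assumes that each count is out of the 21 survey responses and that respondents could select more than one benefit; the slides do not state a denominator per question, so the percentages are indicative only. The names `TOTAL_RESPONSES`, `workflow_benefits`, and `share` are illustrative.

```python
# Counts as reported on the slide. The denominator of 21 is an assumption
# (the total number of survey responses), not a per-question figure.
TOTAL_RESPONSES = 21

workflow_benefits = {
    "Workflows save time": 13,
    "Easier to track and debug complex code": 9,
    "Convenient way to organize/store code": 11,
    "Help write more organized code": 6,
    "Help make code more modular/reusable": 4,
    "Help make methods more understandable": 8,
    "Visualization of overall analysis": 11,
    "Workflows facilitate reproducibility": 10,
}


def share(count: int, total: int = TOTAL_RESPONSES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Order the benefits from most to least frequently selected.
ranked = sorted(workflow_benefits.items(), key=lambda item: item[1], reverse=True)

for label, count in ranked:
    print(f"{label}: {count}/{TOTAL_RESPONSES} ({share(count)}%)")
```

Under that assumption, time savings would be the most frequently selected benefit and modularity the least.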
Reusing workflows (3) 
•The overwhelming majority of respondents said workflows are useful both for non-programmers and for teaching new students 
Non-programmers can use them: 20 
New students can easily learn: 19 
No need for others to re-implement code: 14 
Adoption of standard ways to do things: 9
Reusing workflows (4) 
•Respondents gave few strong reasons for not sharing workflows 
•Respondents gave few strong reasons for not reusing workflows from others 
Reasons for not sharing workflows: 
Others would not want to use them: 1 
Others ask too many questions of the creators: 2 
Workflows from others are difficult to understand: 3 
It is difficult to understand how to prepare data for a workflow: 3 
Reasons for not reusing workflows from others: 
Workflows from others are difficult to understand: 4 
It is difficult to understand how to prepare data for a workflow: 2 
Workflows created by others are too specific: 1 
It is hard to take workflows created by others and make them work: 2
Reusing groupings 
•Reuse is not the only reason why groupings are created 
•Unlike workflows, reusing groupings from one’s own work is more useful than reusing groupings from others 
Reusing groupings (2) 
•Most respondents agreed that groupings help simplify workflows. Groupings also make workflows more understandable by others 
•Other grouping benefits: 
•Time savings 
•Help make code modular and understandable, more so than workflows 
•Seen as useful to non-programmers and students 
Visualization of the analysis: 10 
To simplify workflows that are complex overall: 12 
To make workflows more understandable to others: 12 
Groupings save time: 12 
Help make code more modular/reusable: 10 
Help make methods more understandable: 7
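The grouping counts can be set side by side with the counts reported for workflows on the benefits that appear on both slides. This is a minimal sketch: pairing the labels across the two slides (for example "Workflows save time" with "Groupings save time") is an assumption made here for illustration, and the name `shared_benefits` is hypothetical.

```python
# (workflow count, grouping count) for benefits listed on both slides.
# Matching the labels across the two slides is an assumption.
shared_benefits = {
    "Save time": (13, 12),
    "Visualization of the analysis": (11, 10),
    "Help make code more modular/reusable": (4, 10),
    "Help make methods more understandable": (8, 7),
}

# Positive difference: more respondents selected the benefit for groupings.
differences = {
    label: groupings - workflows
    for label, (workflows, groupings) in shared_benefits.items()
}

favours_groupings = [label for label, diff in differences.items() if diff > 0]

for label, diff in differences.items():
    print(f"{label}: {diff:+d}")
```

On these four items, only modularity/reusability was selected more often for groupings than for workflows.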
Reusing groupings (3) 
•Very few respondents gave reasons for not sharing groupings or for not reusing groupings from others 
•In general, workflows are considered more useful than groupings 
•On the other hand, more respondents said that groupings help make their code more modular and understandable 
Reasons for not sharing groupings: 
Others would not want to use them: 0 
Others ask too many questions of the creators: 1 
Workflows from others are difficult to understand: 4 
It is difficult to understand how to prepare data for a grouping: 1 
Reasons for not reusing groupings from others: 
Groupings from others are difficult to understand: 2 
It is difficult to understand how to prepare data for a grouping: 3 
Groupings created by others are too specific: 1 
It is hard to take groupings created by others and make them work: 4
Paper Writing 
Workflows are not systematically linked to publications 
•Most respondents believe that the link between a workflow and a publication is kept in private laboratory notes, rather than in a publicly accessible manner 
Discussion 
Workflows have a clear benefit to the lab. There are important directions of future research suggested by this work: 
•Improve the use of groupings. 
•If users had more assistance in specifying and finding groupings, it is possible that workflows and fragments would be reused more 
•Debugging and checking results 
•Better mechanisms for checking intermediate execution results would allow users to define larger workflows 
•Better documentation of workflows. 
•Documentation of workflows tends to be private and scattered, and not usually linked to papers 
•Facilitating workflow publication and linking to papers 
•Papers provide important context and documentation for workflows 
Conclusions 
•Contributions: 
•Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab 
•Quantitative survey of the benefits by workflow users 
•Our work can be expanded by 
•Validating our findings with more respondents 
•Reflecting the experience level of the respondents in the questionnaire 
•Including statistics on grouping usage in the workflows they create 
•There are clear opportunities to develop best practices for designing workflow components and modularizing code, encouraging standards adoption, and facilitating understanding by other users 
All materials used and the survey are available at: http://purl.org/net/wfSurvey-eScience2014
Who are we? 
•Daniel Garijo, Oscar Corcho: Ontology Engineering Group, UPM 
•Yolanda Gil: Information Sciences Institute, USC 
•Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, and Arthur W. Toga: USC Laboratory of Neuro Imaging 
Questions? 
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

  • 1. Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
      Date: 22/10/2014
      Daniel Garijo*, Oscar Corcho*, Yolanda GilŦ, Meredith N. Braskieⱡ, Derrek Hibarⱡ, Xue Huaⱡ, Neda Jahanshadⱡ, Paul Thompsonⱡ, and Arthur W. Togaⱡ
      * Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute, ⱡ USC Laboratory of Neuroimaging
  • 2. Main Contributions
      • Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
      • Survey of workflow users
      • Quantitative perspective on the identified benefits
      [Figure labels: create, collaborate, repository, reuse, repurpose]
      IEEE eScience 2014. Guarujá, Brasil
  • 3. Background
      • Workflows are software artifacts that capture computational experiments
      • Addition to paper publication
      • Provenance of results
      • Reuse
      • Existing repositories of workflows (Galaxy, myExperiment, the LONI Pipeline, CrowdLabs, etc.)
      • Sharing workflows
      • Exploring existing workflows
      • Problems to address:
          • How does workflow reuse happen in a research lab environment?
          • Are workflow fragments more useful than workflows?
  • 4. Use case: The LONI Pipeline
      Workflow system for neuroimaging analysis
      http://pipeline.loni.usc.edu/explore/library-navigator/
  • 5. Why LONI Pipeline?
      • Need for reuse
      • Grouping Tools
      • Manual annotation of workflow fragments
      • Workflow Miner
  • 6. Approach
      • Discussions with scientists
      • User survey
      • Collect responses from users (21 responses)
      • Discuss results
  • 7. Possible benefits of workflows and workflow fragments
      • Sharing workflows with collaborators
      • Time savings
      • Copy & paste fragments of workflows
      • Reuse existing workflows
      • Teaching
      • Reduce the learning curve of new students
      • Visualization
      • Simplify workflows
      • Design for modularity
      • Highlight the most relevant steps of a workflow
  • 8. Possible benefits of workflows and workflow fragments (2)
      • Design for understandability
      • Design for standardization
      • Debugging
      • Provenance exploration
      • Paper writing
      • Linking papers to pipelines
      • Reproducibility and inspectability
  • 9. Survey Analysis
  • 10. Writing and Sharing Code
      • Writing code is considered very important for this area of research.
      • Sharing code is not considered to be as important.
  • 11. Adopting a Workflow System
      The overwhelming majority of respondents found the workflow system useful.
      • Creation of workflows.
  • 12. Adopting a workflow system: workflow size
      • Scientists seem to prefer workflows of fewer than 10 steps
      [Bar chart; x-axis: number of workflow components, in bins 1-5, 5-10, 10-20, >20]
  • 13. Reusing workflows
      • Respondents answered that creating workflows is very useful
      • Reuse of workflows was seen as less useful
      • Reuse is not the only reason why workflows are created
      • Reusing workflows from a user's prior work is considered as useful as reusing workflows from others
  • 14. Reusing workflows (2)
      According to the respondents, the major benefits of workflows include:
      • Time savings
      • Organizing and storing code
      • Having a visualization of the overall analysis
      • Facilitating reproducibility
      Response counts:
      • Workflows save time: 13
      • Easier to track and debug complex code: 9
      • Convenient way to organize/store code: 11
      • Help write more organized code: 6
      • Help make code more modular/reusable: 4
      • Help make methods more understandable: 8
      • Visualization of overall analysis: 11
      • Workflows facilitate reproducibility: 10
  • 15. Reusing workflows (3)
      • The overwhelming majority of respondents said workflows are useful both for non-programmers and for teaching new students
      Response counts:
      • Non-programmers can use them: 20
      • New students can easily learn: 19
      • No need for others to re-implement code: 14
      • Adoption of standard ways to do things: 9
  • 16. Reusing workflows (4)
      • Respondents gave few compelling reasons for not sharing workflows
      • Respondents gave few compelling reasons for not reusing workflows from others
      Response counts, in slide order:
      • Others would not want to use them: 1
      • Others ask too many questions of the creators: 2
      • Workflows from others are difficult to understand: 3
      • It is difficult to understand how to prepare data for a workflow: 3
      • Workflows from others are difficult to understand: 4
      • It is difficult to understand how to prepare data for a workflow: 2
      • Workflows created by others are too specific: 1
      • It is hard to take workflows created by others and make them work: 2
  • 17. Reusing groupings
      • Reuse is not the only reason why groupings are created. Unlike workflows, reusing groupings from one's own work is more useful than reusing groupings from others
  • 18. Reusing groupings (2)
      • Most respondents agreed that groupings help simplify workflows. Groupings also make workflows more understandable to others
      • Other grouping benefits:
          • Time savings
          • Help make code modular and understandable, more so than workflows
          • Seen as useful to non-programmers and students
      Response counts:
      • Visualization of the analysis: 10
      • To simplify workflows that are complex overall: 12
      • To make workflows more understandable to others: 12
      • Groupings save time: 12
      • Help make code more modular/reusable: 10
      • Help make methods more understandable: 7
  • 19. Reusing groupings (3)
      Very few responses gave reasons for not sharing groupings or for not reusing groupings from others.
      In general, workflows are considered more useful than groupings. On the other hand, more respondents said that groupings help make their code more modular and understandable.
      Response counts, in slide order:
      • Others would not want to use them: 0
      • Others ask too many questions of the creators: 1
      • Workflows from others are difficult to understand: 4
      • It is difficult to understand how to prepare data for a grouping: 1
      • Groupings from others are difficult to understand: 2
      • It is difficult to understand how to prepare data for a grouping: 3
      • Groupings created by others are too specific: 1
      • It is hard to take groupings created by others and make them work: 4
  • 20. Paper Writing
      Workflows are not systematically linked to publications
      • Most respondents believe that the link between a workflow and a publication is kept in private laboratory notes rather than in a publicly accessible form
  • 21. Discussion
      Workflows have a clear benefit to the lab. This work suggests important directions for future research:
      • Improve the use of groupings
          • If users had more assistance in specifying and finding groupings, workflows and fragments might be reused more
      • Debugging and checking results
          • Better mechanisms for checking intermediate execution results would allow users to define larger workflows
      • Better documentation of workflows
          • Documentation of workflows tends to be private and scattered, and is not usually linked to papers
      • Facilitating workflow publication and linking to papers
          • Papers provide important context and documentation for workflows
  • 22. Conclusions
      • Contributions:
          • Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
          • Quantitative survey of the benefits by workflow users
      • Our work can be expanded by:
          • Validating our findings with more respondents
          • Reflecting the experience level of the respondents in the questionnaire
          • Including statistics on grouping usage in the workflows respondents create
      • There are clear opportunities to develop best practices for designing workflow components and modularizing code, encouraging standards adoption, and facilitating understanding by other users
      All materials used and the survey are available at: http://purl.org/net/wfSurvey-eScience2014
  • 23. Who are we?
      • Daniel Garijo, Oscar Corcho: Ontology Engineering Group, UPM
      • Yolanda Gil: Information Sciences Institute, USC
      • Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, Arthur W. Toga: USC Laboratory of Neuro Imaging
  • 24. Questions?
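
The response counts on slides 14 and 15 are easier to compare as shares of the respondents. The following is a minimal sketch in Python, not part of the original study. It assumes that all 21 survey responses (slide 6) apply to every question, which the slides do not state, and the names `TOTAL_RESPONSES`, `benefit_counts`, `share_of_respondents`, and `ranked_shares` are hypothetical.

```python
# Assumption: every question was answered by all 21 respondents (slide 6).
TOTAL_RESPONSES = 21

# Counts copied from slides 14 and 15 of the transcript.
benefit_counts = {
    "Workflows save time": 13,
    "Easier to track and debug complex code": 9,
    "Convenient way to organize/store code": 11,
    "Help write more organized code": 6,
    "Help make code more modular/reusable": 4,
    "Help make methods more understandable": 8,
    "Visualization of overall analysis": 11,
    "Workflows facilitate reproducibility": 10,
    "Non-programmers can use them": 20,
    "New students can easily learn": 19,
    "No need for others to re-implement code": 14,
    "Adoption of standard ways to do things": 9,
}


def share_of_respondents(count, total=TOTAL_RESPONSES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def ranked_shares(counts, total=TOTAL_RESPONSES):
    """Return (label, percentage) pairs sorted from most to least selected."""
    pairs = [(label, share_of_respondents(c, total)) for label, c in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, pct in ranked_shares(benefit_counts):
        print(f"{label}: {pct}%")
```

If a question was in fact answered by fewer people, passing a different `total` to `ranked_shares` adjusts the percentages accordingly.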