SlideShare une entreprise Scribd logo
1  sur  31
Trends in Use of Scientific Workflows:




                                                            DataONE
Insights from a Public Repository and
Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William
Michener, Rebecca Koskela

                                                             1
Scientific Workflows
Tools that help scientists:

  • Automate repetitive or
    difficult work




                                       DataONE
  • Provide reproducibility to their
    experiments

  • Track provenance

  • Share their data with other         2
    scientists
Workflow Workbenches




                       DataONE
                        3
Workflow Workbenches
These facilitate:
  • Creation

  • Mapping




                       DataONE
  • Scheduling

  • Execution

  • Visualization

  • Re-Use              4
Example Workflow




                                                 DataONE
                                                  5


http://www.myexperiment.org/workflows/140.html
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                                                   6


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                • How are they being shared?




                                                                                   7


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                    DataONE
                                                • How are they being shared?
                                                • What sort of best practices can
                                                  researchers follow to maximize
                                                  the longevity and use of their
                                                  work?

                                                                                     8


http://www.flickr.com/photos/eleaf/2536358399
Our Study

• www.myexperiment.org
  • Est. 2007




                                                            DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)




                                                             9
Our Study

• www.myexperiment.org
  • Est. 2007




                                                                      DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)

  • Minable RDF storage for workflows, groups, packs, users, files.
  • Minable data gathered through the SCUFLE XML language for the
    Taverna workflows
  • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows.
                                                                      10
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
                                                                    11
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
• Gathered user, workflow, files, packs, groups view and
  download statistics, metadata, descriptions, tags, and so on
  (http://thedatahub.org/dataset/myexperiment-screenscrape)




                                                                    12
Findings
           • A large percentage of
             workflows consist of few
             components.

           • The amount of components




                                         DataONE
             ranges from 1 to 250. The
             average workflow supports
             24.3 tasks.

           • Complex workflows are
             downloaded more.
                                         13
Findings
           • Most workflow contributors
             submit a single workflow.

           • Only 13 users have uploaded
             more than 30 workflows.




                                            DataONE
           • Just over 5% of the users on
             myExperiment have uploaded
             workflows.



                                            14
Findings
           • Most workflows have only
             one version uploaded.

           • When several versions do




                                           DataONE
             exist, the workflow is more
             frequently downloaded than
             “single-edition” workflows.




                                           15
Findings
           • Workflow use declined
             significantly a month after
             initial upload.




                                           DataONE
                                           16
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.




                                                              17
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.




                                                              18
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.

  • 8% more than previous studies (Lin et al.)

                                                              19
Findings

• 60% of workflows have embedded workflows within them.




                                                          DataONE
                                                          20
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…




                                                               21
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…

• … but community engagement does.




                                                               22
Recommendations


Remember workflows are evolving entities.




                                            DataONE
They are updated in response to user
feedback, engagement, and improvements in
methodology.

                                            23
Recommendations


Use relevant social annotation tools.




                                                DataONE
But they need to be constrained; for
instance, through the use of a controlled tag
vocabulary.

                                                24
Recommendations


Talk about them.




                                     DataONE
Cite the workflow in publications.
Share with colleagues
Advertise the workflow.

                                     25
Recommendations


Provide sufficient descriptions of your workflows.




                                                     DataONE
                                                     26
Recommendations


Keep in mind that one size does not fit all.




                                               DataONE
                                               27
Recommendations


Workflow re-use could benefit significantly from




                                                     DataONE
the assignment of stable identifiers, like Digital
Object Identifiers (DOI).




                                                     28
Recommendations


Education is the key to more use.




                                                   DataONE
i.e. in professional society meetings, online
courses, and undergraduate and graduate courses.



                                                   29
Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.




                                                   DataONE
• Help with collaborative research.
• Speed up the peer review process.
• Your impact. (For instance, NSF has said these
  are valuable contributions.)

                                                   30
Links
• Mendeley Research Group:
  http://www.mendeley.com/groups/1189721/scientific-workflows-
  and-workflow-systems/
• Github https://github.com/RichardLitt/Understanding-Workflows
• Data http://thedatahub.org/dataset/myexperiment-screenscrape




                                                                  DataONE
• Notebook https://notebooks.dataone.org/workflows




                                                                  31


http://www.flickr.com/photos/wwworks/4759535950/

Contenu connexe

Similaire à Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics
Khaled Tumbi
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
prwheatley
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
Arpan Bhandari
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
Agnes Molnar
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
Paris Open Source Summit
 

Similaire à Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices (20)

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics
 
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre NimmagaddaM&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Telemetry Onboarding
Telemetry OnboardingTelemetry Onboarding
Telemetry Onboarding
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
 
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservation
 
E-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia contentE-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia content
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
 
IETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work BetterIETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work Better
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 

Plus de Richard Littauer

On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
Richard Littauer
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
Richard Littauer
 

Plus de Richard Littauer (12)

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
 
Marcu 2000 presentation
Marcu 2000 presentationMarcu 2000 presentation
Marcu 2000 presentation
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

  • 1. Trends in Use of Scientific Workflows: DataONE Insights from a Public Repository and Recommendations for Best Practices Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela 1
  • 2. Scientific Workflows Tools that help scientists: • Automate repetitive or difficult work DataONE • Provide reproducibility to their experiments • Track provenance • Share their data with other 2 scientists
  • 4. Workflow Workbenches These facilitate: • Creation • Mapping DataONE • Scheduling • Execution • Visualization • Re-Use 4
  • 5. Example Workflow DataONE 5 http://www.myexperiment.org/workflows/140.html
  • 6. Our Study • How are workflows being used? DataONE 6 http://www.flickr.com/photos/eleaf/2536358399
  • 7. Our Study • How are workflows being used? DataONE • How are they being shared? 7 http://www.flickr.com/photos/eleaf/2536358399
  • 8. Our Study • How are workflows being used? DataONE • How are they being shared? • What sort of best practices can researchers follow to maximize the longevity and use of their work? 8 http://www.flickr.com/photos/eleaf/2536358399
  • 9. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) 9
  • 10. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) • Minable RDF storage for workflows, groups, packs, users, files. • Minable data gathered through the SCUFLE XML language for the Taverna workflows • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows. 10
  • 11. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE 11
  • 12. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE • Gathered user, workflow, files, packs, groups view and download statistics, metadata, descriptions, tags, and so on (http://thedatahub.org/dataset/myexperiment-screenscrape) 12
  • 13. Findings • A large percentage of workflows consist of few components. • The amount of components DataONE ranges from 1 to 250. The average workflow supports 24.3 tasks. • Complex workflows are downloaded more. 13
  • 14. Findings • Most workflow contributors submit a single workflow. • Only 13 users have uploaded more than 30 workflows. DataONE • Just over 5% of the users on myExperiment have uploaded workflows. 14
  • 15. Findings • Most workflows have only one version uploaded. • When several versions do DataONE exist, the workflow is more frequently downloaded than “single-edition” workflows. 15
  • 16. Findings • Workflow use declined significantly a month after initial upload. DataONE 16
  • 17. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. 17
  • 18. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. 18
  • 19. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. • 8% more than previous studies (Lin et al.) 19
  • 20. Findings • 60% of workflows have embedded workflows within them. DataONE 20
  • 21. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… 21
  • 22. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… • … but community engagement does. 22
  • 23. Recommendations Remember workflows are evolving entities. DataONE They are updated in response to user feedback, engagement, and improvements in methodology. 23
  • 24. Recommendations Use relevant social annotation tools. DataONE But they need to be constrained; for instance, through the use of a controlled tag vocabulary. 24
  • 25. Recommendations Talk about them. DataONE Cite the workflow in publications. Share with colleagues Advertise the workflow. 25
  • 26. Recommendations Provide sufficient descriptions of your workflows. DataONE 26
  • 27. Recommendations Keep in mind that one size does not fit all. DataONE 27
  • 28. Recommendations Workflow re-use could benefit significantly from DataONE the assignment of stable identifiers, like Digital Object Identifiers (DOI). 28
  • 29. Recommendations Education is the key to more use. DataONE i.e. in professional society meetings, online courses, and undergraduate and graduate courses. 29
  • 30. Impact on Science Following these recommendations can help: • Make science more efficient. • Facilitate reproducible science. DataONE • Help with collaborative research. • Speed up the peer review process. • Your impact. (For instance, NSF has said these are valuable contributions.) 30
  • 31. Links • Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows- and-workflow-systems/ • Github https://github.com/RichardLitt/Understanding-Workflows • Data http://thedatahub.org/dataset/myexperiment-screenscrape DataONE • Notebook https://notebooks.dataone.org/workflows 31 http://www.flickr.com/photos/wwworks/4759535950/