Apidays New York 2024 - The value of a flexible API Management solution for O...
Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices
1. Trends in Use of Scientific Workflows:
DataONE
Insights from a Public Repository and
Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William
Michener, Rebecca Koskela
1
2. Scientific Workflows
Tools that help scientists:
• Automate repetitive or
difficult work
DataONE
• Provide reproducibility to their
experiments
• Track provenance
• Share their data with other 2
scientists
5. Example Workflow
DataONE
5
http://www.myexperiment.org/workflows/140.html
6. Our Study
• How are workflows being used?
DataONE
6
http://www.flickr.com/photos/eleaf/2536358399
7. Our Study
• How are workflows being used?
DataONE
• How are they being shared?
7
http://www.flickr.com/photos/eleaf/2536358399
8. Our Study
• How are workflows being used?
DataONE
• How are they being shared?
• What sort of best practices can
researchers follow to maximize
the longevity and use of their
work?
8
http://www.flickr.com/photos/eleaf/2536358399
10. Our Study
• www.myexperiment.org
• Est. 2007
DataONE
• 5000+ users
• 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)
• Minable RDF storage for workflows, groups, packs, users, files.
• Minable data gathered through the SCUFLE XML language for the
Taverna workflows
• Taverna 1 - 479 workflows; Taverna 2 - 684 workflows.
10
11. Our Study
• We harvested information using a combination of SPARQL and
Python (https://github.com/RichardLitt/Understanding-Workflows)
DataONE
11
12. Our Study
• We harvested information using a combination of SPARQL and
Python (https://github.com/RichardLitt/Understanding-Workflows)
DataONE
• Gathered user, workflow, files, packs, groups view and
download statistics, metadata, descriptions, tags, and so on
(http://thedatahub.org/dataset/myexperiment-screenscrape)
12
13. Findings
• A large percentage of
workflows consist of few
components.
• The amount of components
DataONE
ranges from 1 to 250. The
average workflow supports
24.3 tasks.
• Complex workflows are
downloaded more.
13
14. Findings
• Most workflow contributors
submit a single workflow.
• Only 13 users have uploaded
more than 30 workflows.
DataONE
• Just over 5% of the users on
myExperiment have uploaded
workflows.
14
15. Findings
• Most workflows have only
one version uploaded.
• When several versions do
DataONE
exist, the workflow is more
frequently downloaded than
“single-edition” workflows.
15
16. Findings
• Workflow use declined
significantly a month after
initial upload.
DataONE
16
17. Findings
• A large percentage of workflow components – approx. 38% -
are shims.
DataONE
• Components that are used to make output from one step
conform to the format expected by a subsequent step.
17
18. Findings
• A large percentage of workflow components – approx. 38% -
are shims.
DataONE
• Components that are used to make output from one step
conform to the format expected by a subsequent step.
• This is a problem for developers.
18
19. Findings
• A large percentage of workflow components – approx. 38% -
are shims.
DataONE
• Components that are used to make output from one step
conform to the format expected by a subsequent step.
• This is a problem for developers.
• 8% more than previous studies (Lin et al.)
19
20. Findings
• 60% of workflows have embedded workflows within them.
DataONE
20
21. Findings
• 60% of workflows have embedded workflows within them.
• Documentation on site (tags, description) does not improve
DataONE
use…
21
22. Findings
• 60% of workflows have embedded workflows within them.
• Documentation on site (tags, description) does not improve
DataONE
use…
• … but community engagement does.
22
23. Recommendations
Remember workflows are evolving entities.
DataONE
They are updated in response to user
feedback, engagement, and improvements in
methodology.
23
24. Recommendations
Use relevant social annotation tools.
DataONE
But they need to be constrained; for
instance, through the use of a controlled tag
vocabulary.
24
28. Recommendations
Workflow re-use could benefit significantly from
DataONE
the assignment of stable identifiers, like Digital
Object Identifiers (DOI).
28
29. Recommendations
Education is the key to more use.
DataONE
i.e. in professional society meetings, online
courses, and undergraduate and graduate courses.
29
30. Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.
DataONE
• Help with collaborative research.
• Speed up the peer review process.
• Your impact. (For instance, NSF has said these
are valuable contributions.)
30
31. Links
• Mendeley Research Group:
http://www.mendeley.com/groups/1189721/scientific-workflows-
and-workflow-systems/
• Github https://github.com/RichardLitt/Understanding-Workflows
• Data http://thedatahub.org/dataset/myexperiment-screenscrape
DataONE
• Notebook https://notebooks.dataone.org/workflows
31
http://www.flickr.com/photos/wwworks/4759535950/