Scientific workflows facilitate research by making complex processes easy to assemble, providing transparent access to diverse resources, incorporating multiple software tools, and ensuring reproducibility. However, new challenges have emerged, such as analyzing large volumes of sensor and genomic data. Workflows need to become more programmable, optimize resource usage across computing systems, and integrate into the full scientific process from data generation to publication. Next steps include specializing workflow systems for different domains, standardizing provenance, treating workflows as publications, and catering to diverse hardware architectures.
Ilkay ALTINTAS - September, 2013
Ilkay ALTINTAS, Ph.D.
San Diego Supercomputer Center, UCSD
http://users.sdsc.edu/~altintas
Roles and Challenges for Scientific Workflows and Provenance
in the Age of Open Science, Cloud Computing and Web 2.0
Workflows are a Part of Cyberinfrastructure

[Diagram: the workflow lifecycle — Workflow Design → Workflow Scheduling and Execution Planning → Workflow Execution → Workflow Monitoring → Provenance Analysis → Reporting, connected by Run, Review, and Deploy-and-Publish transitions]

• Accelerate workflow design and reuse via a drag-and-drop visual interface
• Facilitate sharing
• Schedule, run and monitor workflow execution
• Promote learning

Support for the end-to-end computational scientific process:
BUILD – SHARE – RUN – LEARN
Facilitating and Accelerating XXX-Info or Comp-XXX Research using Scientific Workflows

• Important attributes
  – Assemble complex processing easily
  – Access diverse resources transparently
  – Incorporate multiple software tools
  – Assure reproducibility
  – Build around a community development model
In addition, workflows today are…

• Encapsulations of scientific knowledge
• Easy-to-share bits of the scientific process
  – e.g., as research objects
• Mostly portable
• Facilitators of reproducible science
  – Tracking provenance at each step of the science
• A key integrator for (big and small) data science
• A means to standardize scientific data products
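The "assemble complex processing easily" idea can be sketched as chaining small, reusable steps into a linear pipeline. This is a minimal illustration, not Kepler itself; the step names and record layout are hypothetical.

```python
def trim_reads(data):
    """Hypothetical step: keep records at or above a quality threshold."""
    return [r for r in data if r["quality"] >= 30]

def count_bases(data):
    """Hypothetical step: summarize the surviving records."""
    return {"records": len(data), "bases": sum(len(r["seq"]) for r in data)}

def run_workflow(steps, data):
    """Run each step on the previous step's output, like a linear workflow."""
    for step in steps:
        data = step(data)
    return data

reads = [{"seq": "ACGT", "quality": 35}, {"seq": "AC", "quality": 10}]
summary = run_workflow([trim_reads, count_bases], reads)
# The low-quality read is dropped; the rest are summarized.
```

A real workflow system adds what this sketch omits: branching dataflow, scheduling, monitoring, and provenance capture around each step.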
The ‘bioinformatics’ bottleneck

• Resources needed for sequence analysis far exceed the costs of sequence generation
  – Cloud computing is an attractive on-demand, decentralized model
  – Need new scheduling capabilities
    • On-demand access to a shared pool of configurable resources
    • Networks, servers, storage, applications, and services
  – Need the ability to easily combine users’ environments and community tools with workflows
  – Various tools with different performance profiles
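Combining community command-line tools with a user's own logic is often done by wrapping each tool as a workflow step. A minimal sketch, using plain `sort` as a stand-in for a real bioinformatics tool; the `shell_step` wrapper is a hypothetical name, not a Kepler API.

```python
import subprocess

def shell_step(cmd):
    """Wrap a command-line tool as a reusable workflow step that
    pipes text through the command and returns its stdout."""
    def step(text):
        result = subprocess.run(cmd, input=text, capture_output=True,
                                text=True, check=True)
        return result.stdout
    return step

# Combine a community tool (here plain `sort`) with user-side logic:
sort_lines = shell_step(["sort"])
sorted_text = sort_lines("b\na\n")  # lines come back in sorted order
```

Wrappers like this let one workflow mix tools with very different profiles (CPU-bound aligners, I/O-bound converters) behind a uniform step interface.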
The ‘sensor data’ bottleneck

• Data streaming in at various rates
• “Big Data” by definition in its volume, variety, velocity and viscosity
  – Workflows can improve veracity and add value by providing provenance- and standards-aware on-the-fly archival capabilities
  – Workflows can QA/QC and automate (real-time) analysis of streaming data before it is even archived
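On-the-fly QA/QC before archival can be sketched as a generator that flags out-of-range values as readings stream in. The thresholds, the error code, and the record layout are illustrative assumptions.

```python
def qc_stream(readings, low, high):
    """Flag out-of-range sensor values on the fly, before archival.
    Yields one QC-annotated record per incoming reading."""
    for value in readings:
        yield {"value": value, "ok": low <= value <= high}

# -999 stands in for a typical sensor error code in the stream:
stream = [21.5, 22.0, -999.0, 23.1]
checked = list(qc_stream(stream, low=-50.0, high=60.0))
archive = [r for r in checked if r["ok"]]  # archive only QC-passed data
```

Because the check runs per reading, bad values are caught at stream velocity rather than in a later batch pass over the archive.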
The ‘HPC’ bottleneck

• Scaling to exascale is not happening naturally
  – Different memory architectures
  – Analysis codes are being redeveloped
  – Scheduling through batch schedulers alone is not enough
  – HPC workflows are becoming more interactive
  – In-situ data analysis is needed to deal with the volumes of data
As users see the value, they say:

• Increase reuse
  – of best development practices by the scientific community
  – of other bio packages
• Increase programmability by end users
  – users with various skill levels
  – to formulate actual domain-specific workflows
• Increase resource utilization
  – optimize execution across available computing resources
  – in an efficient, transparent and intuitive manner
• Make workflows part of the end-to-end scientific model, from data generation to publication
What are some next steps?

• Specialize workflow systems with domain-specific:
  – tools; data models and formats; user interfaces; deployment
• Workflow publications and data repositories
  – Treat workflows the same as data
  – Strong virtualization capability
• Standards for provenance are needed
  – For data and for process
• Build upon prior knowledge by detecting best-practice programming patterns and motifs
• Cater to different hardware architectures
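Provenance "for data and for process" can be sketched as a wrapper that records, for each step, which process ran, on what input, and what output it produced (identified by content hash). The record fields are illustrative, not a provenance standard.

```python
import hashlib
import json
import time

def run_with_provenance(step, data, log):
    """Run one workflow step and append a provenance record:
    process name (process provenance) plus input/output content
    hashes (data provenance) and a timestamp."""
    def digest(obj):
        blob = json.dumps(obj, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:12]

    out = step(data)
    log.append({
        "process": step.__name__,   # which process ran
        "input": digest(data),      # what data it consumed
        "output": digest(out),      # what data it produced
        "time": time.time(),        # when it ran
    })
    return out

def double(xs):
    """Hypothetical analysis step."""
    return [2 * x for x in xs]

log = []
doubled = run_with_provenance(double, [1, 2, 3], log)
```

Content hashes let the log establish lineage across steps: a later step's input digest matching an earlier step's output digest documents the dataflow, which is the core of what provenance standards formalize.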
Thanks! & Questions…

Ilkay Altintas
altintas@sdsc.edu
@ilkayaltintas @bioKepler @KeplerWorkflow @WIFIREProject

How to download Kepler:
https://kepler-project.org/users/downloads
Please start with the short Getting Started Guide:
https://kepler-project.org/users/documentation