Wf4Ever: Advanced Workflow Preservation
   Technologies for Enhanced Science
                 Grant agreement no.: 27092


     Data Curation & Preservation Session
                     Jose Enrique Ruiz
                         IAA-CSIC

                    8th December 2010
        IVOA 2010 Fall Interoperability Meeting - Nara
Introduction
                                                          The Team



             1.  Intelligent Software Components (ISOCO, Spain)
             2.  University of Manchester (UNIMAN, UK)
             3.  Universidad Politécnica de Madrid (UPM, Spain)
             4.  Poznan Supercomputing and Networking Centre (PSNC, Poland)
             5.  University of Oxford (OXF, UK)
             6.  Instituto de Astrofísica de Andalucía (IAA, Spain)
             7.  Leiden University Medical Centre (LUMC, NL)
Introduction
                                          The Consortium Classified



Partners

» One SME
» Six public organizations

Technological Core Competencies

» Digital Libraries
» Workflow Management
» Semantic Web
» Integrity & Authenticity
» Provenance

Major Sectors

» Education
» IT
» Astronomy
» Bioinformatics

Case Studies

» Workflow Preservation in Astronomy (IAA)
» Workflow Preservation for Genome-wide Analysis and Biobanking (LUMC)
YOUR FAVOURITE MASS MODEL TOOL
Scientific Workflows
                                                                                              State of the art
A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving

» Central in experimental science
   ›  Enable automation
   ›  Make science repeatable (and sometimes reproducible)
   ›  Encourage best practices
» Scientist-friendly
   ›  Aimed at (some types of) scientists, possibly even without strong computational skills
» Communities: need for scientific data preservation
   ›  Enhance scientific development by building on, sharing, and extending previous results within scientific communities
» However, workflow preservation is especially complex
   ›  Workflows not only specified statically at design time but also interpreted through their execution
   ›  Complex models are required to describe workflows and related resources, including documents, data and services
   ›  Resources often beyond control of scientists
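
The definition above, and the point that a workflow is both specified at design time and interpreted through its execution, can be illustrated with a minimal sketch. This is not Wf4Ever code: the `Step` and `Workflow` classes and the two toy steps are hypothetical names chosen for illustration only. The list of steps stands for the static specification, and the trace recorded during a run stands for the dynamic side.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One named processing step (hypothetical structure, for illustration)."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Workflow:
    # Static part: the ordered steps, fixed at design time.
    steps: List[Step]
    # Dynamic part: what actually happened during the most recent run.
    trace: List[dict] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        """Apply each step in order and record its input and output."""
        self.trace.clear()
        for step in self.steps:
            result = step.func(data)
            self.trace.append({"step": step.name, "input": data, "output": result})
            data = result
        return data


wf = Workflow([Step("double", lambda x: x * 2), Step("increment", lambda x: x + 1)])
print(wf.run(5))  # prints 11
for entry in wf.trace:
    print(entry)
```

Preserving only `steps` would keep the design; preserving `trace` as well keeps a record of one concrete execution, which is the part that depends on data and services outside the workflow definition.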
Project Objectives
                                                                                           Goals

Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines

» Creation and management of complex Research Objects that take into account the dual nature (static and dynamic) of scientific workflows

» Archival, classification, and indexing of scientific workflows and their associated materials in scalable semantic repositories, providing advanced access and recommendation capabilities

» Creation of scientific communities to collaboratively share, reuse and evolve workflows and their parts, stimulating the development of new scientific knowledge
Integrity & Authenticity
                                                                                 Definitions

Integrity

» The quality or condition of being whole, complete and unaltered
» Crucial for ensuring the quality of preserved data in Research Objects

Authenticity

» Authenticity has a twofold dimension: data origin and entity authenticity
» Data origin: Proof of the origin of data, their genuineness, trustworthiness and realness
» Entity: ensuring that an entity, e.g. a person or other kind of actor, is the one it claims to be
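
The two definitions can be made concrete with a small sketch. The slides do not say which mechanisms Wf4Ever uses, so everything here is an assumption made for illustration: a SHA-256 digest stands in for an integrity check, and an HMAC computed with a shared secret stands in for a data-origin check. The function names, the sample data and the key are all hypothetical.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Integrity: a fingerprint that changes if the data is altered."""
    return hashlib.sha256(data).hexdigest()


def sign(data: bytes, key: bytes) -> str:
    """Data origin: a tag only holders of the key can produce."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_authentic(data: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(data, key), tag)


original = b"flux=1.23e-14"           # hypothetical archived value
stored_digest = sha256_digest(original)
key = b"hypothetical-shared-secret"   # assumption: producer and archive share a key
tag = sign(original, key)

print(sha256_digest(original) == stored_digest)          # True: unaltered
print(sha256_digest(b"flux=9.99e-14") == stored_digest)  # False: altered
print(is_authentic(original, key, tag))                  # True: right key
print(is_authentic(original, b"other-key", tag))         # False: wrong key
```

A shared-secret HMAC only convinces parties who already hold the key. Proving origin to third parties, or proving who a person is (entity authenticity), would need something like public-key signatures, which this sketch does not cover.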
Integrity and Authenticity Maintenance
                                                                                Objectives


    Evaluate and preserve the integrity and authenticity of archived Research Objects



» To ensure the data can be accessed and interpreted unchanged, complete, and correct today and in the future
» To preserve the integrity of archived Research Objects by tracking and verifying changes in archived objects as well as related resources
» To assist scientists in anticipating potential inconsistencies caused by uncontrolled changes in such resources
» To verify and prove the authenticity of authors and contributors to Research Objects as well as of internal and related resources

Provenance-based means to calculate measures of integrity and authenticity
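
One possible reading of "tracking and verifying changes in archived objects as well as related resources" is to record a fingerprint of every resource at archive time and compare against it later. The sketch below takes that reading as an assumption; it is not the project's provenance model. The `make_manifest` and `diff_manifest` functions and the resource names are hypothetical, and resources are held in memory so the example is self-contained.

```python
import hashlib
from typing import Dict, List


def make_manifest(resources: Dict[str, bytes]) -> Dict[str, str]:
    """Map each resource name to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(content).hexdigest()
            for name, content in resources.items()}


def diff_manifest(archived: Dict[str, str],
                  current: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Report resources that changed, disappeared, or appeared since archival."""
    now = make_manifest(current)
    return {
        "changed": sorted(n for n in archived if n in now and now[n] != archived[n]),
        "missing": sorted(n for n in archived if n not in now),
        "added": sorted(n for n in now if n not in archived),
    }


# Hypothetical Research Object contents at archive time.
archived = make_manifest({
    "workflow.t2flow": b"<workflow v1>",
    "catalogue.vot": b"ra,dec\n10.0,20.0\n",
})

# The same resources some time later: one edited, one added.
later = {
    "workflow.t2flow": b"<workflow v1>",
    "catalogue.vot": b"ra,dec\n10.0,20.5\n",
    "notes.txt": b"rerun with new catalogue",
}

print(diff_manifest(archived, later))
```

A report like this could warn a scientist that an input changed outside their control before they rerun the workflow. Remote services would need a different probe, since their content cannot simply be hashed.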
Project Objectives
                                              Common needs from the Community


                What do you want to know when accessing a workflow?




» If I can use it for my purposes (in my words)
» If I can expect it to run, when it was last run, by whom
» What it does quickly, by one of
   » example input / output (and trying it)
   » a description
   » ‘reading’ its key parts
   » what it was used for
   » related workflows
   » its creator
   » contacting the creator or last user
» How I need to cite the author and workflow



Project Objectives
                                              Common needs from the Community


                 What do you want to know when sharing a workflow?




» What rights others have
» What makes a good workflow, so that mine gets a good score
   » Make my workflow findable, reusable, and ready for review
   » Instructions to authors
   » Two types of contributions: serious science, preliminary/playing around
» If my workflow may have issues
» What the system or other users think it does
   » How it relates to other things
» Share freely or anonymously upon request




Project Objectives
                                                               Main Challenges



Quality

Representation
•  Workflow + resources
•  Manipulation
•  Classification, categorization
•  Similarity
•  Abstraction

Preservation
•  Store
•  Access
•  Evolution
•  Scale
•  Versioning

Community
•  Sharing & Reuse
•  Affinity
•  Incremental development
•  Credit, citation

Attribution, accountability
Project Objectives
                                                                Main Challenges



Quality, INTEGRITY

Representation
•  Workflow + resources
•  Manipulation
•  Classification, categorization
•  Similarity
•  Abstraction

Preservation
•  Store
•  Access
•  Evolution
•  Scale
•  Versioning

Community
•  Sharing & Reuse
•  Affinity
•  Incremental development
•  Credit, citation

AUTHENTICITY

Attribution, accountability
Evaluation, Validation & Community Building
                                                                           Wf4Ever Case Studies

Two workflow-intensive scientific case studies in the domains of Astronomy and Genomics

Astronomy

» Application area: Virtual Observatory (VO) data processing
   ›  Astrophysical quantities propagation
   ›  Source extraction on CCD images
   ›  Modeling of galaxy 3D data
» Focus on bringing workflow-based methodologies into Astronomy
» Creation of Golden Exemplars
» Beachhead

Genomics

» Application areas: Biobanking and Genome-Wide analysis
   ›  Interpretation of GWAS data
   ›  Gene expression studies
» Focus on authenticity and experimental reproducibility
» Community! Lots of available workflows
   ›  myExperiment
   ›  SysMO-DB
» Long tradition of workflow application

Overall goals

» To collect and preserve existing workflows and their related objects in each area
» To create scientific communities around the use and preservation of scientific workflows
» To apply, evaluate, and provide feedback on the results obtained from system- and component-level research
Astronomy WorkPackage
                                                                                Main Goals

WP5: Workflow Preservation in Astronomy

Scientific contribution

» Development of an online community of scientists working on Astronomy
» Introduction of workflow and workflow preservation needs in Astronomy and the Virtual Observatory
» Provide a set of workflows for frequently used complex task-combinations and demands in the Astronomy domain

Technological contribution

» Online repository of preserved Astronomy workflows identifying preservation needs
» Creation of three Golden Exemplars workflows:
        ›  using Wf4Ever results
        ›  involving additional implementations for wrapping VO Web services


Workflow-based methodology deployed in the VO community through exemplars and preservation methodologies
Thanks for your Attention!
                                              Questions




http://www.wf4ever-project.org




More Related Content

Viewers also liked

Viewers also liked (7)

Use of CharDM in an archive of velocity cubes
Use of CharDM in an archive of velocity cubesUse of CharDM in an archive of velocity cubes
Use of CharDM in an archive of velocity cubes
 
Riesgos eléctricos
Riesgos eléctricosRiesgos eléctricos
Riesgos eléctricos
 
B0DEGA 3D VO Archive - IVOA 2010 Fall Interop
B0DEGA 3D VO Archive - IVOA 2010 Fall InteropB0DEGA 3D VO Archive - IVOA 2010 Fall Interop
B0DEGA 3D VO Archive - IVOA 2010 Fall Interop
 
C. Lai Ping
C. Lai Ping C. Lai Ping
C. Lai Ping
 
Ipfa2012 photo contest winners
Ipfa2012 photo contest winnersIpfa2012 photo contest winners
Ipfa2012 photo contest winners
 
SVO Activities - SEA 2008
SVO Activities - SEA 2008SVO Activities - SEA 2008
SVO Activities - SEA 2008
 
Research Objects in Wf4Ever
Research Objects in Wf4EverResearch Objects in Wf4Ever
Research Objects in Wf4Ever
 

Similar to Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science i

OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objectsseanb
 
OeRC Seminar
OeRC SeminarOeRC Seminar
OeRC Seminarseanb
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and SharingJisc
 
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...Neil Chue Hong
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the webJose Manuel Gómez-Pérez
 
Institutional Repositories
Institutional RepositoriesInstitutional Repositories
Institutional RepositoriesJoshua Parker
 
2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objectsStian Soiland-Reyes
 
Doing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers SeminarDoing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers SeminarNeil Chue Hong
 
Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Jian Qin
 
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12ASIS&T
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNADaniel S. Katz
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?Graham Pryor
 
Research Solutions for Education
Research Solutions for EducationResearch Solutions for Education
Research Solutions for EducationLee Stott
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...SEAD
 
Dc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalDc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalSayeed Choudhury
 
Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsJian Qin
 
Innovation, community, sustainability
Innovation, community, sustainabilityInnovation, community, sustainability
Innovation, community, sustainabilityPaul Walk
 
Metadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesMetadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesKerstin Forsberg
 

Similar to Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science i (20)

OAI7 Research Objects
OAI7 Research ObjectsOAI7 Research Objects
OAI7 Research Objects
 
OeRC Seminar
OeRC SeminarOeRC Seminar
OeRC Seminar
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and Sharing
 
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...
Doing Science Properly in the Digital Age: Software Skills for Free-Range Res...
 
Scientific data management from the lab to the web
Scientific data management   from the lab to the webScientific data management   from the lab to the web
Scientific data management from the lab to the web
 
Institutional Repositories
Institutional RepositoriesInstitutional Repositories
Institutional Repositories
 
2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects2012 03-28 Wf4ever, preserving workflows as digital research objects
2012 03-28 Wf4ever, preserving workflows as digital research objects
 
Doing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers SeminarDoing Science Properly In The Digital Age - Rutgers Seminar
Doing Science Properly In The Digital Age - Rutgers Seminar
 
Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012
 
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12
Preparing eScience Librarians for Managing Research Data - Jian Qin - RDAP12
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNA
 
Why manage research data?
Why manage research data?Why manage research data?
Why manage research data?
 
Research Solutions for Education
Research Solutions for EducationResearch Solutions for Education
Research Solutions for Education
 
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
CNI Fall 2011 Meeting Presentation Margaret Hedstrom & Robert McDonald (Dec. ...
 
User engagement in research data curation
User engagement in research data curationUser engagement in research data curation
User engagement in research data curation
 
Dc sheridan dlf_2011_final
Dc sheridan dlf_2011_finalDc sheridan dlf_2011_final
Dc sheridan dlf_2011_final
 
2013-01-17 Research Object
2013-01-17 Research Object2013-01-17 Research Object
2013-01-17 Research Object
 
Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future Jobs
 
Innovation, community, sustainability
Innovation, community, sustainabilityInnovation, community, sustainability
Innovation, community, sustainability
 
Metadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiencesMetadata in general and Dublin Core in specific; some experiences
Metadata in general and Dublin Core in specific; some experiences
 

More from Jose Enrique Ruiz

Jupyter notebooks on steroids
Jupyter notebooks on steroidsJupyter notebooks on steroids
Jupyter notebooks on steroidsJose Enrique Ruiz
 
IPython Notebooks - Hacia los papers ejecutables
IPython Notebooks - Hacia los papers ejecutablesIPython Notebooks - Hacia los papers ejecutables
IPython Notebooks - Hacia los papers ejecutablesJose Enrique Ruiz
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesJose Enrique Ruiz
 
Open Science and Executable Papers
Open Science and Executable PapersOpen Science and Executable Papers
Open Science and Executable PapersJose Enrique Ruiz
 
Digital Science: Towards the executable paper
Digital Science: Towards the executable paperDigital Science: Towards the executable paper
Digital Science: Towards the executable paperJose Enrique Ruiz
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyJose Enrique Ruiz
 
Workflows to access and massage VOData
Workflows to access and massage VODataWorkflows to access and massage VOData
Workflows to access and massage VODataJose Enrique Ruiz
 
Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web ServicesJose Enrique Ruiz
 
Wf4Ever: Workflow Preservation
Wf4Ever: Workflow PreservationWf4Ever: Workflow Preservation
Wf4Ever: Workflow PreservationJose Enrique Ruiz
 
Workflows in the Virtual Observatory
Workflows in the Virtual ObservatoryWorkflows in the Virtual Observatory
Workflows in the Virtual ObservatoryJose Enrique Ruiz
 
VO web-services-based astronomy workflows
VO web-services-based astronomy workflowsVO web-services-based astronomy workflows
VO web-services-based astronomy workflowsJose Enrique Ruiz
 
Curating and Preserving Collaborative Digital Experiments
Curating and Preserving Collaborative Digital ExperimentsCurating and Preserving Collaborative Digital Experiments
Curating and Preserving Collaborative Digital ExperimentsJose Enrique Ruiz
 
Collaborative Digital Experiments
Collaborative Digital ExperimentsCollaborative Digital Experiments
Collaborative Digital ExperimentsJose Enrique Ruiz
 
El Observatorio Virtual - eCA
El Observatorio Virtual - eCAEl Observatorio Virtual - eCA
El Observatorio Virtual - eCAJose Enrique Ruiz
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VOJose Enrique Ruiz
 

More from Jose Enrique Ruiz (18)

Jupyter notebooks on steroids
Jupyter notebooks on steroidsJupyter notebooks on steroids
Jupyter notebooks on steroids
 
IPython Notebooks - Hacia los papers ejecutables
IPython Notebooks - Hacia los papers ejecutablesIPython Notebooks - Hacia los papers ejecutables
IPython Notebooks - Hacia los papers ejecutables
 
Velocity cubes of galaxies
Velocity cubes of galaxiesVelocity cubes of galaxies
Velocity cubes of galaxies
 
Implementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxiesImplementing a VO archive for datacubes of galaxies
Implementing a VO archive for datacubes of galaxies
 
Open Science and Executable Papers
Open Science and Executable PapersOpen Science and Executable Papers
Open Science and Executable Papers
 
Digital Science: Towards the executable paper
Digital Science: Towards the executable paperDigital Science: Towards the executable paper
Digital Science: Towards the executable paper
 
Digital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in AstronomyDigital Science: Reproducibility and Visibility in Astronomy
Digital Science: Reproducibility and Visibility in Astronomy
 
Workflows to access and massage VOData
Workflows to access and massage VODataWorkflows to access and massage VOData
Workflows to access and massage VOData
 
Curation and Characterization of Web Services
Curation and Characterization of Web ServicesCuration and Characterization of Web Services
Curation and Characterization of Web Services
 
Digital Science
Digital ScienceDigital Science
Digital Science
 
Wf4Ever: Workflow Preservation
Wf4Ever: Workflow PreservationWf4Ever: Workflow Preservation
Wf4Ever: Workflow Preservation
 
Workflows in the Virtual Observatory
Workflows in the Virtual ObservatoryWorkflows in the Virtual Observatory
Workflows in the Virtual Observatory
 
Workflow Preservation
Workflow PreservationWorkflow Preservation
Workflow Preservation
 
VO web-services-based astronomy workflows
VO web-services-based astronomy workflowsVO web-services-based astronomy workflows
VO web-services-based astronomy workflows
 
Curating and Preserving Collaborative Digital Experiments
Curating and Preserving Collaborative Digital ExperimentsCurating and Preserving Collaborative Digital Experiments
Curating and Preserving Collaborative Digital Experiments
 
Collaborative Digital Experiments
Collaborative Digital ExperimentsCollaborative Digital Experiments
Collaborative Digital Experiments
 
El Observatorio Virtual - eCA
El Observatorio Virtual - eCAEl Observatorio Virtual - eCA
El Observatorio Virtual - eCA
 
Multidimensional Data in the VO
Multidimensional Data in the VOMultidimensional Data in the VO
Multidimensional Data in the VO
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf


Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science

  • 1. Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science
      Grant agreement no.: 27092
      Data Curation & Preservation Session
      Jose Enrique Ruiz, IAA-CSIC
      8th December 2010
      IVOA 2010 Fall Interoperability Meeting - Nara
  • 2. Introduction: The Team
      1. Intelligent Software Components (ISOCO, Spain)
      2. University of Manchester (UNIMAN, UK)
      3. Universidad Politécnica de Madrid (UPM, Spain)
      4. Poznan Supercomputing and Networking Centre (PSNC, Poland)
      5. University of Oxford (OXF, UK)
      6. Instituto de Astrofísica de Andalucía (IAA, Spain)
      7. Leiden University Medical Centre (LUMC, NL)
  • 3. Introduction: The Consortium Classified
      Partners
      » One SME
      » Six public organizations
      Technological Core Competencies
      » Digital Libraries
      » Workflow Management
      » Semantic Web
      » Integrity & Authenticity
      » Provenance
      Major Sectors
      » Education
      » IT
      » Astronomy
      » Bioinformatics
      Case Studies
      » Workflow Preservation in Astronomy (IAA)
      » Workflow Preservation for Genome-wide Analysis and Biobanking (LUMC)
  • 11. Scientific Workflows: State of the art
      A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving
      » Central in experimental science
        › Enable automation
        › Make science repeatable (and sometimes reproducible)
        › Encourage best practices
      » Scientist-friendly
        › Aimed at (some types of) scientists, possibly even without strong computational skills
      » Communities: need for scientific data preservation
        › Enhance scientific development by building on, sharing, and extending previous results within scientific communities
      » However, workflow preservation is especially complex
        › Workflows not only specified statically at design time but also interpreted through their execution
        › Complex models are required to describe workflows and related resources, including documents, data and services
        › Resources often beyond control of scientists
  • 12. Project Objectives: Goals
      Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines
      » Creation and management of complex Research Objects that take into account the dual nature (static and dynamic) of scientific workflows
      » Archival, classification, and indexing of scientific workflows and their associated materials in scalable semantic repositories, providing advanced access and recommendation capabilities
      » Creation of scientific communities to collaboratively share, reuse and evolve workflows and their parts, stimulating the development of new scientific knowledge
  • 13. Integrity & Authenticity: Definitions
      Integrity
      » The quality or condition of being whole, complete and unaltered
      » Crucial for ensuring the quality of preserved data in Research Objects
      Authenticity
      » Authenticity has a twofold dimension: data origin and entity authenticity
      » Data origin: proof of the origin of data, their genuineness, trustworthiness and realness
      » Entity: ensuring that an entity, e.g. a person or other kind of actor, is the one it claims to be
  • 14. Integrity and Authenticity Maintenance: Objectives
      Evaluate and preserve the integrity and authenticity of archived Research Objects
      » To ensure the data can be accessed and interpreted unchanged, complete, and correct today and in the future
      » To preserve the integrity of archived Research Objects by tracking and verifying changes in archived objects as well as related resources
      » To assist scientists in anticipating potential inconsistencies caused by uncontrolled changes in such resources
      » To verify and prove the authenticity of authors and contributors to Research Objects as well as of internal and related resources
      Provenance-based means to calculate measures of integrity and authenticity
  • 15. Project Objectives: Common needs from the Community
      What do you want to know when accessing a workflow?
      » If I can use it for my purposes (in my words)
      » If I can expect it to run, when it was last run, by whom
      » What it does quickly, by one of
        › example input / output (and trying it)
        › a description
        › ‘reading’ its key parts
        › what it was used for
        › related workflows
        › its creator
        › contacting the creator or last user
      » How I need to cite the author and workflow
  • 16. Project Objectives: Common needs from the Community
      What do you want to know when sharing a workflow?
      » What rights others have
      » What a good workflow is to get a good score
      » Make my workflow findable, reusable, and ready for review
      » Instructions to authors
      » Two types of contributions: serious science, preliminary/playing around
      » If my workflow may have issues
      » What the system or other users think it does
      » How it relates to other things
      » Share freely or anonymously upon request
  • 17. Project Objectives: Main Challenges
      Diagram labels: Quality; Representation; Preservation; Community; Attribution, accountability
      Diagram items: Workflow + resources; Sharing & Reuse; Manipulation; Affinity; Classification, categorization; Store; Incremental development; Access; Similarity; Evolution; Credit, citation; Abstraction; Scale; Versioning
  • 18. Project Objectives: Main Challenges
      Diagram labels: Quality; INTEGRITY; Representation; Preservation; AUTHENTICITY; Community; Attribution, accountability
      Diagram items: Workflow + resources; Sharing & Reuse; Manipulation; Affinity; Classification, categorization; Store; Incremental development; Access; Similarity; Evolution; Credit, citation; Abstraction; Scale; Versioning
  • 19. Evaluation, Validation & Community Building: Wf4Ever Case Studies
      Two workflow-intensive scientific case studies in the domains of Astronomy and Genomics
      Astronomy
      » Application area: Virtual Observatory (VO) data processing
        › Astrophysical quantities propagation
        › Source extraction on CCD images
        › Modeling of galaxy 3D data
      » Focus on bringing workflow-based methodologies into Astronomy
      » Creation of Golden Exemplars
      » Beachhead
      Genomics
      » Application areas: Biobanking and Genome-Wide analysis
        › Interpretation of GWAS data
        › Gene expression studies
      » Focus on authenticity and experimental reproducibility
      » Community! Lots of available workflows
        › myExperiment
        › SysMO-DB
      » Long tradition of workflow application
      Overall goals
      » To collect and preserve existing workflows and their related objects in each area
      » To create scientific communities around the use and preservation of scientific workflows
      » To apply, evaluate, and provide feedback on the results obtained from system and component-level research
  • 20. Astronomy WorkPackage: Main Goals
      WP5: Workflow Preservation in Astronomy
      Scientific contribution
      » Development of an online community of scientists working on Astronomy
      » Introduction of workflow and workflow preservation needs in Astronomy and the Virtual Observatory
      » Provide a set of workflows for frequently used complex task-combinations and demands in the Astronomy domain
      Technological contribution
      » Online repository of preserved Astronomy workflows identifying preservation needs
      » Creation of three Golden Exemplars workflows:
        › using Wf4Ever results
        › involving additional implementations for wrapping VO Web services
      Workflow-based methodology deployed in the VO community through exemplars and preservation methodologies
  • 21. Thanks for your Attention! Questions
      http://www.wf4ever-project.org
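
Slide 14 lists "tracking and verifying changes in archived objects as well as related resources" as a way to preserve integrity. The slides do not say how Wf4Ever implements this, so the following is only an illustrative sketch of one common approach: recording a SHA-256 checksum manifest when an object is archived and comparing against it later. The function names (`build_manifest`, `verify_manifest`, `demo`), the file names, and the idea of a Research Object stored as a directory are all assumptions made for the example. It addresses integrity only; authenticity of authors would need something more, such as digital signatures.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the current state of root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded at archival time that no longer exist ("whole, complete")
        "missing": sorted(set(manifest) - set(current)),
        # Files that appeared after archival
        "added": sorted(set(current) - set(manifest)),
        # Files whose content changed ("unaltered")
        "altered": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }


def demo() -> Dict[str, List[str]]:
    """Archive a hypothetical object, change it, and report the differences."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "workflow.xml").write_text("version 1")  # hypothetical workflow file
        (root / "input.csv").write_text("a,b")           # hypothetical input data
        manifest = build_manifest(root)                   # taken at archival time
        (root / "workflow.xml").write_text("version 2")  # uncontrolled change
        (root / "input.csv").unlink()                     # resource disappears
        return verify_manifest(root, manifest)


if __name__ == "__main__":
    print(demo())
```

A non-empty "missing" or "altered" list signals that the archived object is no longer whole or unaltered in the sense of slide 13. Storing the manifest separately from the object itself is what makes the later comparison meaningful.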