The 10 Best Practices for Workflow Design
BioVeL M6 Workshop
Göteborg, May 10-11, 2012
Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid)
Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)
Our specialty: Knowledge Discovery
http://biosemantics.org

Disambiguation*
Text Mining

Substrates for Knowledge Discovery

Methods for Knowledge Discovery

Applications
• Predict protein-protein and protein-disease associations, gene prioritization
• Genotype-phenotype studies, e.g. Huntington’s Disease, Metabolic Syndrome
• Yours?

* Global disambiguation initiative: http://snipurl.com/conceptweballiance
Introduction
                               Why build good workflows?


Good workflow design = good science!




Introduction
                      Best practices for workflow design




Best practices for workflow design
                  =
Best practices in experimental science
                  +
Best practices in software engineering



1
Make a sketch workflow




Best practice 1
                                     Sketch an Abstract Workflow




PowerPoint courtesy of Eleni Mina

2
Use modules




http://www.myexperiment.org/workflows/74.html


3
     Think about the output
(and the data in your workflow in general)




Best practice 3
Think about the output




    ?
            http://...




4
Provide example inputs and outputs




Recipe

Taverna 2.3
  1. Select input/output
  2. Select tab ‘Details’
  3. Click ‘Annotation’
  4. Add Example

Taverna 2.4
  1. Right-click input/output
  2. Select ‘Annotation’
  3. Add Example
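
Outside Taverna, the idea behind 'Add Example' can be sketched in a few lines of Python: give every workflow input and output a description and an example value, and check that the example input really produces the example output. This is only an illustration. The names `Port`, `reverse_complement`, and the example sequences are hypothetical and are not part of Taverna or of the original slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A workflow input or output with a description and an example value."""
    name: str
    description: str
    example: str


def reverse_complement(sequence: str) -> str:
    """Toy workflow step (hypothetical): reverse-complement a DNA sequence."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(sequence.upper()))


# Hypothetical annotations, analogous to the example fields filled in above.
INPUT_PORT = Port("dna_sequence", "DNA sequence, plain text, no header", "ATGC")
OUTPUT_PORT = Port("reverse_complement", "Reverse complement of the input", "GCAT")


def examples_agree() -> bool:
    """Return True if the documented example input yields the example output."""
    return reverse_complement(INPUT_PORT.example) == OUTPUT_PORT.example
```

Running `examples_agree()` whenever the workflow changes is a cheap way to notice that the documented examples have gone stale.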



5
Annotate




Best practice 5
                       Annotate
Each component in Taverna can be annotated




Best practice 5
Annotate and help your users




6
Make workflow executable from
 outside the local environment



Best practice 6
                                         Make workflow executable by others

How to check that others can execute your workflow?


» Try it! (proof of executability)
   › Ask a colleague
   › Use an external t2web runner

» Tips
   › Use Web Services
   › If you use local command-line tools
      • Install the tools on a publicly accessible server (this applies to Rserve, for example)
      • Use a system that your users can set up (e.g. BioLinux)
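
Before asking a colleague to try the workflow, a rough first check is to list everything the workflow depends on and flag whatever only resolves on your own machine. The Python sketch below is a heuristic under stated assumptions, not a Taverna feature: the function names and the example dependency strings are hypothetical, and anything that is not an http(s) URL is simply treated as local.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Host names that only resolve on the machine running the workflow.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_only(dependency: str) -> bool:
    """Heuristically decide whether a dependency is reachable only locally."""
    parsed = urlparse(dependency)
    if parsed.scheme in ("http", "https"):
        return parsed.hostname in LOCAL_HOSTS
    # Assumption: anything that is not an http(s) URL (file paths, bare
    # command names) must be installed by every user of the workflow.
    return True


def local_dependencies(dependencies: list[str]) -> list[str]:
    """Return the dependencies that other users cannot reach as-is."""
    return [d for d in dependencies if is_local_only(d)]
```

For example, `local_dependencies(["https://example.org/service?wsdl", "/usr/local/bin/mytool"])` returns only the local tool path, which is the item to move to a public server or to document for installation.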



7
Choose services carefully




Best practice 7
Choose services carefully




Best practice 7
Choose services carefully




8
Reuse existing workflows




Best practice 8
The reuse workflow

1.    Check workflows on myExperiment
         Pos.: Reuse, attribute, respect licences
         Neg.: Contact authors, retry
2.    Check services on BioCatalogue
         Pos.: Reuse, attribute, respect licences
         Neg.: Contact authors, retry
3.    Neg.: Use scripts from colleagues
4.    Search the internet
5.    Invent a new wheel

Not a best practice, but a tip: know-how is important for reuse
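
The decision flow on this slide can be sketched as a simple fallback chain. The Python below is a minimal illustration, assuming the order myExperiment, BioCatalogue, colleagues, internet as read from the slide. The function name and its four search callbacks are hypothetical, and the "contact authors, retry" loops are left out for brevity.

```python
from typing import Callable, Optional

# A search returns the name of something reusable, or None if nothing fits.
Search = Callable[[], Optional[str]]


def find_reusable_component(
    search_myexperiment: Search,
    search_biocatalogue: Search,
    ask_colleagues: Search,
    search_internet: Search,
) -> str:
    """Try each source in turn; build something new only as a last resort."""
    sources = (search_myexperiment, search_biocatalogue, ask_colleagues, search_internet)
    for source in sources:
        found = source()
        if found is not None:
            return f"reuse: {found} (attribute the authors, respect the licence)"
    return "build a new component"
```

The point of the ordering is that every earlier source is cheaper than inventing a new wheel, so the new build is reached only when all searches come back empty.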


9
Advertise




Advertise




Unique reference for use in your papers and for others to cite




10
Maintain




Best Practice 10
                                                                     Maintain

Best practices to support maintenance

» Regularly check your workflow
   › Ask colleagues
» Enable support for maintenance
   › Register your workflow on myExperiment
   › Register Web Services on
» Enable peers to repair: annotate!

» Note about versioning
   › No need to register all edits on myExperiment: use Subversion
   › Register important updates on myExperiment


Bonus tip
Use common sense as a scientist




Workflow Forever
Preservation of good workflows for future applications

Workflow 74, “Protein Discovery”, 2005
Workflow 2876, “Match gene lists by literature”, 2012
Workflow 2805, “Get Pathway genes”, 2012



Wf4Ever
  Outcomes for BioVeL




myExperiment 2.0
BioCatalogue
Taverna



Research Objects
Linked Data

Methods
Protocols for Preservation and Conservation


The 10 Best Practices of Workflow Design
                                                                                Thank you

Thank you for your attention
More information:
http://snipurl.com/workflowbestpractices

1.    Make a sketch workflow
2.    Use modules
3.    Think about the output
4.    Provide example inputs and outputs
5.    Annotate
6.    Make it executable from outside the local environment
7.    Choose services carefully
8.    Reuse existing workflows
9.    Advertise
10.   Maintain


Wf4Ever tooling
Sneak preview




Supporting information
                                                             Workflow jargon



› Scientific workflow
  Paradigm to describe, manage, and share complex scientific analyses
› Workflow system
  Software to design, execute, and monitor scientific workflows
› Module
  = nested workflow = workflow in a workflow = workflow component
› Beanshell script
  A Java-based scripting language.
  Typically used for data type conversions in Taverna.
› Provenance
  History or trace of a workflow run.
  Allows you to look at intermediate data, which workflows and services
  were run, with what data.
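
Two of these terms, module and provenance, can be made concrete with a small Python sketch. It is a toy model rather than how Taverna implements either concept. The names `run_with_provenance` and `make_module` are hypothetical, and a "step" is just a function of one argument.

```python
from __future__ import annotations

from typing import Any, Callable

Step = Callable[[Any], Any]


def run_with_provenance(steps: list[tuple[str, Step]], data: Any, trace: list[dict]) -> Any:
    """Run the steps in order, appending one provenance record per step."""
    for name, step in steps:
        result = step(data)
        trace.append({"step": name, "input": data, "output": result})
        data = result
    return data


def make_module(name: str, steps: list[tuple[str, Step]], trace: list[dict]) -> tuple[str, Step]:
    """Wrap a list of steps as one named step: a workflow in a workflow."""
    return name, lambda data: run_with_provenance(steps, data, trace)


trace: list[dict] = []
clean = make_module("clean", [("strip", str.strip), ("upper", str.upper)], trace)
result = run_with_provenance([clean, ("length", len)], "  acgt ", trace)
```

After the run, `trace` holds the intermediate data of every step, including those inside the nested `clean` module, which is the kind of history the provenance entry above describes.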



10 Best Practices for Workflow Design

  • 1. The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft , Carole Goble (myGrid) Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)
  • 2. Our specialty: Knowledge Discovery http://biosemantics.org Disambiguation* Text Mining Substrates for Knowledge Discovery Methods for Knowledge Discovery Applications •Predict protein-protein, protein-disease associations, gene prioritization •Genotype-phenotype studies, e.g. Huntington’s Disease, Metabolic Syndrome •Yours? * Global disambiguation initiative: http://snipurl.com/conceptweballiance 2
  • 3. Introduction Why build good workflows? Good workflow design = good science! 3
  • 4. Introduction Best practices for workflow design Best Practices for workflow design = Best Practices experimental science + Best Practices software engineering 4
  • 5. 1 Make a sketch workflow 5
  • 6. Best practice 1 Sketch an Abstract Workflow Powerpoint courtersy of Eleni Mina 6
  • 9. 3 Think about the output (and the data in your workflow in general) 9
  • 10. Best practice 3 Think about the output ? http://... 10
  • 11. 4 Provide example inputs and outputs 11
  • 12. Recipe. Taverna 2.3: Select input/output; Select tab ‘Details’; Click ‘Annotation’; Add Example. Taverna 2.4: Right-click input/output; Select ‘Annotation’; Add Example. 12
  • 14. Best practice 5 Annotate Each component in Taverna can be annotated 14
  • 15. Best practice 5 Annotate and help your users 15
  • 16. 6 Make workflow executable from outside the local environment 16
  • 17. Best practice 6 Make workflow executable by others How to check that others can execute your workflow? » Try it! Proof of executability › Ask a colleague › Use an external t2web runner » Tips › Use Web Services › If you use local command line tools • Install tools on a publicly accessible server (e.g. applies to Rserve) • Use system that your users can set up (e.g. BioLinux) 17
  • 19. Best practice 7 Choose services carefully 19
  • 20. Best practice 7 Choose services carefully 20
  • 22. Best practice 8 The reuse workflow. Not a best practice, but a tip: know-how is important for reuse. [Flowchart; node labels: Check workflows on myExperiment; Contact authors; Retry; Use scripts from colleagues; Search the internet; Check services on BioCatalogue; Contact authors; Retry; Invent a new wheel; Reuse, Attribute, Respect licences. Branch labels: Pos., Neg.] 22
  • 24. Advertise Unique reference to use in your papers and for others to cite 24
  • 26. Best Practice 10 Maintain Best practices to support maintenance » Regularly check your workflow › Ask colleagues » Enable support for maintenance › Register your workflow on myExperiment › Register Web Services on BioCatalogue » Enable peers to repair: annotate! » Note about versioning › No need to register all edits on myExperiment: use subversion › Register important updates on myExperiment 26
  • 27. Bonus tip Use common sense as scientist 27
  • 28. Workflow Forever Preservation of good workflows for future applications Workflow 74 “Protein Discovery” 2005 Workflow 2876 “Match gene lists by literature” 2012 Workflow 2805 “Get Pathway genes” 2012 28
  • 29. Wf4Ever Outcomes for BioVeL myExperiment 2.0 BioCatalogue Taverna Research Objects Linked Data Methods Protocols for Preservation and Conservation 29
  • 30. The 10 Best Practices of Workflow Design Thank you Thank you for your attention More information: http://snipurl.com/workflowbestpractices 1. Make a sketch workflow 2. Use modules 3. Think about the output 4. Provide example inputs and outputs 5. Annotate 6. Make it executable from outside the local environment 7. Choose services carefully 8. Reuse existing workflows 9. Advertise 10. Maintain 30
  • 32. Supporting information Workflow jargon › Scientific workflow Paradigm to describe, manage, and share complex scientific analyses › Workflow system Software to design, execute, and monitor scientific workflows › Module = nested workflow = workflow in a workflow = workflow component › Beanshell script A Java-based scripting language. Typically used for data type conversions in Taverna. › Provenance History or trace of a workflow run. Allows you to look at intermediate data, which workflows and services were run, with what data. 32
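As an illustration of the "module" and "provenance" jargon above, here is a minimal Python sketch, not Taverna code. It assumes a workflow can be modelled as an ordered list of named functions; the step names (`tokenize`, `filter_terms`, `count_terms`), the `run_workflow` helper, and the input sentence are hypothetical and chosen only for illustration. Each module can be tested on its own, and the runner keeps every intermediate result as a simple provenance trace.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, function) pair; a workflow is an ordered list of steps.
Step = Tuple[str, Callable[[Any], Any]]


def tokenize(text: str) -> List[str]:
    """Module 1: split text into lowercase tokens."""
    return text.lower().split()


def filter_terms(tokens: List[str]) -> List[str]:
    """Module 2: keep only tokens longer than three characters."""
    return [t for t in tokens if len(t) > 3]


def count_terms(tokens: List[str]) -> Dict[str, int]:
    """Module 3: count how often each token occurs."""
    counts: Dict[str, int] = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return counts


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Run the steps in order and record each intermediate result."""
    provenance: List[Tuple[str, Any]] = []
    for name, step in steps:
        data = step(data)
        provenance.append((name, data))
    return data, provenance


# Hypothetical toy workflow assembled from the three modules.
TEXT_MINING: List[Step] = [
    ("tokenize", tokenize),
    ("filter_terms", filter_terms),
    ("count_terms", count_terms),
]

result, trace = run_workflow(TEXT_MINING, "Gene BRCA1 and gene TP53")
for name, intermediate in trace:
    print(name, "->", intermediate)
```

Because every module is an ordinary function, any one of them can be replaced or reused in another workflow without touching the others, which is the point of best practice 2.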

Editor's notes

  1. Designing a good workflow is part of doing good research!
  2. This means that if you know about one or both of them, you should apply their principles to workflow design as well. (At the end we can say that using common sense about doing good science is a general best practice for creating workflows too.) Workflow design is a variant of software design: define hypothesis and approach; sketch a workflow of the approach; implement the workflow; trial and error (iterate). Comment: where are the workflow design patterns?
  3. Boxes without content: these can be made in Taverna using e.g. empty script boxes, as a PowerPoint flow chart, or on a napkin; if it is digital (e.g. Taverna) then we can store it digitally. <Comment: add concept mining workflow and a sketch. Cite Eleni: 'helps me to share workflow while developing it, that makes it better'> How? In Taverna using empty beanshells; in PowerPoint; in a sketch book. Why? Provides a reference point for the main task(s) of the workflow throughout the implementation process; promotes sharing between computer and workflow systems due to its non-explicit nature; helps design the experiment; helps communication (supervisors, colleagues).
  4. The workflow on the left explains the basic steps of a text mining process. The expanded workflow is much harder to understand. We can use each nested workflow as a workflow on its own. How? Describe and implement each of the executable processes in a workflow individually and independently; in Taverna this can be done through nested workflows. Why? Facilitates independent testing and validation of the execution of each of the individual modules; encourages re-use. Note: Make sure that you publish the separate modules as well as the final nested workflow (unfortunately, myExperiment does not support this very well), or at least annotate the components when you publish the whole.
  5. How? Consider whether you want to populate data models/databases or create outputs of disconnected collections of files; consider who the results are for (overview for users, or the next workflow component). General advice: at least have a report as an output (provenance will have the separate parts anyway). Use Taverna for provenance collection (intermediate results are captured by the provenance engine). Why? It is easier to think about this at the design stage than to adjust a finished workflow; it gives structure to potentially large output data.
  6. How? Example inputs and outputs can be recorded in Taverna; alternatively, add input or output files to a pack containing the workflow; use real example data. Why? To help understand the workflow; for validation; for maintenance. Note: Make sure that the input and the output examples are coupled. Keep in mind that the output has a timestamp: it may change due to changes in underlying databases.
  7. How? Choose meaningful names for the workflow title, inputs, outputs, and for the processes that constitute the workflow. Focus on how a component is used in this workflow and why it is in there. If it exists, refer to information about what the component does in general (e.g. by referencing a service on BioCatalogue). Assume that a referenced resource may disappear or change at some time in the future. Use Taverna description fields and example fields*: Taverna keeps them with the workflow and myExperiment uses this information. Keep any notes that are related to the workflow, but not part of it, linked to it*. Examples of useful "extra" information: execution time, keywords, contact information, attribution; myExperiment offers some of this, but it is best to put it in the workflow descriptions. Why? Doing good science; record what is needed for a publication later on; increase re-usability. Cite Kostas: ‘many workflows are badly documented computer programs'. The wf4ever project will provide additional support (and incentives) for describing (the purpose of) workflow components, related objects and references (e.g. data sets), and support for storing the elements of an experiment with their metadata in a structured way.
  8. Facilitate understanding and reuse
  9. How? Use Web Services, any Taverna widget except external tool, and external tool only when it runs over ssh on a publicly accessible server; use Taverna with local tools, but installed on a publicly accessible server with the Taverna server; use local tools from an easy-to-set-up environment such as BioLinux (only for a certain niche of users). TRY IT!! Why? Others will be able to run the workflow; proof of reproducibility.
  10. How? Choose a service that is reliable based on: BioCatalogue reliability statistics (in practice: check on BioCatalogue whether it has a green light; at the moment there is not much more you can do); how often it is used in other workflows; contact with service providers (communicate!); the reputation of the institution providing the service; the trustworthiness of the service provider (this can also be a person, for whom you can check whether they will remain at an institution to maintain the service). Why? Prevent workflow decay, prolong the life of the workflow. Note to service developers: many workarounds and ugly workflow practices come from having to deal with badly behaved services!
  11. Web Services are digital, their creators not. Communication saves web services and workflows from decay.
  12. A common misconception is that because they are workflows, they are automatically stable. It takes effort and often communication to reuse work, especially when using ‘state-of-the-art’ products made by scientists. How? Make your own workflows modular, since this promotes reuse; search myExperiment and filter on most downloaded or most viewed; check if a workflow has been used in a publication; use your contacts: maybe someone has tried to solve something similar before using a workflow? Try and try harder, contact authors! Why? Another user who is familiar with one of your workflows is more likely to understand another workflow that you designed; beneficial when repairing workflows: repairing a given workflow may entail repairing the workflows in which it is used as a subworkflow; fights redundancy. Note: attribute others and respect licenses.
  13. http://myExperiment.org/workflows/74?version=12 http://myExperiment.org/packs/258 How? Share your workflow (don’t forget contact info!) on myExperiment, on other social media, or by e-mailing it around to colleagues; cite your workflow when publishing, using a stable identifier like myExperiment; make use of the pack functionality in myExperiment to bundle your workflow with other important documents such as a publication. Why? Good science: share your results; get cited: fame!; progress: let others build on your work without reinventing it.
  14. How? Act on information about services that are deprecated by changing services or providing a note that that specific process in the workflow is not executable anymore; put your services on BioCatalogue (you don't have to be the owner) and your workflows on myExperiment (notification is planned); regularly test the workflow (like 'unit tests'). Why? Good practice: this is already demanded for some types of publications, like an application note in Bioinformatics; fight workflow decay, prolong the life of the workflow.
  15. A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving, i.e. the implementation of a scientific method. Workflows need to be preserved (and conserved). More on this later.
  16. Could we skip this slide to save time?
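The notes on example inputs/outputs (note 6) and on regularly testing the workflow "like 'unit tests'" (note 14) can be combined in a small check. The following Python sketch is one possible way to do this under stated assumptions: `match_gene_lists` is a hypothetical stand-in for a real workflow run, and the examples, gene lists, and recorded dates are made up. The idea is to store each example input together with its expected output and the date it was recorded, then re-run the check periodically.

```python
from typing import Any, Callable, Dict, List


def check_examples(workflow: Callable[[Any], Any],
                   examples: List[Dict[str, Any]]) -> List[str]:
    """Run the workflow on each stored input and report mismatches or errors."""
    failures: List[str] = []
    for ex in examples:
        try:
            actual = workflow(ex["input"])
        except Exception as err:  # e.g. a service that has disappeared
            failures.append(f"{ex['name']}: raised {err!r}")
            continue
        if actual != ex["expected"]:
            failures.append(
                f"{ex['name']}: expected {ex['expected']!r}, got {actual!r} "
                f"(example recorded {ex['recorded']})"
            )
    return failures


def match_gene_lists(lists):
    """Hypothetical stand-in for a workflow: genes present in both lists."""
    first, second = lists
    return sorted(set(first) & set(second))


# Coupled input/output examples; 'recorded' notes when the output was captured,
# because outputs may change when underlying databases change.
EXAMPLES = [
    {"name": "overlap", "input": (["BRCA1", "TP53"], ["TP53", "HTT"]),
     "expected": ["TP53"], "recorded": "2012-05-10"},
    {"name": "no overlap", "input": (["BRCA1"], ["HTT"]),
     "expected": [], "recorded": "2012-05-10"},
]

problems = check_examples(match_gene_lists, EXAMPLES)
print("all examples reproduce" if not problems else "\n".join(problems))
```

A failure here does not necessarily mean the workflow is broken: as note 6 points out, the expected output has a timestamp, so a mismatch may simply reflect an updated database and a need to refresh the stored example.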