Workflows on the Cloud:
        Scaling for National Service
 Katy Wolstencroft, Robert Haines, Helen Hulme,
Mike Cornell, Shoaib Sufi, Andy Brass, Carole Goble
            University of Manchester, UK
          Madhu Donepudi, Nick James
               Eagle Genomics Ltd, UK
Motivation: Workflows for Diagnostics
NHS genetic testing, e.g. colon disease
Annotation of SNPs in patient data, ready for interpretation by clinician.
Diagnostic Testing Today
Purify DNA. PCR-amplify exons of relevant genes (MLH1, MSH2, MSH6).
Sequence, identify variants, classify (pathogenic, not pathogenic,
unknown significance, etc.).
Write report to clinician.
Diagnostic Testing Tomorrow (or later today) uses whole genome
sequencing
     [Diagram: Next Gen Seq data -> Variation data -> ANNOTATE, FILTER, DISPLAY]

     New problem: How do we classify all the variants that we
     discover?
Taverna Workflows
   Sophisticated analysis pipelines
   A set of services to analyse or
    manage data (either local or
    remote)
   Workflows run through the
    workbench or via a server
   Automation of data flow through
    services
   Control of service invocation
   Iteration over data sets
   Provenance collection
   Extensible and open source
Taverna
   http://www.taverna.org.uk/
   Freely available, open source
   Current Version 2.4
   80,000+ downloads across versions
   Part of the myGrid Toolkit
   Windows/Mac OS X/Linux/Unix

     Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W729-32.
     Taverna: a tool for building and running workflows of services.
     Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T.
SNP annotation


Annotation task
Location, Gene, Transcript
Present in public databases, dbSNP etc.
Frequency in e.g. 1000 Genomes data
Conservation data (cross species)
Workflows are good for collecting and integrating data from a variety of
sources into one place
Variant Classification
SNP
   Nonsense: base insertion, causing a frameshift
       Premature stop / nonsense codon
   Synonymous
       Effects on splicing
   Missense: non-synonymous
       Effects on function or splicing
SNP Filtering / Triage
Which SNPs are the most important?
Reduction of 80K data points to those with
potential clinical significance.
Criteria
Reduce to (disease-)specific gene list
Sense < Missense < Stop codon etc.
Based on prediction tool scores
Frequency in population (based on 1000 Genomes data etc.)
(high frequency implies non-deleterious)
Conservation across species (implies that change is
deleterious)
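
The criteria above can be sketched as a small filter. This is only an illustration, not the workflow's actual implementation: the Variant fields, the SEVERITY ranking, and the max_freq and min_severity thresholds are all assumptions made for the example, and prediction tool scores are left out.

```python
from dataclasses import dataclass

# Hypothetical ranking following "Sense < Missense < Stop codon".
SEVERITY = {"synonymous": 0, "missense": 1, "stop_gained": 2}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str        # expected to be a key of SEVERITY
    population_freq: float  # allele frequency in a reference population, 0..1
    conserved: bool         # True if the position is conserved across species


def severity(variant):
    """Rank of the variant's consequence; unknown consequences rank lowest."""
    return SEVERITY.get(variant.consequence, 0)


def triage(variants, gene_list, max_freq=0.01, min_severity=1):
    """Keep rare, sufficiently severe variants in the gene list.

    Results are ordered most severe first; among equally severe
    variants, those at conserved positions come first.
    """
    kept = [
        v for v in variants
        if v.gene in gene_list
        and v.population_freq <= max_freq
        and severity(v) >= min_severity
    ]
    return sorted(kept, key=lambda v: (severity(v), v.conserved), reverse=True)
```

Calling triage(variants, {"MLH1", "MSH2", "MSH6"}) would drop anything outside the three genes named earlier, anything common in the population, and anything below missense.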
Workflow
   Taverna’s “Tool Service” feature –
    used to wrap Perl scripts and other
    command line applications
   Uses VEP (Ensembl)
   Passes references to files
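
A minimal sketch of wrapping a command-line application so that only file references move between steps. This is not Taverna's Tool Service and it does not call VEP: run_tool and STAND_IN_TOOL are hypothetical names, and the stand-in command is a one-line Python script that upper-cases a file.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tool(command, input_path, output_dir):
    """Run a command-line tool on input_path and return the path of its output.

    Only file references (paths) cross the function boundary; the file
    contents are never loaded by the caller.
    """
    output_path = Path(output_dir) / (Path(input_path).stem + ".out")
    # The tool receives the input and output paths as its last two arguments.
    subprocess.run([*command, str(input_path), str(output_path)], check=True)
    return output_path


# Stand-in for a wrapped script: copies the input file, upper-casing it.
STAND_IN_TOOL = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "input.txt"
        source.write_text("acgt\n")
        result = run_tool(STAND_IN_TOOL, source, workdir)
        print(result.name)
```

A following step would take the returned path as its own input, so large files stay where they are written.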
Workflow Provenance

Record inferences in clinical decisions

   What were the parameters used to build the
    dataset?
   What versions of databases, genome assembly,
    machine?
   Where does each piece of evidence for/against
    pathogenicity originate?
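
One way to picture the questions above is as a record kept alongside each result. The sketch below is an assumption for illustration only; it is not Taverna's provenance format, and the class and field names are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Evidence:
    source: str               # database or tool the evidence came from
    version: str              # version of that source at the time of the run
    statement: str            # what the evidence says
    supports_pathogenic: bool


@dataclass
class ProvenanceRecord:
    workflow: str
    parameters: dict          # parameters used to build the dataset
    genome_assembly: str
    evidence: list = field(default_factory=list)

    def add_evidence(self, source, version, statement, supports_pathogenic):
        self.evidence.append(
            Evidence(source, version, statement, supports_pathogenic))

    def sources(self, supports_pathogenic):
        """Sources arguing for (True) or against (False) pathogenicity."""
        return [e.source for e in self.evidence
                if e.supports_pathogenic == supports_pathogenic]

    def to_json(self):
        """Serialise the whole record, including nested evidence entries."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping the source and version on every evidence entry is what lets a clinician trace an individual inference back to where it came from.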
Infrastructure Requirements


   Execute analysis workflows
   Accessible to clinicians and genetic testers
   Cope with expanding demands on compute
   Provide a secure environment
   Collect provenance
Architecture overview
All user interaction via web interface
User data stored in the Cloud
Data for all tools and Web Services stored in the Cloud

[Diagram: Input SNPs and Results pass through the Web interface; Storage (S3);
Ensembl (MySQL); Cache (S3); Workflow engine orchestrator; e-Hive, Taverna
Server instances, other engines; application specific tools and Web Services]

Unified access to different workflow engines with our common REST API
Tools and Web Services for each workflow are installed together for easy
replication
Workflow engine orchestration
   Orchestrator is workflow executor agnostic
   Uses common API to:
       List workflows
       Configure runs
       Start runs
       Manage current runs
           Status
           Progress
       Delete runs

[Diagram: Workflow engine orchestrator -> Common REST API -> e-Hive Interface,
Taverna Interface, Cache -> Engine specific APIs -> e-Hive, Taverna]
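
The executor-agnostic design can be sketched as an orchestrator that talks to each engine only through a shared interface. Everything below is hypothetical: the class and method names are invented for the example, InMemoryEngine stands in for a real e-Hive or Taverna adapter, the REST layer is omitted, and progress reporting is left out for brevity.

```python
from abc import ABC, abstractmethod
import itertools


class EngineInterface(ABC):
    """Engine-specific adapter hidden behind the common API (hypothetical)."""

    @abstractmethod
    def list_workflows(self): ...

    @abstractmethod
    def start(self, workflow, inputs): ...

    @abstractmethod
    def status(self, handle): ...


class InMemoryEngine(EngineInterface):
    """Toy engine used in place of a real e-Hive or Taverna adapter."""

    def __init__(self, workflows):
        self._workflows = list(workflows)

    def list_workflows(self):
        return list(self._workflows)

    def start(self, workflow, inputs):
        return {"workflow": workflow, "inputs": inputs, "state": "running"}

    def status(self, handle):
        return handle["state"]


class Orchestrator:
    def __init__(self, engines):
        self._engines = engines          # engine name -> EngineInterface
        self._runs = {}                  # run id -> description of the run
        self._ids = itertools.count(1)

    def list_workflows(self):
        return {name: e.list_workflows() for name, e in self._engines.items()}

    def configure_run(self, engine, workflow, inputs):
        run_id = next(self._ids)
        self._runs[run_id] = {"engine": engine, "workflow": workflow,
                              "inputs": inputs, "handle": None}
        return run_id

    def start_run(self, run_id):
        run = self._runs[run_id]
        engine = self._engines[run["engine"]]
        run["handle"] = engine.start(run["workflow"], run["inputs"])

    def status(self, run_id):
        run = self._runs[run_id]
        if run["handle"] is None:
            return "configured"
        return self._engines[run["engine"]].status(run["handle"])

    def delete_run(self, run_id):
        del self._runs[run_id]
```

Adding another engine then means writing one more EngineInterface subclass; the orchestrator and its callers stay unchanged.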
Additional Taverna Functionality
   Integration with Cloud infrastructure
       AWS first


   Read/write files securely to S3
   Start and stop Cloud instances if required
       Tool and Web Service scaling
       Self-scaling


   Released as part of Taverna 3
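
As a rough illustration of "start and stop Cloud instances if required", the decision can be reduced to comparing the current pool size with the size the queue needs. This is an assumed policy, not the one built into Taverna; the function names, the runs_per_instance parameter, and the default limits are hypothetical, and no cloud API is called.

```python
import math


def desired_instances(queued_runs, runs_per_instance,
                      min_instances=0, max_instances=10):
    """Instances needed to serve the queue, clamped to the allowed range."""
    if runs_per_instance <= 0:
        raise ValueError("runs_per_instance must be positive")
    needed = math.ceil(queued_runs / runs_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(current, queued_runs, runs_per_instance, **limits):
    """Return ('start', n), ('stop', n) or ('hold', 0) for the current pool."""
    target = desired_instances(queued_runs, runs_per_instance, **limits)
    if target > current:
        return ("start", target - current)
    if target < current:
        return ("stop", current - target)
    return ("hold", 0)
```

A real deployment would translate the returned action into calls to the cloud provider and would likely add a cool-down period to avoid starting and stopping instances in quick succession.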
The user’s view
   Curated set of workflows
       Designed, built and tested by domain experts
       Quality assurance tested (if appropriate)
   Workflows are presented as applications
       The workflows themselves are hidden
       Configured and run via a web interface
   All user data stored securely in the Cloud
       User separation


   Workflows as a Service
Web interface: Overview
   Upload input data
   Configure workflow runs with
       Input parameters
       Uploaded data
       Reused output data
   Start workflow runs
   Monitor workflow runs
   View results preview
   Download complete results
Web interface: Getting started
Web interface: Creating a Run
Web interface: Checking run progress
A Typical Workflow
   Parse files from SNP calling
    machines
   Annotate SNPs
   Predict effects (BioMart, VEP,
    PolyPhen)
Workflow as a Service

   The workflow IS the service
    Run restricted sets of Taverna workflows in the cloud
    Connects to other cloud based resources – storage, tools
      etc
    Users can tweak parameters, but not design their own workflows
    Web portal access for scientists
    Data passed by reference instead of by file
    Pay as you go – cheap at the point of use
    Elastic and available now
Acknowledgements/Partners
   University of Manchester
   Eagle Genomics
   Technology Strategy Board
       100932 - Cloud Analytics for Life Sciences
   National Health Service
   Amazon Web Services

Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

Wolstencroft K - Workflows on the Cloud: scaling for national service

  • 1. Workflows on the Cloud: Scaling for National Service. Katy Wolstencroft, Robert Haines, Helen Hulme, Mike Cornell, Shoaib Sufi, Andy Brass, Carole Goble (University of Manchester, UK); Madhu Donepudi, Nick James (Eagle Genomics Ltd, UK).
  • 2. Motivation: Workflows for Diagnostics. NHS genetic testing, e.g. colon disease. Annotation of SNPs in patient data, ready for interpretation by clinician. Diagnostic Testing Today: Purify DNA. PCR exons of relevant genes (MLH1, MSH2, MSH6). Sequence, identify variants, classify (pathogenic, not pathogenic, unknown significance etc.). Write report to clinician. Diagnostic Testing Tomorrow (or later today) uses whole genome sequencing. [Diagram: Next Gen Seq data → Variation data → ANNOTATE, FILTER, DISPLAY] New problem: How do we classify all the variants that we discover?
  • 3. Taverna Workflows: Sophisticated analysis pipelines; A set of services to analyse or manage data (either local or remote); Workflows run through the workbench or via a server; Automation of data flow through services; Control of service invocation; Iteration over data sets; Provenance collection; Extensible and open source.
  • 4. Taverna (http://www.taverna.org.uk/): Freely available, open source; Current Version 2.4; 80,000+ downloads across versions; Part of the myGrid Toolkit; Windows/Mac OS X/Linux/unix. Reference: Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W729-32.
  • 5. SNP annotation. Annotation task: Location, Gene, Transcript; Present in public databases, dbSNP etc; Frequency in e.g. 1000 genome data; Conservation data (cross species). Workflows are good for collecting and integrating data from a variety of sources into one place.
  • 6. Variant Classification. [Diagram of SNP classes; labels: Synonymous; Missense: Non-synonymous; Nonsense: base insertion, causing a frameshift; Premature Stop codon (Nonsense); Affects on function or splicing; Affects on splicing]
  • 7. SNP Filtering / Triage. Which SNPs are the most important? Reduction of 80K data points to those with potential clinical significance. Criteria: Reduce to (disease)-specific gene list; Sense < Missense < Stop codon etc; Based on prediction tool scores; Frequency in population, based on 1000 genome data etc (high frequency implies non deleterious); Conservation across species (implies that change is deleterious).
  • 8. Workflow: Taverna’s “Tool Service” feature, used to wrap Perl scripts and other command line applications; Uses VEP (Ensembl); Passes references to files.
  • 9. Workflow Provenance. Record inferences in clinical decisions: What were the parameters used to build the dataset? What versions of databases, genome assembly, machine? Where does each piece of evidence for/against pathogenicity originate from?
  • 10. Infrastructure Requirements: Execute analysis workflows; Accessible to clinicians and genetic testers; Cope with expanding demands on compute; Provide a secure environment; Collect provenance.
  • 11. Architecture overview. All user interaction via web interface; User data stored in the Cloud; Data for all tools and Web Services stored in the Cloud. [Diagram components: Input SNPs; Results; Web interface; Storage (S3); Ensembl (mySQL); Cache (S3); Workflow engine orchestrator; e-Hive; Taverna Servers; Application specific tools and Web Services] Unified access to different workflow engines with our common REST API. Tools and Web Services for each workflow are installed together for easy replication.
  • 12. Workflow engine orchestration. Orchestrator is workflow executor agnostic. Uses common API to: List workflows; Configure runs; Start runs; Manage current runs (Status, Progress); Delete runs. [Diagram: Workflow engine orchestrator → Common REST API → e-Hive Interface and Taverna Interface, with Cache → Engine specific APIs → e-Hive and Taverna]
  • 13. Additional Taverna Functionality: Integration with Cloud infrastructure (AWS first; Read/write files securely to S3; Start and stop Cloud instances if required); Tool and Web Service scaling (Self-scaling); Released as part of Taverna 3.
  • 14. The user’s view: Curated set of workflows (Designed, built and tested by domain experts; Quality assurance tested, if appropriate); Workflows are presented as applications (The workflows themselves are hidden; Configured and run via a web interface); All user data stored securely in the Cloud (User separation); Workflows as a Service.
  • 15. Web interface: Overview. Upload input data; Configure workflow runs with input parameters, uploaded data and reused output data; Start workflow runs; Monitor workflow runs; View results preview; Download complete results.
  • 18. Web interface: Checking run progress
  • 19. A Typical Workflow: Parse files from SNP calling machines; Annotate SNPs; Predict effects (BioMart, VEP, PolyPhen).
  • 20. Workflow as a Service. The workflow IS the service: Run restricted sets of Taverna workflows in the cloud; Connects to other cloud based resources (storage, tools etc); Users can tweak parameters, but not design their own; Web portal access for scientists; Data passed by reference instead of file; Pay as you go, cheap at the point of use; Elastic and available now.
  • 21. Acknowledgements/Partners: University of Manchester; Eagle Genomics; Technology Strategy Board (100932 - Cloud Analytics for Life Sciences); National Health Service; Amazon Web Services.
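
The filtering criteria on slide 7 (gene list, consequence ranking, population frequency, conservation) can be illustrated with a minimal Python sketch. This is not the project's workflow, which wraps VEP and other tools inside Taverna; the `Variant` fields, the `SEVERITY` ranking values and every threshold below are hypothetical and chosen only to show the shape of a triage step.

```python
from dataclasses import dataclass

# Hypothetical ranking following the slide's ordering
# "Sense < Missense < Stop codon"; the numeric values are illustrative.
SEVERITY = {"synonymous": 0, "missense": 1, "stop_gained": 2}


@dataclass
class Variant:
    gene: str
    consequence: str        # expected to be a key of SEVERITY
    population_freq: float  # allele frequency, e.g. from 1000 Genomes data
    conservation: float     # 0..1, higher means more conserved across species


def severity(variant):
    """Rank a variant's consequence; unknown consequences rank lowest."""
    return SEVERITY.get(variant.consequence, 0)


def triage(variants, gene_list, max_freq=0.01, min_severity=1,
           min_conservation=0.5):
    """Keep variants in the gene list that are rare, severe and conserved.

    All thresholds are assumptions made for this sketch.
    """
    kept = [
        v for v in variants
        if v.gene in gene_list
        and severity(v) >= min_severity
        and v.population_freq <= max_freq
        and v.conservation >= min_conservation
    ]
    # Most severe consequence first, then rarest first.
    return sorted(kept, key=lambda v: (-severity(v), v.population_freq))
```

Calling `triage(variants, {"MLH1", "MSH2", "MSH6"})` would reduce a list of annotated variants to those in the gene panel named on slide 2 that pass all four filters. A real pipeline would also take prediction tool scores into account, which this sketch leaves out.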

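Slide 12 describes an orchestrator that is agnostic to the workflow executor and reaches each engine through a common API (list workflows, configure, start, check status, delete). The Python sketch below shows one way such a layer could be structured. All class and method names are hypothetical, and `InMemoryEngine` is a stand-in so the example runs without Taverna Server or e-Hive; the project's actual REST API is not shown in the slides.

```python
from abc import ABC, abstractmethod
import itertools


class EngineInterface(ABC):
    """Hypothetical adapter hiding one engine-specific API."""

    @abstractmethod
    def list_workflows(self): ...

    @abstractmethod
    def configure_run(self, workflow, inputs): ...

    @abstractmethod
    def start_run(self, run_id): ...

    @abstractmethod
    def status(self, run_id): ...

    @abstractmethod
    def delete_run(self, run_id): ...


class InMemoryEngine(EngineInterface):
    """Stand-in engine that only tracks run state in a dictionary."""

    def __init__(self, workflows):
        self._workflows = list(workflows)
        self._runs = {}
        self._ids = itertools.count(1)

    def list_workflows(self):
        return list(self._workflows)

    def configure_run(self, workflow, inputs):
        if workflow not in self._workflows:
            raise KeyError(workflow)
        run_id = next(self._ids)
        self._runs[run_id] = {"workflow": workflow,
                              "inputs": dict(inputs),
                              "status": "configured"}
        return run_id

    def start_run(self, run_id):
        self._runs[run_id]["status"] = "running"

    def status(self, run_id):
        return self._runs[run_id]["status"]

    def delete_run(self, run_id):
        del self._runs[run_id]


class Orchestrator:
    """Routes each common operation to the adapter registered for an engine."""

    def __init__(self, engines):
        self._engines = dict(engines)

    def list_workflows(self):
        return {name: e.list_workflows() for name, e in self._engines.items()}

    def run(self, engine, workflow, inputs):
        adapter = self._engines[engine]
        run_id = adapter.configure_run(workflow, inputs)
        adapter.start_run(run_id)
        return run_id

    def status(self, engine, run_id):
        return self._engines[engine].status(run_id)

    def delete(self, engine, run_id):
        self._engines[engine].delete_run(run_id)
```

Because callers only see `Orchestrator`, adding another engine means writing one more `EngineInterface` adapter. Inputs would be passed as references (for example a storage URL) rather than file contents, in line with slide 20.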
Editor's Notes

  1. Diagnostics is increasingly using next-generation sequencing methods, which are replacing sequencing of specific exons. The key difference is that current methods usually look at a pre-decided set of genes and check for the presence of a set of “well known” variants, whereas the new method results in many thousands of SNPs, which must all be triaged. Example of where next-gen gives benefit: hereditary blindness, with >100 potential genes to look at. It is less costly to use next-gen sequencing than to sequence individual genes.
  2. Carole’s concept of “Workflows for Ensemble work”
  3. What were the parameters used to build the dataset? What versions of databases, genome assembly, machine? Where does each piece of evidence for/against pathogenicity originate from?
  4. OpenAM (http://www.forgerock.com/openam.html). Not sure the “AM” actually stands for anything specific now. It was called OpenSSO when Sun first created it (SSO means Single Sign-On). It is used for centralized authentication, authorization, entitlements and federation services, which for our purposes basically means user sign-on.
  5. AWS == Amazon Web Services (i.e. the Amazon Cloud). S3 is Amazon’s Simple Storage Service. Taverna 3 should come at the end of 2012.
  6. Variant Effect Predictor (VEP); BioMart.