SlideShare une entreprise Scribd logo
1  sur  37
Cancer Genomics
Data Pipelines
Lynn & Samantha Langit
CSIRO Bioinformatics / Australia
June 2017 - Oslo
3 Billion data points per patient DNA sample
Up to 25% of the population could be sequenced by 2025
Two Perspectives
Bioinformatics
Research
• Insight
• Reproducibility
Cloud
Architecture
• Speed
• Low Cost
• Simplicity
Cloud Data Pipeline Pattern
Problem
• Define business
problem
Data
• Quality
• Quantity
Candidate
Technologies
• Ingest
• ETL
• Biz Analytics
• ML
• Visualization
Build MVPs
• Iterate
• Learn
Assemble
Pipeline
• Validate each
section
• Test at scale
Genomic Sequencing Results
CRISPR-Cas9 for molecular engineering technology
enables the accurate editing of genomes for researchers.
It…
 Pattern-matching unique sequences of DNA
 Huge demand for large-scale computation
 Time-critical dimension to compute
 NIH-approved for human health
 Could revolutionize cancer treatments
Serverless Lambda
Architecture Pattern
Lambda
function
1
Lambda
function
2
Lambda
function
3
buckets with
objects DynamoDB
API Gateway Users
CSIRO: Commonwealth Scientific & Industrial Research Organization
GT-Scan2
Demo
GT-Scan2
Scale Genomic Analysis
GWAS = genome-wide sequencing data association
studies
 Analysis on large cohort data or imputed SNP array data
 Clustering on genomic profiles to stratify large-cohort
genomic data
 Viewing datasets with millions of features
Cloud Data Pipeline Pattern
Problem
• Define business
problem
Data
• Quality
• Quantity
Candidate
Technologies
• Ingest
• ETL
• Biz Analytics
• ML
• Visualization
Build MVPs
• Iterate
• Learn
Assemble
Pipeline
• Validate each
section
• Test at scale
Genomics (ML) Pipeline Pattern
What is CSIRO’s solution?
For Scale at
reasonable cost Use Apache Hadoop
For Scale at
speed Use Apache Spark for Hadoop
For Usability
in
bioinformatics
Create a domain-specific API (OSS library)
For global use
Leverage Cloud Pipeline Patterns
GWAS Analysis with Variant-Spark
On premise Hadoop Cluster
with Apache Spark
Genomics Analysts
corporate data center
What is Apache Spark?
What is variant-spark?
Demo
80% faster than ADAM
90% faster than R
90% faster than Python
VariantSpark
Uses Apache Spark to massively parallelize the generation of
random forests to identify disease genes efficiently
 Analyzes 3,000 samples with 80 million features in < 30 minutes
 Enables real-time diagnosis by finding similar patients
 Contributes to motor neuron disease (ALS) research in Australia
Data
Prep
Statistics
Probabilistic
Algorithms
Data Viz
Machine
Learning…
Spark ML Classification Algorithms
Wide Random Forest Ensemble
of Decision Trees
Logistic Regression
variant-spark other libraries
OSS Library variant-spark for all
 usable? performant?
 extendable? (clean code)
 using the best language
(Scala)?
 using the ‘best version’ of
Spark?
 using a version of wide
random forests that is
understandable?
Is it…
How best to Deploy Cloud Hadoop?
• IaaS
 EC2 instances with Apache Hadoop, Apache Spark, more…
• PaaS
 Elastic Map Reduce (EMR) Hadoop cluster
• SaaS
 Vendor-managed, i.e. DataBricks w/Jupyter Notebooks
What is Databricks?
DEMO: Jupyter Notebooks
Variant-Spark and
Databricks
Demo
Solving
Important
Questions…
Cancer Genomics?
DEMO: Who is a Hipster?
 AWS EC2 Spot Instances
GWAS Analysis with Variant-Spark
EC2 Hadoop Cluster with Apache Spark
Genomics Analysts
Availability Zone
1000 Genomes
GWAS input
Spot EC2 Hadoop
worker instances
EC2 Hadoop
instances
Cloud Data Pipeline Pattern
Problem Data
Candidate
Technologies
Build MVPs
Assemble
Pipeline
Analyze GWAS -> S3/Hadoop Ingest
ETL
Analyze
Viz
S3 -> Databricks DBFS
Apache Spark
Variant-Spark ML
Notebook SQL, R or Python
SaaS
Cloud Data Pipeline Pattern
Problem Data
Candidate
Technologies
Build MVPs
Assemble
Pipeline
1. Scan vcf -> S3/DynamoDB Ingest
ETL
Analyze
Viz
S3
Lambda
Lambda
Lambda/API Gateway
Serverless
2. Analyze GWAS -> S3/Hadoop Ingest
ETL
Analyze
Viz
S3 -> Databricks DBFS
Apache Spark
Variant-Spark ML
Notebook SQL, R or Python
SaaS
Modern Big Data Pipelines
• Problem #1 - Scan
• Solution: Serverless Cloud Pipeline
• Problem # 2 - Analyze
• Solution: SaaS Cloud ML Pipeline
Cancer Genomics
Data Pipelines
Lynn & Samantha Langit
CSIRO Bioinformatics & variant-spark
June 2017 - Oslo

Contenu connexe

Tendances

Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudGlobus
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceIan Foster
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsLarry Smarr
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptxIan Foster
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the ContinuumIan Foster
 
Living Outside the Comfort Zone - Daron green florianopolis 5-7-2014
Living Outside the Comfort Zone - Daron green   florianopolis 5-7-2014Living Outside the Comfort Zone - Daron green   florianopolis 5-7-2014
Living Outside the Comfort Zone - Daron green florianopolis 5-7-2014Microsoft Azure for Research
 
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...Larry Smarr
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardPacificResearchPlatform
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystemSlideCentral
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Gridnoho
 
Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated GenomicsIdan Tohami
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Anubhav Jain
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneIan Foster
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsAnubhav Jain
 
Research workflow - 4 June 2018
Research workflow - 4 June 2018Research workflow - 4 June 2018
Research workflow - 4 June 2018Zachary Labe
 
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...ATMOSPHERE .
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-finalmarpierc
 

Tendances (20)

Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
 
Living Outside the Comfort Zone - Daron green florianopolis 5-7-2014
Living Outside the Comfort Zone - Daron green   florianopolis 5-7-2014Living Outside the Comfort Zone - Daron green   florianopolis 5-7-2014
Living Outside the Comfort Zone - Daron green florianopolis 5-7-2014
 
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...
High Performance Cyberinfrastructure Enabling Data-Driven Science Supporting ...
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie Bard
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Foss4G 2009 Scenz Grid
Foss4G 2009 Scenz GridFoss4G 2009 Scenz Grid
Foss4G 2009 Scenz Grid
 
Cloud Accelerated Genomics
Cloud Accelerated GenomicsCloud Accelerated Genomics
Cloud Accelerated Genomics
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Research workflow - 4 June 2018
Research workflow - 4 June 2018Research workflow - 4 June 2018
Research workflow - 4 June 2018
 
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
Butler - a framework for a large-scale scientific analysis on the cloud - EOS...
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-final
 

Similaire à Bioinformatics Data Pipelines built by CSIRO on AWS

Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudAmazon Web Services
 
Utility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceUtility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceChef Software, Inc.
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...Denis C. Bauer
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 
CS Guest Lecture 2015 10-05 advanced databases
CS Guest Lecture 2015 10-05 advanced databasesCS Guest Lecture 2015 10-05 advanced databases
CS Guest Lecture 2015 10-05 advanced databasesGabe Rudy
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionValery Tkachenko
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackTuri, Inc.
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
 
AWS Customer Presentation- University of Maryland
AWS Customer Presentation- University of MarylandAWS Customer Presentation- University of Maryland
AWS Customer Presentation- University of MarylandAmazon Web Services
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesIan Foster
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009Ian Foster
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Solutions
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Ola Spjuth
 
VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWSLynn Langit
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Databricks
 

Similaire à Bioinformatics Data Pipelines built by CSIRO on AWS (20)

Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
Utility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceUtility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right Science
 
VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...VariantSpark: applying Spark-based machine learning methods to genomic inform...
VariantSpark: applying Spark-based machine learning methods to genomic inform...
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
CS Guest Lecture 2015 10-05 advanced databases
CS Guest Lecture 2015 10-05 advanced databasesCS Guest Lecture 2015 10-05 advanced databases
CS Guest Lecture 2015 10-05 advanced databases
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collection
 
What’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics StackWhat’s New in the Berkeley Data Analytics Stack
What’s New in the Berkeley Data Analytics Stack
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
 
AWS Customer Presentation- University of Maryland
AWS Customer Presentation- University of MarylandAWS Customer Presentation- University of Maryland
AWS Customer Presentation- University of Maryland
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Opportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architecturesOpportunities for X-Ray science in future computing architectures
Opportunities for X-Ray science in future computing architectures
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 
VariantSpark on AWS
VariantSpark on AWSVariantSpark on AWS
VariantSpark on AWS
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 

Plus de Lynn Langit

Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless ArchitecturesLynn Langit
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids ProgrammingLynn Langit
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on DockerLynn Langit
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina LanguageLynn Langit
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsLynn Langit
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesLynn Langit
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids ProgrammingLynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless RealityLynn Langit
 
Serverless Reality
Serverless RealityServerless Reality
Serverless RealityLynn Langit
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond RelationalLynn Langit
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for BioinformaticsLynn Langit
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsLynn Langit
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformLynn Langit
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformLynn Langit
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL ServerLynn Langit
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinLynn Langit
 
What is 'Teaching Kids Programming'
What is 'Teaching Kids Programming'What is 'Teaching Kids Programming'
What is 'Teaching Kids Programming'Lynn Langit
 
Teaching Kids Programming for Developers
Teaching Kids Programming for DevelopersTeaching Kids Programming for Developers
Teaching Kids Programming for DevelopersLynn Langit
 

Plus de Lynn Langit (20)

Serverless Architectures
Serverless ArchitecturesServerless Architectures
Serverless Architectures
 
10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming10+ Years of Teaching Kids Programming
10+ Years of Teaching Kids Programming
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on Docker
 
Testing in Ballerina Language
Testing in Ballerina LanguageTesting in Ballerina Language
Testing in Ballerina Language
 
Teaching Kids to create Alexa Skills
Teaching Kids to create Alexa SkillsTeaching Kids to create Alexa Skills
Teaching Kids to create Alexa Skills
 
Practical cloud
Practical cloudPractical cloud
Practical cloud
 
Understanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examplesUnderstanding Jupyter notebooks using bioinformatics examples
Understanding Jupyter notebooks using bioinformatics examples
 
Teaching Kids Programming
Teaching Kids ProgrammingTeaching Kids Programming
Teaching Kids Programming
 
Practical Cloud
Practical CloudPractical Cloud
Practical Cloud
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
 
Serverless Reality
Serverless RealityServerless Reality
Serverless Reality
 
Beyond Relational
Beyond RelationalBeyond Relational
Beyond Relational
 
New AWS Services for Bioinformatics
New AWS Services for BioinformaticsNew AWS Services for Bioinformatics
New AWS Services for Bioinformatics
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
 
Scaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud PlatformScaling Galaxy on Google Cloud Platform
Scaling Galaxy on Google Cloud Platform
 
SQL Server on Google Cloud Platform
SQL Server on Google Cloud PlatformSQL Server on Google Cloud Platform
SQL Server on Google Cloud Platform
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
 
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and YellowfinBuilding a data warehouse with AWS Redshift, Matillion and Yellowfin
Building a data warehouse with AWS Redshift, Matillion and Yellowfin
 
What is 'Teaching Kids Programming'
What is 'Teaching Kids Programming'What is 'Teaching Kids Programming'
What is 'Teaching Kids Programming'
 
Teaching Kids Programming for Developers
Teaching Kids Programming for DevelopersTeaching Kids Programming for Developers
Teaching Kids Programming for Developers
 

Dernier

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologycaarthichand2003
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 

Dernier (20)

THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 

Bioinformatics Data Pipelines built by CSIRO on AWS

Notes de l'éditeur

  1. http://www.nature.com/news/first-crispr-clinical-trial-gets-green-light-from-us-panel-1.20137
  2. http://bioinformatics.csiro.au/ and https://www.csiro.au/en/Locations/NSW/North-Ryde
  3. https://www.gt-scan.net/ --AND- AMA with Dr, Bauer -- https://www.reddit.com/r/science/comments/5fiicm/science_ama_series_im_denis_bauer_a_team_leader/
  4. https://aws.amazon.com/blogs/aws/genome-engineering-applications-early-adopters-of-the-cloud/
  5. https://github.com/csirobigdata/variant-spark
  6. https://en.wikipedia.org/wiki/Random_forest --and-- https://spark.apache.org/docs/1.6.2/ml-classification-regression.html
  7. https://databricks.com/
  8. https://aws.amazon.com/blogs/aws/genome-engineering-applications-early-adopters-of-the-cloud/
  9. https://github.com/csirobigdata/variant-spark