Cloud-scale genomics: examples and lessons
Department of Biostatistics
Cloud debate on 1 slide
Why?
Why not?
1.6 Gbp/day [1], 5 Gbp/day [1], 25 Gbp/day [2]
1. http://www.politigenomics.com/next-generation-sequencing-informatics
2. http://www.politigenomics.com/2010/01/hiseq-2000.html
Conclusion: let's try it but hedge our bets
Crossbow
[Figure: Crossbow pipeline. Reads are aligned against the reference, the alignments are aggregated, and a call is made at each site. Example call: HET A, G; p-value: 0.0023.]
  Align: parallel by read
  Aggregate: handled by Hadoop
  Statistics: parallel by genome bin
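The three stages on this slide can be sketched as a toy map, shuffle, reduce chain. This is only an illustration under stated assumptions: the `align` function is a brute-force stand-in for the real aligner, `call_bin` uses a naive depth threshold rather than a statistical caller (so it produces no p-value), and `BIN_SIZE`, `min_depth` and all function names are hypothetical.

```python
from collections import Counter, defaultdict

BIN_SIZE = 8  # hypothetical genome-bin width, kept tiny for the toy example


def align(read, reference, max_mismatches=1):
    """Stand-in aligner: best offset with at most max_mismatches, else None."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        mism = sum(1 for a, b in zip(read, reference[offset:]) if a != b)
        if mism <= max_mismatches and (best is None or mism < best[1]):
            best = (offset, mism)
    return None if best is None else best[0]


def map_read(read, reference):
    """Align step (parallel by read): emit (bin, position, base) per aligned base."""
    offset = align(read, reference)
    if offset is None:
        return
    for i, base in enumerate(read):
        pos = offset + i
        yield (pos // BIN_SIZE, pos, base)


def shuffle(records):
    """Aggregate step: group records by genome bin (the role Hadoop plays)."""
    bins = defaultdict(list)
    for bin_id, pos, base in records:
        bins[bin_id].append((pos, base))
    return bins


def call_bin(pileup, reference, min_depth=2):
    """Statistics step (parallel by bin): naive call, not a real SNP caller."""
    by_pos = defaultdict(Counter)
    for pos, base in pileup:
        by_pos[pos][base] += 1
    calls = {}
    for pos, counts in sorted(by_pos.items()):
        alleles = sorted(b for b, n in counts.items() if n >= min_depth)
        if alleles and alleles != [reference[pos]]:
            calls[pos] = alleles
    return calls


def crossbow_sketch(reads, reference):
    records = [rec for read in reads for rec in map_read(read, reference)]
    calls = {}
    for _bin_id, pileup in sorted(shuffle(records).items()):
        calls.update(call_bin(pileup, reference))
    return calls


reference = "GATTACAGGCTCATGA"
reads = ["TTACAGGC", "TTATAGGC", "TACAGGCT", "TATAGGCT"]
print(crossbow_sketch(reads, reference))
```

Two reads support the reference base and two support an alternative at position 5, so the sketch reports both alleles there, which loosely mirrors the heterozygous call shown on the slide.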
Myrna
[Figure: Myrna pipeline. Reads from Sample A and Sample B are aligned, overlapped with Gene 1, normalized, and tested. Example result: Gene 1 differentially expressed?: YES; p-value: 0.0012.]
  Align: parallel by read
  Aggregate: handled by Hadoop
  Overlap: parallel by genome bin
  Aggregate: handled by Hadoop
  Normalize: parallel by sample
  Aggregate: handled by Hadoop
  Statistics: parallel by gene
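A minimal sketch of the Overlap and Normalize stages follows, assuming alignments arrive as `(sample, position)` pairs. The gene intervals, the function names, and the choice to normalize by each sample's total gene-overlapping count are all assumptions made for illustration; the slide does not say which normalization or test statistic Myrna uses, so the Statistics stage is left out.

```python
from collections import Counter

# Hypothetical gene intervals as half-open [start, end) ranges.
GENES = {"gene1": (100, 200), "gene2": (300, 400)}


def overlap(alignments, genes=GENES):
    """Overlap step: count alignments falling inside each gene, per sample."""
    counts = Counter()
    for sample, pos in alignments:
        for gene, (start, end) in genes.items():
            if start <= pos < end:
                counts[(sample, gene)] += 1
    return counts


def normalize(counts):
    """Normalize step (per sample): divide by that sample's total count.

    Total-count normalization is an assumption, not Myrna's documented method.
    """
    totals = Counter()
    for (sample, _gene), n in counts.items():
        totals[sample] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}


alignments = [
    ("A", 150), ("A", 160), ("A", 310), ("A", 350),
    ("B", 150), ("B", 350), ("B", 360), ("B", 370), ("B", 500),
]
print(normalize(overlap(alignments)))
```

Each stage keys its output by the unit the next stage is parallel over (sample, then gene), which is why an aggregation step sits between the stages on the slide.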
Myrna
Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al. study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-CPU EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions.

Myrna runtime and cost for 1.1 billion reads from the Pickrell et al. study

EC2 nodes                    1 master, 10 workers    1 master, 20 workers    1 master, 40 workers
Worker CPU cores             80                      160                     320
Wall clock time              4h:20m                  2h:32m                  1h:38m
Cluster setup                4m                      4m                      3m
Align                        2h:56m                  1h:31m                  54m
Overlap                      52m                     31m                     16m
Normalize                    6m                      7m                      6m
Statistics                   9m                      6m                      6m
Summarize & Postprocess      13m                     14m                     13m
Approximate cost
(N. Virginia / Elsewhere)    $44.00 / $49.50         $50.40 / $56.70         $65.60 / $73.80

Data transfer adds about 1hr:15m, $11
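The cost row can be reproduced from the caption's prices with a few lines of arithmetic. The sketch below assumes that every node (workers plus the one master) is billed for each started hour of wall clock time; that rounding rule is an inference that happens to match all six figures in the table, not something the caption states. The function and constant names are hypothetical.

```python
import math

# Prices taken from the table caption (USD per node per hour).
NODE_PRICE = {"n_virginia": 0.68, "elsewhere": 0.78}
EMR_SURCHARGE = 0.12


def cluster_cost(workers, wall_clock_minutes, zone):
    """Approximate cost, assuming billing per node per started hour."""
    nodes = workers + 1  # one master in addition to the workers
    billed_hours = math.ceil(wall_clock_minutes / 60)
    return round(nodes * billed_hours * (NODE_PRICE[zone] + EMR_SURCHARGE), 2)


for workers, minutes in [(10, 4 * 60 + 20), (20, 2 * 60 + 32), (40, 1 * 60 + 38)]:
    print(workers,
          cluster_cost(workers, minutes, "n_virginia"),
          cluster_cost(workers, minutes, "elsewhere"))
```

Under this assumption the 40-worker run is billed for 2 hours on 41 nodes, so cutting wall clock time from 4h:20m to 1h:38m raises the bill from $44.00 to $65.60 in Northern Virginia.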
Myrna 71% 55%
Bet-hedging architecture
[Figure: the same three stages (Wrapper bowtie, Wrapper soapsnp, Postprocess) run under three different drivers.]
  Cloud mode: Cloud driver script; stages run on Hadoop
  Hadoop mode: Hadoop driver script; stages run on Hadoop
  Single-computer mode: Singleton driver script; stages run with Perl, fork, sort
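The point of this layout is that the stage wrappers stay the same while only the driver changes. A minimal sketch of the single-computer driver is shown below, written in Python rather than the Perl named on the slide. The names `local_map_reduce` and `run_pipeline` are hypothetical, and the cloud and Hadoop drivers are deliberately left unimplemented rather than guessed at.

```python
from itertools import groupby


def local_map_reduce(records, mapper, reducer):
    """Run one map/sort/reduce stage in-process.

    The sort plays the role that Hadoop's shuffle plays in the other modes.
    """
    mapped = [kv for rec in records for kv in mapper(rec)]
    mapped.sort(key=lambda kv: kv[0])
    out = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        out.extend(reducer(key, [value for _key, value in group]))
    return out


def run_pipeline(records, stages, mode="single"):
    """Feed the same (mapper, reducer) stages to whichever driver is selected."""
    if mode != "single":
        # Cloud and Hadoop modes would hand these same stages to Hadoop.
        raise NotImplementedError(f"mode {mode!r} is not part of this sketch")
    for mapper, reducer in stages:
        records = local_map_reduce(records, mapper, reducer)
    return records


# Toy stage: count bases across reads.
def base_mapper(read):
    for base in read:
        yield (base, 1)


def sum_reducer(key, values):
    yield (key, sum(values))


print(run_pipeline(["GAT", "TAG"], [(base_mapper, sum_reducer)]))
```

Because each stage only sees key/value records, swapping the driver does not require touching the stage code, which is what lets one code base cover all three modes.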
Acknowledgements
Crossbow Data transfer adds about 1hr:15m, $28
Crossbow 43% 58%





Langmead bosc2010 cloud-genomics

  • 1.
  • 2.
  • 3. Crossbow [Figure: short reads are aligned to a reference sequence (Align, parallel by read), aggregated (Aggregate, handled by Hadoop), and passed to a statistics step (Statistics, parallel by genome bin). Example call: HET A, G; p-value: 0.0023.]
  • 4. Myrna [Figure: reads from Sample A and Sample B are aligned to a reference containing Gene 1. Stages: Align (parallel by read), Overlap (parallel by genome bin), Normalize (parallel by sample), Statistics (parallel by gene), with an Aggregate step handled by Hadoop between stages. Example result: Gene 1 differentially expressed? YES; p-value: 0.0012.]
  • 5. Myrna
    Table 1. Timing and cost for a Myrna experiment with 1.1 billion 35 bp unpaired reads from the Pickrell et al. study as input. Costs are approximate and based on the pricing as of this writing, that is, $0.68 per extra-large high-CPU EC2 node per hour in the Northern Virginia zone and $0.78 in other zones, plus a $0.12 per-node-per-hour surcharge for Elastic MapReduce in all zones. Times can vary subject to, for example, congestion and Internet traffic conditions.
    Data transfer adds about 1hr:15m, $11.

    Myrna runtime and cost for 1.1 billion reads from the Pickrell et al. study

    EC2 nodes                                    1 master, 10 workers   1 master, 20 workers   1 master, 40 workers
    Worker CPU cores                             80                     160                    320
    Wall clock time                              4h:20m                 2h:32m                 1h:38m
    Cluster setup                                4m                     4m                     3m
    Align                                        2h:56m                 1h:31m                 54m
    Overlap                                      52m                    31m                    16m
    Normalize                                    6m                     7m                     6m
    Statistics                                   9m                     6m                     6m
    Summarize & Postprocess                      13m                    14m                    13m
    Approximate cost (N. Virginia / Elsewhere)   $44.00 / $49.50        $50.40 / $56.70        $65.60 / $73.80
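The approximate costs in the Myrna table can be reproduced from the per-hour prices quoted in its caption. The sketch below is only a worked check, not part of Myrna. It rests on two assumptions the slide does not state: the master node is billed at the same rate as a worker, and partial hours are rounded up to whole hours. The function name `approx_cost` is hypothetical.

```python
import math

# Prices in cents per node-hour, taken from the Table 1 caption.
EC2_RATE_CENTS = {"N. Virginia": 68, "Elsewhere": 78}
EMR_SURCHARGE_CENTS = 12


def approx_cost(workers, wall_clock_minutes, zone):
    """Approximate cluster cost in dollars.

    Assumptions (not stated in the source): the master node is billed
    like a worker, and partial hours are rounded up to whole hours.
    """
    nodes = workers + 1  # one master plus the workers
    billed_hours = math.ceil(wall_clock_minutes / 60)
    cents = nodes * (EC2_RATE_CENTS[zone] + EMR_SURCHARGE_CENTS) * billed_hours
    return cents / 100


# Wall clock times from the table, in minutes, keyed by worker count.
runs = {10: 4 * 60 + 20, 20: 2 * 60 + 32, 40: 1 * 60 + 38}
for workers, minutes in runs.items():
    nva = approx_cost(workers, minutes, "N. Virginia")
    other = approx_cost(workers, minutes, "Elsewhere")
    print(f"{workers} workers: ${nva:.2f} / ${other:.2f}")
```

Under these assumptions the three configurations come out at the table's figures. Because billing is per whole hour, the 40-worker run costs the most even though it finishes fastest.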
  • 7. Bet-hedging architecture [Figure: the same stage wrappers (Wrapper bowtie, Wrapper soapsnp, Postprocess) run under three drivers. Cloud mode: cloud driver script, on Hadoop. Hadoop mode: Hadoop driver script, on Hadoop. Single-computer mode: singleton driver script, on Perl, fork, sort.]
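The bet-hedging slide pairs the single-computer mode with "Perl, fork, sort" where the other two modes use Hadoop. One way to read this is that a local driver can run the same stage wrappers by doing the shuffle itself: map every record, sort by key, then reduce each key group. The sketch below illustrates that idea in Python with toy stand-ins. It is not Crossbow's code (which the slide says is Perl), and `toy_align`, `toy_call`, `run_single_computer`, and `BIN_SIZE` are all hypothetical names.

```python
from itertools import groupby
from operator import itemgetter

BIN_SIZE = 100  # hypothetical genome bin width, for illustration only


def toy_align(position):
    """Stand-in for an alignment wrapper: emit (genome bin, position)."""
    yield (position // BIN_SIZE, position)


def toy_call(genome_bin, positions):
    """Stand-in for a per-bin statistics wrapper: count alignments per bin."""
    yield (genome_bin, len(positions))


def run_single_computer(records, mapper, reducer):
    """Local driver: map, sort by key, reduce, with no Hadoop involved."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # plays the role of the shuffle/sort step
    results = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [value for _, value in group]
        results.extend(reducer(key, values))
    return results


result = run_single_computer([5, 150, 42], toy_align, toy_call)
print(result)
```

Because the mapper and reducer only see records and key groups, the same two functions could in principle be handed to a Hadoop-based driver unchanged, which is the point of keeping the wrappers separate from the driver.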
  • 8.
  • 9. Crossbow Data transfer adds about 1hr:15m, $28