 NGS techniques and data relevant for metagenomics analyses Lex Nederbragt Norwegian Sequencing Center & Centre for Ecological and Evolutionary Synthesis University of Oslo
The sequence revolution Stratton et al. Nature 458, 719-724
Norwegian Sequencing Center www.sequencing.uio.no
This talk Technologies 454 Illumina Topics How does it work What do you get Quality check Filtering
How does it work: 454
Library preparation Shotgun library Amplicon library Starting from DNA sample Starting from PCR product
Library preparation Shotgun library Amplicon library Fragmentation A Rv Fw B Addition of adaptors Fw A B Rv
Multiplexing Amplicon library A A Fw Fw Rv Tag B Fw A B Rv Shotgun: tag in the adaptors
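Once tagged reads come off the machine, they have to be sorted back into samples by their tag. A minimal sketch of that step, assuming the tag sits at the very start of each read and must match exactly; the tag sequences, sample names and the `demultiplex` function are hypothetical and not from the talk:

```python
def demultiplex(reads, tags):
    """Sort reads into per-sample bins by their leading tag (exact match only)."""
    bins = {sample: [] for sample in tags.values()}
    bins["unassigned"] = []
    for seq in reads:
        for tag, sample in tags.items():
            if seq.startswith(tag):
                # Strip the tag so only the insert sequence is kept.
                bins[sample].append(seq[len(tag):])
                break
        else:
            # No tag matched this read.
            bins["unassigned"].append(seq)
    return bins


# Hypothetical tags and reads, for illustration only.
example_tags = {"ACGT": "sample1", "TGCA": "sample2"}
example_reads = ["ACGTGGGG", "TGCATTTT", "CCCCCCCC"]
example_bins = demultiplex(example_reads, example_tags)
```

Real demultiplexing tools usually also tolerate a sequencing error or two in the tag, which this sketch does not.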
Amplification
Plate loading
Multiplexing 2 lanes 4 lanes 8 lanes 16 lanes Flickr.com
Sequencing PPi: pyrophosphate
Basecalling
Read length 500 bases
Coming soon
Single end Default single end sequencing Special protocols for mate-pairs
How does it work: Illumina
Library preparation Multiplexing:  same as for 454
Bridge amplification Metzker 2010 Nat Rev Genet. 11(1):31-46
Multiplexing Flowcell: 8 lanes
Sequencing Reversible terminators Metzker 2010 Nat Rev Genet. 11(1):31-46
Basecalling Metzker 2010 Nat Rev Genet. 11(1):31-46
Read length 454 GS FLX Titanium Illumina HiSeq 500 bases
Paired-end Default paired-end sequencing single end also possible 150–600 bases
What do you get?
454 Throughput GS FLX Titanium per-run output: Up to 1.5 million single-end reads Up to 600 megabases (Mb, million bases) Less for amplicons
Illumina throughput (HiSeq 2000) Variable length 50, 100 (soon 150) single or paired-end per-run output: Up to 1 billion (10^9) single-end reads Up to 2 billion paired-end reads Up to 200 gigabases (Gb, billion bases) Soon: 3 times more reads and bases
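The per-run figures on the two throughput slides fit simple "reads times read length" arithmetic. A small sketch of that check, assuming 100-base reads for the HiSeq figure and counting each read of a pair separately; the function and variable names are illustrative only:

```python
def total_bases(reads, read_length):
    """Total sequenced bases for a run: number of reads times read length."""
    return reads * read_length


# HiSeq 2000: up to 2 billion paired-end reads, assumed 100 bases each.
hiseq_bases = total_bases(2_000_000_000, 100)
print(hiseq_bases / 1e9, "gigabases")

# 454 GS FLX Titanium: 600 megabases over 1.5 million reads implies
# the mean read length below (shorter than the 500-base maximum).
mean_454_read_length = 600_000_000 / 1_500_000
print(mean_454_read_length, "bases per read on average")
```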
What do you get? Errors! http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html
Error profiles 454 GS FLX Titanium Illumina Genome Analyzer II
454 specific 3 G's? 4 G's?
Illumina specific Substitutions e.g. AG Underrepresentation of AT and GC rich regions
Solving errors Oversampling
Oversampling: 454
Undercall in two reads, overcall in four reads
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
Consensus
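The idea behind oversampling can be sketched as a per-column majority vote over the aligned reads: the undercalled and overcalled reads are outvoted by the rest. This is a simplified illustration, not the algorithm of any particular assembler; the three read variants are copied from the alignment on the slide, and the function name is made up:

```python
from collections import Counter

# The three read variants that occur in the alignment on the slide.
OK = "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG"
UNDER = "AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG"
OVER = "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG"

# Same order as on the slide.
aligned_reads = [OK, UNDER, OVER, OK, OK, OVER, OK, OVER, UNDER, OVER, OK, OK]


def majority_consensus(aligned):
    """Take the most common symbol in each alignment column, then drop gaps."""
    calls = [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]
    return "".join(symbol for symbol in calls if symbol != "-")


consensus = majority_consensus(aligned_reads)
print(consensus)
```

With enough coverage, a homopolymer error that affects only a minority of reads does not reach the consensus.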
Solving errors Oversampling 454 amplicons: AmpliconNoise this course Illumina GC-bias: PCR conditions Aird et al. Genome Biology 2011, 12:R18
Duplicate reads Illumina: PCR step in library prep 454: two beads in one microreactor emulsion PCR
Chimeras Haas B J et al. Genome Res. 2011;21:494-504
Chimeras 454 FLX Titanium chimera rate of up to 20%  >70% of sequences representing particular genera  Haas B J et al. Genome Res. 2011;21:494-504
Chimeras: solutions ChimeraSlayer AmpliconNoise ChimeraCheck Mothur See Haas et al. 2011 Genome Res. 21:494-504
What do you get? Bytes!
Filesizes 454 Up to 2 Gbytes per lane (sff) two lanes HiSeq up to 20 Gbytes per lane (fastq) eight lanes
Datafiles 454 sff file (standard flowgram format) binary fasta & qual text
454: sff file (text format)
>F7K88GK01BMPI0
  Run Prefix: R_2009_12_18_15_27_42_
  Region #: 1
  XY Location: 0551_2346
  Run Name: R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname
  Analysis Name: D_2009_12_19_01_11_43_XX_fullProcessing
  Full Path: /data/R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname/D_2009_12_19_01_11_43_XX_fullProcessing/
  Read Header Len: 32
  Name Length: 14
  # of Bases: 500
  Clip Qual Left: 15
  Clip Qual Right: 490
  Clip Adap Left: 0
  Clip Adap Right: 0
  Flowgram: 1.03 0.00 1.01 0.02 0.00 0.96 0.00 1.00 0.00 1.04 0.00 0.00 0.97 0.00 0.96 0.02 0.00 1.04 0.01 1.04 0.00 0.97 0.96 0.02 0.00 1.00 0.95 1.04 0.00 0.00 2.04 0.02 0.03 1.05
  Flow Indexes: 1 3 6 8 10 13 15 18 20 22 23 26 27 28 31 31 34 35 37 37 37 40 43 45 47 47 47 50 53 53 53 55 58 60 63 66 67 67 67 67 70 71 71 74 74 76 79 82 83 86 86 88 88 91 93 96 97...
  Bases: tcagatcagacacgCCACTTTGCTCCCATTTCAGCACCCCACCAAGCACAAGGCTGTCATCCCAATTGGACGGACAGATATGAGGTTAGCATTGGAAACCAATTCAGTCCCTAATTATTCACGACTGAACCCAGCGACAATTGGACATGGATTCATTTTTCAACTTGATTTGTTGTTGTAAAAGCA...
  Quality Scores: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 40 40 40 39 39 39 40 34 34 34 40 40 40 40 39 26 26 26 26 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ...
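The Flowgram, Flow Indexes and Bases fields of the sff record are related: each flow signal is roughly the number of identical bases incorporated in that flow. A minimal sketch that rounds each signal to the nearest integer, assuming the nucleotides are flowed cyclically in the order T, A, C, G (this order is not stated on the slide). Real basecallers are more involved; a signal near 3.5 is exactly the "3 G's or 4 G's?" ambiguity of homopolymers.

```python
FLOW_ORDER = "TACG"  # assumption: cyclic nucleotide flow order, not given on the slide

# The flowgram values shown in the sff record.
flowgram = [
    1.03, 0.00, 1.01, 0.02, 0.00, 0.96, 0.00, 1.00, 0.00, 1.04, 0.00, 0.00,
    0.97, 0.00, 0.96, 0.02, 0.00, 1.04, 0.01, 1.04, 0.00, 0.97, 0.96, 0.02,
    0.00, 1.00, 0.95, 1.04, 0.00, 0.00, 2.04, 0.02, 0.03, 1.05,
]


def call_bases(signals, flow_order=FLOW_ORDER):
    """Round each flow signal to a homopolymer length and emit the bases.

    Returns the called sequence and, per base, the 1-based index of the
    flow that produced it (the 'Flow Indexes' field).
    """
    bases = []
    flow_indexes = []
    for i, signal in enumerate(signals):
        length = int(signal + 0.5)  # nearest whole number of bases
        nucleotide = flow_order[i % len(flow_order)]
        bases.append(nucleotide * length)
        flow_indexes.extend([i + 1] * length)
    return "".join(bases), flow_indexes


called, indexes = call_bases(flowgram)
print(called)
print(indexes)
```

Under this assumption, the 34 flows reproduce the start of the Bases field (case aside) and the first 17 entries of Flow Indexes, including the repeated index 31 for the two-base signal 2.04.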
454: fasta and qual files
Fasta:
>FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_
AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAATTGTCCCTTTGACATAACGACTAAAGG
AGTCAACAGATTTTCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACGCTATT
...
Qual:
>FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_
40 40 39 39 39 40 40 40 40 40 40 40 40 38 31 26 26 16 16 16 20 20 14 14 14 14 27 33 32 35 36 33 36 35 36 38 35 20 20 21 24 24 22 36 39 40 38 38 38 40 40 40 40 40 40 37 37 37 33 33 29 36 38 38 38 38 38 38 38 35 20 21 21 21 31 36 37 40 40 35 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ...
Sanger-style Phred scores
Phred 40: chance of being wrong 1:10^4.0 = 1:10000
Phred 35: chance of being wrong 1:10^3.5 = 1:3162
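The two "chance of being wrong" figures follow from the standard Phred definition, Q = -10 log10(P), where P is the probability that the base call is wrong. A short sketch of the conversion; the function names are illustrative:

```python
def error_probability(phred):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-phred / 10)


def one_in(phred):
    """Express the same score as '1 wrong call in N', rounded to a whole number."""
    return round(10 ** (phred / 10))


for q in (40, 35, 20, 14):
    print(f"Phred {q}: 1 in {one_in(q)} (P = {error_probability(q):.5f})")
```

The lower scores in the qual record above (for example 14 and 16) therefore correspond to error chances of a few percent.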
Illumina: fastq file
@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC
+@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae
Quality score as characters: Phred score = ASCII value - 33
'B' is ASCII 66: Phred 33
Illumina: fastq file
@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC
+@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae
Matching pair in the other file:
+@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/2
FastQ formats Cock PJ et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71, and http://en.wikipedia.org/wiki/Fastq
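The "Phred score = ASCII value - 33" rule from the fastq slide is easy to apply in code, but the offset depends on the FASTQ variant. The sketch below takes the offset as a parameter: 33 is the Sanger convention stated on the slide, while 64 is the offset of the older Illumina variants described by Cock et al. Which variant a given file uses has to be checked per file. As an observation rather than a claim from the talk: characters as high as 'f' decode to scores above 60 with an offset of 33, which hints that the example record may be in an offset-64 variant.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(character) - offset for character in quality_string]


# First four quality characters of the example record.
first_four = "BBbf"

sanger_scores = decode_qualities(first_four, offset=33)    # [33, 33, 65, 69]
illumina_scores = decode_qualities(first_four, offset=64)  # [2, 2, 34, 38]
print(sanger_scores, illumina_scores)
```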
Quality control
Quality Control 454 (and others): Prinseq Illumina (and others): fastQC, fastQA, etc
Prinseq http://edwards.sdsu.edu/prinseq_beta Web-based and stand-alone Upload  fasta file qual file (optional)
Prinseq: read length
Prinseq: quality per position
Prinseq: quality values
Prinseq: duplicate reads
Prinseq: adaptors No tag Barcode (Roche 'MID') Transcriptome library adaptor
Prinseq: contamination The dinucleotide odds ratios* Principal component analysis (PCA) *dinucleotide frequencies normalized for the base composition
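"Dinucleotide frequencies normalized for the base composition" can be written as an odds ratio: the observed frequency of a dinucleotide XY divided by the product of the frequencies of X and Y. A value near 1 means XY occurs about as often as base composition alone predicts. The sketch below implements this basic single-strand definition; Prinseq's own calculation may differ in detail, and the function name is made up:

```python
from collections import Counter


def dinucleotide_odds_ratios(sequence):
    """Odds ratio f(XY) / (f(X) * f(Y)) for every dinucleotide in the sequence."""
    sequence = sequence.upper()
    base_counts = Counter(sequence)
    pair_counts = Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))
    total_bases = len(sequence)
    total_pairs = len(sequence) - 1

    ratios = {}
    for pair, count in pair_counts.items():
        pair_frequency = count / total_pairs
        expected = (base_counts[pair[0]] / total_bases) * (base_counts[pair[1]] / total_bases)
        ratios[pair] = pair_frequency / expected
    return ratios


ratios = dinucleotide_odds_ratios("ACAC")
print(ratios)
```

Vectors of these ratios, one per sample, are the kind of input a principal component analysis can then compare.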
FastQC http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ Stand-alone GUI (Java based) Input: fastq file
FastQC: quality per position
FastQC: quality values
FastQC: nucleotide composition
FastQC: GC distribution
FastQC: duplicated reads
Filtering/trimming Adaptor removal  especially Illumina Duplicate removal Filtering for low quality bases or stretches of them reads with 'N's E.g.  fastX toolkit prinseq
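A rough sketch of three of the filtering steps listed above: trimming low-quality bases from the 3' end, dropping reads that contain 'N', and removing exact duplicates. Adaptor removal is left out. The thresholds, function names and example reads are illustrative assumptions; tools such as the fastX toolkit and prinseq offer more options.

```python
def trim_3prime(sequence, qualities, min_quality=20):
    """Cut bases off the 3' end until the last base reaches min_quality."""
    end = len(qualities)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


def filter_reads(reads, min_quality=20, min_length=50):
    """Trim, then drop reads that are too short, contain N, or are exact duplicates.

    `reads` is a list of (name, sequence, qualities) tuples, where
    qualities is a list of Phred scores.
    """
    seen = set()
    kept = []
    for name, sequence, qualities in reads:
        sequence, qualities = trim_3prime(sequence, qualities, min_quality)
        if len(sequence) < min_length:
            continue  # too short after trimming
        if "N" in sequence.upper():
            continue  # ambiguous base call
        if sequence in seen:
            continue  # exact duplicate of a read already kept
        seen.add(sequence)
        kept.append((name, sequence, qualities))
    return kept


# Tiny made-up example; min_length lowered so the toy reads survive.
example = [
    ("r1", "ACGTAC", [40, 40, 40, 40, 10, 5]),
    ("r2", "ACGT", [40, 40, 40, 40]),
    ("r3", "ACNTA", [40, 40, 40, 40, 40]),
    ("r4", "GGGTTT", [30, 30, 30, 30, 30, 30]),
    ("r5", "AC", [40, 40]),
]
kept_names = [name for name, _, _ in filter_reads(example, min_length=4)]
print(kept_names)
```

Here r2 is dropped because it is identical to r1 after trimming, r3 because of the 'N', and r5 because it is too short.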
Other technologies Life Technologies SOLiD Ion Torrent not much used for metagenomics Pacific Biosciences PacBio RS large potential
Pacific Biosciences Zero Mode Waveguides Metzker 2010 Nat Rev Genet. 11(1):31-46
Pacific Biosciences Metzker 2010 Nat Rev Genet. 11(1):31-46
Videos http://www.qiagen.com/media/player.aspx?movie=Pyrosequencing http://www.youtube.com/watch?v=HtuUFUnYB9Y

NGS techniques and data

  • 1.  NGS techniques and data relevant for metagenomics analyses Lex Nederbragt Norwegian Sequencing Center & Centre for Ecological and Evolutionary Synthesis University of Oslo
  • 2. The sequence revolution Stratton et al. Nature 458, 719-724
  • 3. The sequence revolution Stratton et al. Nature 458, 719-724
  • 4. Norwegian Sequencing Center www.sequencing.uio.no
  • 5. This talk Technologies 454 Illumina Topics How does it work What do you get Quality check Filtering
  • 6. How does it work: 454
  • 7. Library preparation Shotgun library Amplicon library Starting from DNA sample Starting from PCR product
  • 8. Library preparation Shotgun library Amplicon library Fragmentation A Rv Fw B Addition of adaptors Fw A B Rv
  • 9. Multiplexing Amplicon library A A Fw Fw Rv Tag B Fw A B Rv Shotgun: tag in the adaptors
  • 12. Multiplexing 2 lanes 4 lanes 8 lanes 16 lanes Flickr.com
  • 17. Single end Default single end sequencing Special protocols for mate-pairs
  • 18. How does it work: Illumina
  • 20. Bridge amplification Metzker 2010 Nat Rev Genet. 11(1):31-46
  • 21. Bridge amplification Metzker 2010 Nat Rev Genet. 11(1):31-46
  • 23. Sequencing Reversible terminators Metzker 2010 Nat Rev Genet. 11(1):31-46
  • 24. Basecalling Metzker 2010 Nat Rev Genet. 11(1):31-46
  • 25. Read length 454 GS FLX Titanium Illumina HiSeq 500 bases
  • 26. Paired-end Default paired-end sequencing single end also possible 150–600 bases
  • 27. What do you get?
  • 28. 454 Throughput GS FLX Titanium per-run output: Up to 1.5 million single-end reads Up to 600 megabases (Mb, million bases) Less for amplicons
  • 29. Illumina throughput (HiSeq 2000) Variable length 50, 100 (soon 150) single or paired-end per-run output: Up to 1 billion (10^9) single-end Up to 2 billion paired-end reads Up to 200 gigabases (Gb, billion bases) Soon: 3 times more reads and bases
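The per-run figures in slides 28 and 29 can be checked with simple arithmetic: the number of reads times the read length gives the number of bases. The sketch below is only an illustration. The function name is made up, the 100-base HiSeq read length is one of the lengths listed on the slide, and the 400-base 454 read length is an assumed average implied by the quoted totals, not a number stated in the talk.

```python
def run_yield_bases(n_reads: int, read_length: int) -> int:
    """Total bases in a run: number of reads times bases per read."""
    return n_reads * read_length


# HiSeq 2000 (slide 29): up to 2 billion paired-end reads of 100 bases each.
hiseq = run_yield_bases(2_000_000_000, 100)

# 454 GS FLX Titanium (slide 28): up to 1.5 million reads.
# 400 bases is an ASSUMED average read length, chosen to match the 600 Mb total.
flx = run_yield_bases(1_500_000, 400)

print(hiseq / 1e9, "Gb")  # gigabases
print(flx / 1e6, "Mb")    # megabases
```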
  • 30. What do you get? Errors! http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html
  • 31. Error profiles 454 GS FLX Titanium Illumina Genome Analyzer II
  • 32. 454 specific 3 G's? 4 G's?
  • 33. Illumina specific Substitutions e.g. AG Underrepresentation of AT and GC rich regions
  • 35. Oversampling: 454 Undercall in two reads Overcall in three reads
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG
      Consensus
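Oversampling in slide 35 works because homopolymer undercalls and overcalls are outvoted when enough reads cover the same position. A minimal sketch of that idea, assuming the reads are already aligned with `-` for gaps, is a per-column majority vote. The function name and the five-read excerpt are illustrative; real assemblers and denoisers use more elaborate models.

```python
from collections import Counter


def column_consensus(aligned_reads):
    """Majority vote per alignment column; columns where the gap wins are dropped."""
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Five reads taken from the slide: one undercall (second read) and one overcall (third read).
reads = [
    "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG",
    "AGAAAGTCAGCGGCAAATT-GGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG",
    "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAAATTGTCCCTTTGACATAACGACTAAAGG",
    "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG",
    "AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAA-TTGTCCCTTTGACATAACGACTAAAGG",
]

consensus = column_consensus(reads)
print(consensus)
```

With an odd number of reads and a clear majority there are no ties; with ties, the winner depends on the order of the reads, so a real pipeline would need an explicit tie rule.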
  • 36. Solving errors Oversampling 454 amplicons: AmpliconNoise this course Illumina GC-bias: PCR conditions Aird et al. Genome Biology 2011, 12:R18
  • 37. Duplicate reads Illumina: PCR step in library prep 454: two beads in one microreactor emulsion PCR
  • 38. Chimeras Haas B J et al. Genome Res. 2011;21:494-504
  • 39. Chimeras 454 FLX Titanium chimera rate of up to 20% >70% of sequences representing particular genera Haas B J et al. Genome Res. 2011;21:494-504
  • 40. Chimeras: solutions ChimeraSlayer AmpliconNoise ChimeraCheck Mothur See Haas et al. 2011 Genome Res. 21:494-504
  • 41. What do you get? Bytes!
  • 42. File sizes 454 Up to 2 Gbytes per lane (sff) two lanes HiSeq up to 20 Gb per lane (fastq) eight lanes
  • 43. Datafiles 454 sff file (standard flowgram format) binary fasta & qual text
  • 44. 454: sff file (text format) >F7K88GK01BMPI0 Run Prefix: R_2009_12_18_15_27_42_ Region #: 1 XY Location: 0551_2346 Run Name: R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname Analysis Name: D_2009_12_19_01_11_43_XX_fullProcessing Full Path: /data/R_2009_12_18_15_27_42_FLX########_Administrator_yourrunname/D_2009_12_19_01_11_43_XX_fullProcessing/ Read Header Len: 32 Name Length: 14 # of Bases: 500 Clip Qual Left: 15 Clip Qual Right: 490 Clip Adap Left: 0 Clip Adap Right: 0 Flowgram: 1.03 0.00 1.01 0.02 0.00 0.96 0.00 1.00 0.00 1.04 0.00 0.00 0.97 0.00 0.96 0.02 0.00 1.04 0.01 1.04 0.00 0.97 0.96 0.02 0.00 1.00 0.95 1.04 0.00 0.00 2.04 0.02 0.03 1.05 Flow Indexes: 1 3 6 8 10 13 15 18 20 22 23 26 27 28 31 31 34 35 37 37 37 40 43 45 47 47 47 50 53 53 53 55 58 60 63 66 67 67 67 67 70 71 71 74 74 76 79 82 83 86 86 88 88 91 93 96 97... Bases: tcagatcagacacgCCACTTTGCTCCCATTTCAGCACCCCACCAAGCACAAGGCTGTCATCCCAATTGGACGGACAGATATGAGGTTAGCATTGGAAACCAATTCAGTCCCTAATTATTCACGACTGAACCCAGCGACAATTGGACATGGATTCATTTTTCAACTTGATTTGTTGTTGTAAAAGCA... Quality Scores: 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 40 40 40 39 39 39 40 34 34 34 40 40 40 40 39 26 26 26 26 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ...
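The flowgram in slide 44 relates to the bases in a simple way: each flow value is roughly the number of identical bases incorporated during that nucleotide flow. The sketch below rounds each value to the nearest integer and emits that many copies of the flowed nucleotide. It assumes a cyclic flow order of T, A, C, G, which is consistent with the example read starting with the key `tcag`. The function name is illustrative, and the instrument's own basecaller is more sophisticated than plain rounding.

```python
FLOW_ORDER = "TACG"  # ASSUMED cyclic nucleotide flow order


def flowgram_to_bases(flow_values, flow_order=FLOW_ORDER):
    """Round each flow signal to a homopolymer length and emit that many bases."""
    bases = []
    for i, signal in enumerate(flow_values):
        length = int(signal + 0.5)  # nearest whole number of incorporated bases
        bases.append(flow_order[i % len(flow_order)] * length)
    return "".join(bases)


# The flow values shown on slide 44.
flows = [
    1.03, 0.00, 1.01, 0.02, 0.00, 0.96, 0.00, 1.00, 0.00, 1.04,
    0.00, 0.00, 0.97, 0.00, 0.96, 0.02, 0.00, 1.04, 0.01, 1.04,
    0.00, 0.97, 0.96, 0.02, 0.00, 1.00, 0.95, 1.04, 0.00, 0.00,
    2.04, 0.02, 0.03, 1.05,
]

called = flowgram_to_bases(flows)
print(called)  # TCAGATCAGACACGCCA
```

The result matches the first 17 bases of the read on the slide (`tcagatcagacacgCCA`). A signal such as 2.5 would sit exactly between two homopolymer lengths, which is the "3 G's? 4 G's?" ambiguity from slide 32.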
  • 45. 454: fasta and qual files Fasta: >FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_ AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAATTGTCCCTTTGACATAACGACTAAAGG AGTCAACAGATTTTCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACGCTATT ... Qual: >FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_ 40 40 39 39 39 40 40 40 40 40 40 40 40 38 31 26 26 16 16 16 20 20 14 14 14 14 27 33 32 35 36 33 36 35 36 38 35 20 20 21 24 24 22 36 39 40 38 38 38 40 40 40 40 40 40 37 37 37 33 33 29 36 38 38 38 38 38 38 38 35 20 21 21 21 31 36 37 40 40 35 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ... Sanger-style Phred scores
  • 46. 454: fasta and qual files Fasta: >FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_ AGAAAGTCAGCGGCAAATTTGGTTTTAGACGAATTGTCCCTTTGACATAACGACTAAAGG AGTCAACAGATTTTCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACGCTATT ... Qual: >FTJD6BE02HHD3W length=409 xy=2951_1562 region=2 run=R_2009_04_01_11_28_49_ 40 40 39 39 39 40 40 40 40 40 40 40 40 38 31 26 26 16 16 16 20 20 14 14 14 14 27 33 32 35 36 33 36 35 36 38 35 20 20 21 24 24 22 36 39 40 38 38 38 40 40 40 40 40 40 37 37 37 33 33 29 36 38 38 38 38 38 38 38 35 20 21 21 21 31 36 37 40 40 35 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 ... chance of being wrong: 1:10^4.0 = 1:10000 chance of being wrong: 1:10^3.5 = 1:3162 Sanger-style Phred scores
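The two "chance of being wrong" figures in slide 46 follow from the Phred definition, where a score Q corresponds to an error probability of 10^(-Q/10). A small sketch (the function name is illustrative) reproduces them for scores 40 and 35:

```python
def phred_to_error_probability(q):
    """Error probability for a Phred quality score: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


for q in (40, 35, 20):
    p = phred_to_error_probability(q)
    print(f"Phred {q}: 1 in {round(1 / p)} chance of being wrong")
```

Phred 40 gives 1 in 10000 and Phred 35 gives about 1 in 3162, as on the slide; Phred 20, a common filtering threshold, is 1 in 100.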
  • 47. Illumina: fastq file @PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1 CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC +@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1 BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae Quality score as characters: Phred score = ASCII value -33 'B' is ASCII 66 Phred 33
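Slide 47 encodes each quality score as one character, and decoding is a subtraction from the character's ASCII value. The sketch below uses the offset of 33 given on the slide as its default, but keeps the offset as a parameter, because the FASTQ variants described by Cock et al. (slide 49) differ in it: Sanger-style files use 33, while some Illumina pipeline versions use 64. The function name is illustrative.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(char) - offset for char in quality_string]


prefix = "BBbf"  # first characters of the quality line on slide 47

print(decode_qualities(prefix))             # [33, 33, 65, 69] with offset 33
print(decode_qualities(prefix, offset=64))  # [2, 2, 34, 38] with offset 64
```

Decoded scores far above 40 can be a hint that a file uses the 64 offset rather than 33, so it is worth checking which variant a data file follows before filtering on quality.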
  • 48. Illumina: fastq file @PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1 CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC +@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1 BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae Matching pair in the other file: +@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/2
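In the naming scheme of slide 48, the two reads of a pair differ only in the trailing `/1` or `/2`. Many downstream tools expect the two files to list the pairs in the same order, so a quick synchrony check can be useful after filtering. This is a sketch with illustrative function names; it assumes the `/1` and `/2` suffix convention shown on the slide.

```python
def mate_name(read_name):
    """Return the expected name of the mate by swapping a trailing /1 and /2."""
    if read_name.endswith("/1"):
        return read_name[:-1] + "2"
    if read_name.endswith("/2"):
        return read_name[:-1] + "1"
    raise ValueError(f"read name has no /1 or /2 suffix: {read_name}")


def pairs_in_sync(names_1, names_2):
    """True if both files hold the same number of reads, pairwise matching by name."""
    if len(names_1) != len(names_2):
        return False
    return all(mate_name(a) == b for a, b in zip(names_1, names_2))


forward = ["PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1"]
reverse = ["PCUS-319-EAS487_0004_FC:6:1:1351:952#0/2"]
print(pairs_in_sync(forward, reverse))  # True
```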
  • 49. FastQ formats Cock PJ et al 2009 The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71. and http://en.wikipedia.org/wiki/Fastq
  • 51. Quality Control 454 (and others): Prinseq Illumina (and others): fastQC, fastQA, etc
  • 52. Prinseq http://edwards.sdsu.edu/prinseq_beta Web-based and stand-alone Upload fasta file qual file (optional)
  • 57. Prinseq: adaptors No tag Barcode (Roche 'MID') Transcriptome library adaptor
  • 58. Prinseq: contamination The dinucleotide odds ratios* Principal component analysis (PCA) *dinucleotide frequencies normalized for the base composition
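The footnote in slide 58 describes dinucleotide odds ratios as dinucleotide frequencies normalized for base composition. One common way to write this is rho(xy) = f(xy) / (f(x) * f(y)), where values near 1 mean the dinucleotide occurs about as often as expected from the single-base frequencies. The sketch below implements that plain definition for a single sequence. It is an assumption that this matches the slide's intent; Prinseq's own calculation may differ in detail (for example in how strands or multiple reads are combined), and the function name is illustrative.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Odds ratio f(xy) / (f(x) * f(y)) for every dinucleotide observed in seq."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    ratios = {}
    for pair, count in di.items():
        f_xy = count / n_di
        f_x = mono[pair[0]] / n_mono
        f_y = mono[pair[1]] / n_mono
        ratios[pair] = f_xy / (f_x * f_y)
    return ratios


example = dinucleotide_odds_ratios("ACGTACGTAACC")
for pair in sorted(example):
    print(pair, round(example[pair], 2))
```

The resulting vector of ratios per sample is the kind of input that could then be compared between samples, for instance with the PCA mentioned on the slide.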
  • 59. FastQC http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ Stand-alone GUI (Java based) Upload fasta file qual file (optional)
  • 66. Filtering/trimming Adaptor removal especially Illumina Duplicate removal Filtering for low quality bases or stretches of them reads with 'N's E.g. fastX toolkit prinseq
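The filtering steps listed in slide 66 can be sketched in a few lines to show what each one does: trim low-quality bases from the 3' end, drop reads that become too short, drop reads containing 'N', and drop exact duplicates. This is not how the fastX toolkit or prinseq are implemented; the function names, thresholds and the (name, sequence, qualities) tuple layout are all illustrative assumptions. Adaptor removal is left out.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end until one with quality >= min_q is reached."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def filter_reads(reads, min_q=20, min_len=50):
    """Trim, then discard reads that are too short, contain N, or are exact duplicates."""
    seen = set()
    kept = []
    for name, seq, quals in reads:
        seq, quals = trim_3prime(seq, quals, min_q)
        if len(seq) < min_len:
            continue  # too short after trimming
        if "N" in seq.upper():
            continue  # ambiguous base call
        if seq in seen:
            continue  # exact duplicate of a read already kept
        seen.add(seq)
        kept.append((name, seq, quals))
    return kept


reads = [
    ("r1", "ACGTACGT", [40, 40, 40, 40, 40, 40, 10, 5]),   # low-quality tail
    ("r2", "ACGTAC", [40] * 6),                            # duplicate of trimmed r1
    ("r3", "ACNTACGT", [40] * 8),                          # contains N
    ("r4", "AC", [40, 40]),                                # too short
]
kept = filter_reads(reads, min_q=20, min_len=4)
print(kept)
```

Only `r1`, trimmed to `ACGTAC`, survives in this toy input. Identical sequences can also be genuine, particularly in amplicon data, so duplicate removal is a choice that depends on the library type.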
  • 67. Other technologies Life Technologies SOLiD Ion Torrent not much used for metagenomics Pacific Biosciences PacBio RS large potential
  • 68. Pacific Biosciences Zero Mode Waveguides Metzker 2010 Nat Rev Genet. 11(1):31-46
  • 69. Pacific Biosciences Metzker 2010 Nat Rev Genet. 11(1):31-46