28. 454 throughput, GS FLX Titanium, per-run output: up to 1.5 million single-end reads; up to 600 megabases (Mb, million bases); less for amplicons.
29. Illumina throughput (HiSeq 2000): variable read length of 50 or 100 bases (soon 150), single or paired-end. Per-run output: up to 1 billion (10^9) single-end reads; up to 2 billion paired-end reads; up to 200 gigabases (Gb, billion bases). Soon: 3 times more reads and bases.
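The per-run maxima on the two throughput slides can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted above; the dictionary layout and the function name `mean_read_length` are illustrative choices, and the result is the average read length implied by the maxima, not a measured value.

```python
# Per-run maxima as quoted on the 454 and Illumina throughput slides.
runs = {
    "454 GS FLX Titanium": {"reads": 1.5e6, "bases": 600e6},
    "Illumina HiSeq 2000": {"reads": 2e9, "bases": 200e9},  # paired-end read count
}


def mean_read_length(reads, bases):
    """Average number of bases per read implied by the read and base totals."""
    return bases / reads


for platform, run in runs.items():
    length = mean_read_length(run["reads"], run["bases"])
    print(f"{platform}: about {length:.0f} bases per read")

# How many times more bases one Illumina run yields than one 454 run.
ratio = runs["Illumina HiSeq 2000"]["bases"] / runs["454 GS FLX Titanium"]["bases"]
print(f"Bases per run, Illumina vs 454: about {ratio:.0f}x")
```

The implied averages are about 400 bases per 454 read and 100 bases per Illumina read, the latter matching the longer of the two read lengths listed for the HiSeq 2000.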
30. What do you get? Errors! http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html
47. Illumina: fastq file
@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC
+@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae
Quality score as characters: Phred score = ASCII value - 33; 'B' is ASCII 66, so Phred 33.
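The character-to-score rule can be sketched in a few lines of Python. The function names `phred_scores` and `error_probability` are illustrative, not taken from any toolkit. The offset is left as a parameter because the FASTQ variants differ: the Sanger format uses 33, while some Solexa/Illumina variants use 64. Note that a character such as 'f' (ASCII 102) decodes to 69 with offset 33 but to 38 with offset 64, so it is worth checking which variant a given file uses.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset is an assumption about the file: 33 for Sanger-style FASTQ,
    64 for some Solexa/Illumina variants.
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


print(phred_scores("B"))             # 'B' is ASCII 66, decoded with offset 33
print(phred_scores("B", offset=64))  # same character decoded with offset 64
print(error_probability(20))         # error probability for Phred 20
```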
48. Illumina: fastq file
@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
CCAACATAGCTGGATGCCAACATAGCTGGATTGTTATAGCTGGTTTGCTTTTCTAACTCGCTGGAAGTTTATAAGCATTCCTACTATTTCATAGTATTAC
+@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1
BBbfYcbV^BV`cQffaBZfB_fdfUYaa]`adcbfefcfd^cad^fOabRceb`beSbdfaad_e^^dbeedTbd`VcdfffYBddb^fae
Matching pair in the other file: +@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/2
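Matching the two mates of a pair comes down to comparing read names with the trailing /1 or /2 removed. The following is a minimal sketch that assumes headers follow the naming shown on the slide; the helpers `split_mate` and `are_mates` are hypothetical names introduced for illustration.

```python
def split_mate(header):
    """Return (fragment_id, mate) for a read header.

    mate is "1" or "2" when the header ends in /1 or /2, otherwise None.
    """
    name = header.strip().lstrip("@")
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2], name[-1]
    return name, None


def are_mates(header_a, header_b):
    """True when both headers name the same fragment, one as /1 and one as /2."""
    id_a, mate_a = split_mate(header_a)
    id_b, mate_b = split_mate(header_b)
    return id_a == id_b and {mate_a, mate_b} == {"1", "2"}


forward = "@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/1"
reverse = "@PCUS-319-EAS487_0004_FC:6:1:1351:952#0/2"
print(are_mates(forward, reverse))
```

A check like this is useful after filtering, when reads may have been dropped from one file but not the other and the two files no longer line up record by record.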
49. FastQ formats: Cock PJ et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 Apr;38(6):1767-71 (Epub 2009). See also http://en.wikipedia.org/wiki/Fastq
58. Prinseq: contamination. The dinucleotide odds ratios*; principal component analysis (PCA). *Dinucleotide frequencies normalized for the base composition.
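The footnote describes a dinucleotide frequency normalized for base composition. In its simplest form this is the observed frequency of a dinucleotide XY divided by the product of the frequencies of X and Y. The sketch below implements that basic form only; it is an assumption that this matches the slide's intent, and PRINSEQ's own calculation may differ in detail (for example in how the two strands are treated). The function name is illustrative.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq):
    """Odds ratio f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Bases other than A, C, G, T (such as N) are ignored, as are
    dinucleotides that contain them.
    """
    seq = seq.upper()
    valid = set("ACGT")
    bases = [b for b in seq if b in valid]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if set(seq[i:i + 2]) <= valid]
    base_counts = Counter(bases)
    pair_counts = Counter(pairs)

    ratios = {}
    for x, y in product("ACGT", repeat=2):
        # Expected frequency if the two bases occurred independently.
        expected = 0.0
        if bases:
            expected = (base_counts[x] / len(bases)) * (base_counts[y] / len(bases))
        observed = pair_counts[x + y] / len(pairs) if pairs else 0.0
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


ratios = dinucleotide_odds_ratios("ACACGTGTTTACGGA")
print(ratios["AC"], ratios["TT"])
```

Each sample then becomes a vector of 16 ratios, which is the kind of input a PCA can project into two dimensions to compare samples.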
66. Filtering/trimming: adaptor removal (especially Illumina); duplicate removal; filtering for low-quality bases or stretches of them, and for reads with 'N's. E.g. fastX toolkit, prinseq.
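Three of the filters listed above (reads with 'N's, low quality, duplicates) can be sketched in plain Python. This is an illustration of the idea, not how the fastX toolkit or prinseq implement it: the mean-quality threshold of 20, the Phred offset of 33, and the exact-sequence definition of a duplicate are all assumptions chosen for the example, and adaptor removal is left out.

```python
def mean_quality(quality, offset=33):
    """Mean Phred score of a quality string (offset is an assumption about the file)."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_q=20, offset=33):
    """Yield (header, sequence, quality) records that pass all three filters."""
    seen = set()
    for header, seq, qual in records:
        if "N" in seq.upper():
            continue  # ambiguous base call
        if mean_quality(qual, offset) < min_mean_q:
            continue  # low average quality
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        yield header, seq, qual


reads = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACNT", "IIII"),  # contains N
    ("r3", "ACGT", "IIII"),  # duplicate of r1
    ("r4", "GGCC", "!!!!"),  # mean quality 0
]
kept = [header for header, _, _ in filter_reads(reads)]
print(kept)
```

With paired-end data, the same decision has to be applied to both mates so that the two files stay in step.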
67. Other technologies: Life Technologies (SOLiD, Ion Torrent): not much used for metagenomics. Pacific Biosciences (PacBio RS): large potential.