SNP Calling & Outbreak Reconstruction in a Monomorphic Pathogen
1. SNP CALLING & OUTBREAK RECONSTRUCTION IN A MONOMORPHIC PATHOGEN
WITHIN-HOST DIVERSITY AND OTHER CONSIDERATIONS
2. YOUR INSTRUCTOR
DR. JENNIFER GARDY
SENIOR SCIENTIST, BRITISH COLUMBIA CENTRE FOR DISEASE CONTROL
ASSISTANT PROFESSOR, SCHOOL OF POPULATION & PUBLIC HEALTH, UNIVERSITY OF BRITISH COLUMBIA
jennifer.gardy@bccdc.ca
@jennifergardy
3. AGENDA
• Introduction I: a bit about my research
• Introduction II: WGS for outbreak investigation: hooray for
clonal pathogens
• Lesson 1: getting a good dataset
• Lesson 2: linking variation to transmission
• Lesson 3: within-host diversity
• Lesson 4: putting it together
4. MY RESEARCH INTERESTS
INTRODUCTION
5. TB IS CAUSED BY
Mycobacterium tuberculosis
• Infects alveolar macrophages
• Doubling time of 15-24h
• Can exist in latent phase
• ~90% of infections never
progress to active disease
• Highly clonal population
• 7 major lineages recognized
worldwide
• ~4.4 Mbp genome
• No extrachromosomal elements (ECEs)
• ~10% repetitive regions
• 37 complete MTB reference
genomes, 1000s of draft
assemblies
8. One key to stopping TB is
UNDERSTANDING
TRANSMISSION
9. BCCDC is responsible for communicable disease
diagnosis, surveillance, epidemiology, and
prevention in British Columbia, Canada.
12. MOLECULAR TYPING OF M. TUBERCULOSIS
• SPOLIGOTYPING
• 43 oligonucleotide spacers between conserved direct repeats
• Hybridisation assay: is spacer present or not? Binary 0 or 1
• 43-digit binary string converted to 15-digit string using octal
transformation
• IS6110-RFLP
• Restriction enzyme digest followed by electrophoresis
• Probe these ladders for IS6110 insertion element
• Final pattern is just the bands with IS6110
• MIRU-VNTR
• PCR amplification of 12-24 MIRU (Mycobacterial Interspersed
Repetitive Unit) VNTR regions
• Size of amplified product indicates number of repeats
• Final fingerprint is a 12 or 24-digit number
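The octal transformation mentioned for spoligotyping can be sketched as follows. The first 42 spacers are read as 14 triplets, each written as one octal digit, and the 43rd spacer is appended as 0 or 1. The function name and the example pattern are illustrative, not from the slides.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character binary spoligotype to its 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each 0 or 1")
    # The first 42 spacers form 14 triplets; each triplet becomes one octal digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    # The 43rd spacer stands alone and is written as 0 or 1.
    digits.append(binary[42])
    return "".join(digits)


# Illustrative pattern (not from the slides): all spacers present except 33-36.
pattern = "1" * 32 + "0" * 4 + "1" * 7
print(spoligotype_to_octal(pattern))  # 777777777760771
```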
14. LIMITATIONS OF CURRENT METHODS
• Genotyping methods only tell you a cluster of
cases exists, not the order/direction of
transmission
• Size/membership of the cluster varies with the
molecular typing method(s) used
• Epidemiological investigation is required to
derive the links between cases, and may not
be available or of sufficient quality
35. ADVANTAGES OF WORKING WITH CLONAL PATHOGENS IN WGS
• Genetically monomorphic - limited/no recombination/HGT,
low diversity compared to other organisms
• Easy to find a reference genome to align reads against
• De novo assembly also easier
• Diversity largely arises through insertions, deletions, and
point mutations
• Identification of these elements is a single-step process
• Can use most of the genome for comparing multiple
isolates, instead of a small subset of core genes
• More data, more accurate phylogenies, prediction of
function and resistance
37. Comas et al, PLoS One
1. Align your genome against a standard reference genome, find variation
2. Assign it to a lineage with the lineage-defining variations
3. Within a lineage, place your isolate into the phylogeny of previously sequenced genomes
4. Look for SNPs indicating drug resistance or epidemiological clustering
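Step 2 of the workflow above (lineage assignment from lineage-defining variants) could look roughly like this. The marker table is entirely hypothetical: the positions, alleles, and lineage names are placeholders, and a real analysis would substitute a published, validated marker set.

```python
# HYPOTHETICAL marker table: positions and alleles are placeholders, not real
# lineage-defining SNPs. Substitute a published, validated marker set.
LINEAGE_MARKERS = {
    "lineage_A": {(1000, "T"), (25000, "G")},
    "lineage_B": {(4000, "C"), (31000, "A")},
}


def assign_lineage(isolate_snps, markers=LINEAGE_MARKERS):
    """Return the lineages whose full marker set is present in the isolate."""
    found = set(isolate_snps)
    return sorted(name for name, required in markers.items() if required <= found)


# Toy isolate: (position, alternate base) pairs called against the reference.
isolate = {(1000, "T"), (25000, "G"), (4123, "T")}
print(assign_lineage(isolate))  # ['lineage_A']
```

Requiring every marker of a lineage is a deliberately strict choice for the sketch; an empty result means the isolate needs a closer look rather than a forced assignment.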
39. NATION-WIDE WGS OF TB
SPECIATE, RESISTANCE TYPE, EPI LINKS
SINGLE DATABASE
40. RECAP SO FAR
• WGS CAN BE USED TO TRACK PERSON-
TO-PERSON TRANSMISSION AND
EPIDEMIC DYNAMICS - “GENOMIC
EPIDEMIOLOGY”
• CLONAL PATHOGENS (E.G. TB, MRSA, Y.
PESTIS, B. ANTHRACIS, ETC…) ARE AN
ESPECIALLY GOOD USE CASE FOR WGS
• GENOMIC EPI REQUIRES MAPPING TO A
REFERENCE AND CALLING SNPS
41. GETTING A GOOD DATASET
LESSON 1
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
gi|50953765|ref|NC_002755.2| 235 . CG C 328.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.570;DP=52;FS=1.221;MLEAC=1;MLEAF=1.00;MQ=59.61;MQ0=0;MQRankSum=-0.984;QD=6.33;RPA=3,2;RU=G;ReadPosRankSum=0.797;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:15,31:52:99:1:1.00:368,0
gi|50953765|ref|NC_002755.2| 238 . GC G 403.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=2.733;DP=53;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=59.61;MQ0=0;MQRankSum=-2.059;QD=7.62;RPA=4,3;RU=C;ReadPosRankSum=0.349;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:15,30:52:99:1:1.00:443,0
gi|50953765|ref|NC_002755.2| 3631 . GC G 215.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-2.204;DP=54;FS=69.670;MLEAC=1;MLEAF=1.00;MQ=58.43;MQ0=0;MQRankSum=-1.742;QD=4.00;RPA=4,3;RU=C;ReadPosRankSum=0.384;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:17,25:52:99:1:1.00:255,0
gi|50953765|ref|NC_002755.2| 4123 . C T 1459 . AC=1;AF=1.00;AN=1;DP=55;Dels=0.00;FS=0.000;HaplotypeScore=13.3300;MLEAC=1;MLEAF=1.00;MQ=59.28;MQ0=0;QD=26.53 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,55:55:99:1:1.00:1489,0
gi|50953765|ref|NC_002755.2| 4630 . CG C 163.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.623;DP=42;FS=3.012;MLEAC=1;MLEAF=1.00;MQ=59.69;MQ0=0;MQRankSum=2.239;QD=3.90;RPA=4,3;RU=G;ReadPosRankSum=0.084;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:14,21:42:99:1:1.00:203,0
gi|50953765|ref|NC_002755.2| 5701 . AC A 68.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.591;DP=39;FS=44.682;MLEAC=1;MLEAF=1.00;MQ=59.17;MQ0=0;MQRankSum=0.066;QD=1.77;RPA=4,3;RU=C;ReadPosRankSum=-0.394;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:11,19:39:99:1:1.00:108,0
gi|50953765|ref|NC_002755.2| 7543 . TG T 247.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.463;DP=43;FS=79.062;MLEAC=1;MLEAF=1.00;MQ=59.54;MQ0=0;MQRankSum=0.547;QD=5.77;RPA=3,2;RU=G;ReadPosRankSum=1.996;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:16,22:43:99:1:1.00:287,0
gi|50953765|ref|NC_002755.2| 12448 . TG T 292.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.000;DP=38;FS=41.965;MLEAC=1;MLEAF=1.00;MQ=57.91;MQ0=0;MQRankSum=2.037;QD=7.71;RPA=5,4;RU=G;ReadPosRankSum=0.724;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:9,21:35:99:1:1.00:332,0
gi|50953765|ref|NC_002755.2| 13030 . CG C 344.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.857;DP=57;FS=5.672;MLEAC=1;MLEAF=1.00;MQ=59.28;MQ0=0;MQRankSum=0.334;QD=6.05;RPA=2,1;RU=G;ReadPosRankSum=0.009;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:22,31:57:99:1:1.00:384,0
gi|50953765|ref|NC_002755.2| 14147 . GC G 299.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.195;DP=49;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=59.22;MQ0=0;MQRankSum=1.344;QD=6.12;RPA=2,1;RU=C;ReadPosRankSum=1.762;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:15,23:49:99:1:1.00:339,0
gi|50953765|ref|NC_002755.2| 14192 . CG C 352.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.924;DP=41;FS=36.845;MLEAC=1;MLEAF=1.00;MQ=59.04;MQ0=0;MQRankSum=-1.143;QD=8.61;RPA=4,3;RU=G;ReadPosRankSum=-0.830;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:14,23:41:99:1:1.00:392,0
gi|50953765|ref|NC_002755.2| 15273 . AG A 107.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.823;DP=77;FS=146.402;MLEAC=1;MLEAF=1.00;MQ=59.82;MQ0=0;MQRankSum=0.076;QD=1.40;RPA=3,2;RU=G;ReadPosRankSum=0.823;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:35,39:76:99:1:1.00:147,0
gi|50953765|ref|NC_002755.2| 15571 . AC A 76.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.057;DP=50;FS=37.078;MLEAC=1;MLEAF=1.00;MQ=59.63;MQ0=0;MQRankSum=-0.449;QD=1.54;RPA=2,1;RU=C;ReadPosRankSum=1.591;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:22,21:50:99:1:1.00:116,0
gi|50953765|ref|NC_002755.2| 15647 . CG C 89.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-3.493;DP=46;FS=10.328;MLEAC=1;MLEAF=1.00;MQ=59.58;MQ0=0;MQRankSum=-0.068;QD=1.96;RPA=3,2;RU=G;ReadPosRankSum=2.841;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:18,22:46:99:1:1.00:129,0
gi|50953765|ref|NC_002755.2| 17609 . C G 875 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.288;DP=39;Dels=0.00;FS=0.000;HaplotypeScore=33.2023;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=1.555;QD=22.44;ReadPosRankSum=-1.466 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:1,38:39:99:1:1.00:905,0
gi|50953765|ref|NC_002755.2| 18844 . GC G 49.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.550;DP=35;FS=67.160;MLEAC=1;MLEAF=1.00;MQ=59.42;MQ0=0;MQRankSum=-1.822;QD=1.43;RPA=4,3;RU=C;ReadPosRankSum=-0.136;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:12,13:35:89:1:1.00:89,0
gi|50953765|ref|NC_002755.2| 18890 . CG C 47.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.433;DP=27;FS=44.730;MLEAC=1;MLEAF=1.00;MQ=59.86;MQ0=0;MQRankSum=0.029;QD=1.78;RPA=3,2;RU=G;ReadPosRankSum=1.299;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:12,12:27:87:1:1.00:87,0
gi|50953765|ref|NC_002755.2| 19260 . AG A 66.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.899;DP=24;FS=33.331;MLEAC=1;MLEAF=1.00;MQ=59.92;MQ0=0;MQRankSum=1.823;QD=2.79;RPA=2,1;RU=G;ReadPosRankSum=-0.152;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:9,11:24:99:1:1.00:106,0
gi|50953765|ref|NC_002755.2| 19342 . AG A 162.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.153;DP=38;FS=28.024;MLEAC=1;MLEAF=1.00;MQ=58.61;MQ0=0;MQRankSum=-0.576;QD=4.29;RPA=3,2;RU=G;ReadPosRankSum=1.458;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:13,19:38:99:1:1.00:202,0
gi|50953765|ref|NC_002755.2| 22351 . G A 614 . AC=1;AF=1.00;AN=1;DP=29;Dels=0.00;FS=0.000;HaplotypeScore=3.3416;MLEAC=1;MLEAF=1.00;MQ=59.40;MQ0=0;QD=21.17 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,29:29:99:1:1.00:644,0
gi|50953765|ref|NC_002755.2| 22858 . GC G 385.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.517;DP=44;FS=77.067;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=0.986;QD=8.77;RPA=3,2;RU=C;ReadPosRankSum=0.548;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:14,23:44:99:1:1.00:425,0
gi|50953765|ref|NC_002755.2| 24291 . AC A 40.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.229;DP=36;FS=25.732;MLEAC=1;MLEAF=1.00;MQ=58.96;MQ0=0;MQRankSum=0.811;QD=1.14;RPA=3,2;RU=C;ReadPosRankSum=2.557;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:14,16:35:80:1:1.00:80,0
gi|50953765|ref|NC_002755.2| 24554 . GC G 286.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.667;DP=18;FS=2.187;MLEAC=1;MLEAF=1.00;MQ=56.45;MQ0=0;MQRankSum=0.061;QD=15.94;RPA=3,2;RU=C;ReadPosRankSum=0.182;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:4,12:18:99:1:1.00:326,0
gi|50953765|ref|NC_002755.2| 24792 . TG T 226.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.775;DP=20;FS=11.553;MLEAC=1;MLEAF=1.00;MQ=58.44;MQ0=0;MQRankSum=-0.254;QD=11.35;RPA=4,3;RU=G;ReadPosRankSum=1.099;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:3,10:20:99:1:1.00:266,0
gi|50953765|ref|NC_002755.2| 25567 . GC G 59.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.107;DP=24;FS=33.858;MLEAC=1;MLEAF=1.00;MQ=59.72;MQ0=0;MQRankSum=0.048;QD=2.50;RPA=4,3;RU=C;ReadPosRankSum=1.203;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:8,9:24:99:1:1.00:99,0
gi|50953765|ref|NC_002755.2| 26566 . CG C 132.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.915;DP=29;FS=13.132;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=-0.471;QD=4.59;RPA=4,3;RU=G;ReadPosRankSum=1.248;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:10,15:29:99:1:1.00:172,0
gi|50953765|ref|NC_002755.2| 30131 . CG C 407.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.935;DP=35;FS=14.606;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=-1.230;QD=11.66;RPA=2,1;RU=G;ReadPosRankSum=2.066;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:9,19:30:99:1:1.00:447,0
gi|50953765|ref|NC_002755.2| 30500 . T C 882 . AC=1;AF=1.00;AN=1;DP=31;Dels=0.03;FS=0.000;HaplotypeScore=4.8259;MLEAC=1;MLEAF=1.00;MQ=59.65;MQ0=0;QD=28.45 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,30:30:99:1:1.00:912,0
gi|50953765|ref|NC_002755.2| 30974 . GC G 132.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.336;DP=32;FS=36.755;MLEAC=1;MLEAF=1.00;MQ=58.06;MQ0=0;MQRankSum=1.736;QD=4.16;RPA=3,2;RU=C;ReadPosRankSum=0.735;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:9,13:32:99:1:1.00:172,0
gi|50953765|ref|NC_002755.2| 31870 . TG T 100.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.986;DP=33;FS=47.544;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=0.363;QD=3.06;RPA=4,3;RU=G;ReadPosRankSum=-0.259;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:11,15:33:99:1:1.00:140,0
gi|50953765|ref|NC_002755.2| 31979 . C G 938 . AC=1;AF=1.00;AN=1;DP=37;Dels=0.00;FS=0.000;HaplotypeScore=5.4934;MLEAC=1;MLEAF=1.00;MQ=59.36;MQ0=0;QD=25.35 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,37:37:99:1:1.00:968,0
gi|50953765|ref|NC_002755.2| 32682 . GC G 993.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.348;DP=42;FS=0.000;MLEAC=1;MLEAF=1.00;MQ=59.55;MQ0=0;MQRankSum=0.835;QD=23.67;RPA=5,4;RU=C;ReadPosRankSum=-0.449;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:1,33:42:99:1:1.00:1033,0
gi|50953765|ref|NC_002755.2| 34472 . CG C 66.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=2.736;DP=26;FS=31.527;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=-0.165;QD=2.58;RPA=3,2;RU=G;ReadPosRankSum=0.231;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:10,12:26:99:1:1.00:106,0
gi|50953765|ref|NC_002755.2| 35847 . CG C 189.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.581;DP=45;FS=41.475;MLEAC=1;MLEAF=1.00;MQ=58.98;MQ0=0;MQRankSum=0.751;QD=4.22;RPA=3,2;RU=G;ReadPosRankSum=1.629;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:17,22:45:99:1:1.00:229,0
gi|50953765|ref|NC_002755.2| 36233 . AC A 215.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.751;DP=10;FS=6.532;MLEAC=1;MLEAF=1.00;MQ=58.31;MQ0=0;MQRankSum=-0.751;QD=21.60;RPA=3,2;RU=C;ReadPosRankSum=-0.751;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:1,8:10:99:1:1.00:255,0
gi|50953765|ref|NC_002755.2| 37870 . TG T 138.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=0.440;DP=49;FS=49.878;MLEAC=1;MLEAF=1.00;MQ=58.61;MQ0=0;MQRankSum=-1.321;QD=2.84;RPA=4,3;RU=G;ReadPosRankSum=-0.029;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:17,21:49:99:1:1.00:178,0
gi|50953765|ref|NC_002755.2| 37985 . CG C 531.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.030;DP=42;FS=68.866;MLEAC=1;MLEAF=1.00;MQ=59.62;MQ0=0;MQRankSum=-0.859;QD=12.67;RPA=4,3;RU=G;ReadPosRankSum=0.481;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:11,25:42:99:1:1.00:571,0
gi|50953765|ref|NC_002755.2| 39010 . CG C 94.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.317;DP=24;FS=8.022;MLEAC=1;MLEAF=1.00;MQ=55.37;MQ0=0;MQRankSum=-1.347;QD=3.96;RPA=2,1;RU=G;ReadPosRankSum=0.238;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:7,13:24:99:1:1.00:134,0
gi|50953765|ref|NC_002755.2| 39106 . TG T 30.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.588;DP=19;FS=30.584;MLEAC=1;MLEAF=1.00;MQ=58.65;MQ0=0;MQRankSum=0.741;QD=1.63;RPA=3,2;RU=G;ReadPosRankSum=2.117;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:7,9:19:70:1:1.00:70,0
gi|50953765|ref|NC_002755.2| 39850 . GC G 340.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=2.329;DP=44;FS=25.006;MLEAC=1;MLEAF=1.00;MQ=59.96;MQ0=0;MQRankSum=-0.687;QD=7.75;RPA=3,2;RU=C;ReadPosRankSum=2.628;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:15,23:44:99:1:1.00:380,0
gi|50953765|ref|NC_002755.2| 40422 . A C 677 . AC=1;AF=1.00;AN=1;DP=31;Dels=0.03;FS=0.000;HaplotypeScore=2.9313;MLEAC=1;MLEAF=1.00;MQ=59.60;MQ0=0;QD=21.84 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,29:30:99:1:1.00:707,0
gi|50953765|ref|NC_002755.2| 40815 . AG A 85.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.473;DP=23;FS=5.021;MLEAC=1;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=0.412;QD=3.74;RPA=3,2;RU=G;ReadPosRankSum=-1.473;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:6,9:23:99:1:1.00:125,0
gi|50953765|ref|NC_002755.2| 40878 . GC G 365.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.520;DP=24;FS=12.155;MLEAC=1;MLEAF=1.00;MQ=57.81;MQ0=0;MQRankSum=-0.236;QD=15.25;RPA=4,3;RU=C;ReadPosRankSum=0.236;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:4,16:24:99:1:1.00:405,0
gi|50953765|ref|NC_002755.2| 40891 . GC G 307.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.837;DP=28;FS=11.823;MLEAC=1;MLEAF=1.00;MQ=58.13;MQ0=0;MQRankSum=0.167;QD=11.00;RPA=4,3;RU=C;ReadPosRankSum=1.236;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:7,16:27:99:1:1.00:347,0
gi|50953765|ref|NC_002755.2| 40929 . GC G 314.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-1.037;DP=37;FS=31.951;MLEAC=1;MLEAF=1.00;MQ=58.57;MQ0=0;MQRankSum=0.699;QD=8.51;RPA=5,4;RU=C;ReadPosRankSum=2.053;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:12,18:37:99:1:1.00:354,0
gi|50953765|ref|NC_002755.2| 40956 . GC G 61.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=-0.954;DP=37;FS=21.093;MLEAC=1;MLEA
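A minimal sketch of reading records like those above in Python. The two records are copied from the listing (positions 235 and 4123). The sketch assumes whitespace-delimited fields as shown on the slide (real VCF files are tab-delimited), one sample column, and a single ALT allele. The function name is illustrative.

```python
LINES = [
    "gi|50953765|ref|NC_002755.2| 235 . CG C 328.97 . AC=1;AF=1.00;AN=1;BaseQRankSum=1.570;DP=52;FS=1.221;MLEAC=1;MLEAF=1.00;MQ=59.61;MQ0=0;MQRankSum=-0.984;QD=6.33;RPA=3,2;RU=G;ReadPosRankSum=0.797;STR GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:15,31:52:99:1:1.00:368,0",
    "gi|50953765|ref|NC_002755.2| 4123 . C T 1459 . AC=1;AF=1.00;AN=1;DP=55;Dels=0.00;FS=0.000;HaplotypeScore=13.3300;MLEAC=1;MLEAF=1.00;MQ=59.28;MQ0=0;QD=26.53 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:0,55:55:99:1:1.00:1489,0",
]


def parse_record(line):
    """Split one whitespace-delimited VCF data line into the fields used here."""
    chrom, pos, _id, ref, alt, qual, _filter, info, fmt, sample = line.split()
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value if value else True  # flags such as STR carry no value
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "pos": int(pos), "ref": ref, "alt": alt, "qual": float(qual),
        "is_snv": len(ref) == 1 and len(alt) == 1,  # single-base REF and ALT
        "info": info_map, "sample": sample_map,
    }


records = [parse_record(line) for line in LINES]
for r in records:
    kind = "SNV" if r["is_snv"] else "indel"
    print(r["pos"], kind, r["qual"], r["info"]["FS"], r["sample"]["AD"])
```

Separating SNVs from indels this way makes it easy to see that most records in the listing are single-base deletions flagged `STR`, several with high `FS` values, while the SNVs have `FS=0.000` and nearly all reads supporting the alternate allele.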
42. SEQUENCING
CONSIDERATIONS
• What depth of coverage do I need?
• 50x-100x to facilitate SNP calling
• Don’t multiplex too much!
• Should I sequence multiple isolates from a patient?
• Useful for chronic/latent infections
• Can I send multiple outbreaks for sequencing?
• LIMS check
• Should I generate one long-read scaffold?
• Can finish genomes this way
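The depth and multiplexing questions above come down to simple arithmetic: mean depth is total sequenced bases divided by genome size. The sketch below uses the ~4.4 Mbp genome size from the earlier slide. The run yield is an assumed figure for illustration only, and the calculation ignores unmapped reads, duplicates, and uneven pooling.

```python
GENOME_SIZE_BP = 4_400_000  # ~4.4 Mbp M. tuberculosis genome, from the slides


def mean_depth(n_reads, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Expected mean depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def max_samples_per_run(run_yield_bp, target_depth, genome_size_bp=GENOME_SIZE_BP):
    """Largest number of equally pooled isolates that still reach target_depth."""
    return int(run_yield_bp // (target_depth * genome_size_bp))


# ASSUMED run yield of 5 Gbp, for illustration only; use your instrument's real figure.
run_yield = 5_000_000_000
for depth in (50, 100):
    print(depth, max_samples_per_run(run_yield, depth))
```

Under that assumed yield, doubling the target depth from 50x to 100x halves the number of isolates that fit on a run, which is the trade-off behind "don't multiplex too much".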
46. MY USUAL PIPELINE
• Read QC with FastQC
• Map against reference with BWA-MEM
• Call SNVs with samtools pileup
• Output a VCF file with SNVs only - no indels
• Custom Python script to filter out SNVs common to all
sequenced isolates and format remainder as a table
• Remove all SNVs within 50bp of another
• High coverage dataset makes SNV calling based on
qual score or other thresholds easy
• Manually inspect each SNV using a BAM viewer tool
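The custom filtering step is not shown in the slides. The following is a rough sketch of what such a script might do, assuming SNV calls have already been loaded into a dictionary per isolate. The function names, the toy calls, and the choice to apply the 50 bp rule after dropping shared SNVs are all assumptions.

```python
def informative_snvs(calls):
    """calls: {isolate: {position: alt_base}}. Drop SNVs shared by every isolate."""
    isolates = list(calls)
    positions = set().union(*(calls[i].keys() for i in isolates))
    keep = []
    for pos in sorted(positions):
        alleles = [calls[i].get(pos) for i in isolates]  # None = reference base
        if len(set(alleles)) > 1:                         # varies among isolates
            keep.append(pos)
    return keep


def drop_clustered(positions, window=50):
    """Remove every SNV that lies within `window` bp of another SNV."""
    positions = sorted(positions)
    kept = []
    for i, pos in enumerate(positions):
        near_prev = i > 0 and pos - positions[i - 1] <= window
        near_next = i + 1 < len(positions) and positions[i + 1] - pos <= window
        if not (near_prev or near_next):
            kept.append(pos)
    return kept


# Toy calls for three isolates (illustrative, not real outbreak data).
calls = {
    "iso1": {4123: "T", 17609: "G", 22351: "A"},
    "iso2": {4123: "T", 22351: "A", 22380: "C"},
    "iso3": {4123: "T", 31979: "G"},
}
kept = drop_clustered(informative_snvs(calls))
print("isolate\t" + "\t".join(str(p) for p in kept))
for name, snvs in calls.items():
    print(name + "\t" + "\t".join(snvs.get(p, ".") for p in kept))
```

In the toy data, position 4123 is dropped because every isolate carries it, and 22351 and 22380 are both dropped because they sit 29 bp apart, leaving 17609 and 31979 for the table.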
53. OTHER CONSIDERATIONS
• Are you seeing the expected number of SNVs?
• Is there over-representation of SNVs in annotated
repetitive genes? These may be false positives.
• You may be sequencing one population or many - do
you see heterogeneity at any positions?
• Indels may also act as markers of transmission but are
harder to reliably call, especially on certain NGS
platforms
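One way to check for over-representation of SNVs in repetitive genes is to compare the share of SNVs falling in annotated repeats with the share of the genome those repeats occupy. This is a sketch only: the intervals below are hypothetical placeholders, and real coordinates would come from the reference annotation. It also assumes the regions do not overlap.

```python
# HYPOTHETICAL intervals (start, end, label), 1-based inclusive. Real coordinates
# would come from the reference genome annotation.
REPEAT_REGIONS = [(3000, 4000, "repeat_gene_1"), (15000, 16000, "repeat_gene_2")]


def flag_repeat_snvs(positions, regions=REPEAT_REGIONS):
    """Map each SNV position to the repeat region containing it, or None."""
    flagged = {}
    for pos in positions:
        flagged[pos] = next(
            (label for start, end, label in regions if start <= pos <= end), None
        )
    return flagged


def repeat_enrichment(positions, regions, genome_size):
    """Fraction of SNVs in repeats divided by the fraction of the genome in repeats."""
    in_repeat = sum(1 for label in flag_repeat_snvs(positions, regions).values() if label)
    observed = in_repeat / len(positions)
    expected = sum(end - start + 1 for start, end, _ in regions) / genome_size
    return observed / expected


snv_positions = [3631, 4123, 15273, 17609]  # toy positions
print(flag_repeat_snvs(snv_positions))
print(repeat_enrichment(snv_positions, REPEAT_REGIONS, 4_400_000) > 1)
```

A ratio well above 1 suggests the calls in those regions deserve manual inspection before being used as transmission markers.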
54. LINKING VARIATION TO TRANSMISSION
LESSON 2
65. Genomic data provide a higher-resolution
view of a cluster, but SNVs alone rarely make
person-to-person transmission obvious
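A small pairwise-distance sketch illustrates why SNVs alone are rarely enough. The calls below are toy data, and the function names are illustrative.

```python
from itertools import combinations


def snv_distance(a, b):
    """Count positions where two isolates differ; a missing position = reference base."""
    return sum(1 for pos in set(a) | set(b) if a.get(pos) != b.get(pos))


def distance_table(calls):
    """Pairwise SNV distances for every pair of isolates."""
    return {
        (x, y): snv_distance(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


# Toy data, not from the slides.
calls = {
    "case1": {17609: "G"},
    "case2": {17609: "G"},
    "case3": {17609: "G", 31979: "G"},
}
print(distance_table(calls))
```

Here case1 and case2 are genetically identical and case3 is one SNV from both, so the distances are compatible with several different transmission orders.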
66. DETERMINING THE ORDER OF TRANSMISSION
• Date of symptom onset
• Date of diagnosis
• Date put on treatment
• Infectiousness
• Hospitalizations
• Duration of infectious
period
• SOCIAL CONTACTS!
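The clinical and social data listed above can narrow down who could have infected whom. The sketch below keeps only candidate sources who were infectious before the recipient's symptom onset and, optionally, who are a known social contact. The record fields, dates, and contact links are hypothetical, and real investigations weigh many more factors.

```python
from datetime import date


def plausible_sources(cases, recipient, require_contact=True):
    """Cases infectious before the recipient's onset, optionally limited to contacts."""
    target = cases[recipient]
    sources = []
    for name, case in cases.items():
        if name == recipient:
            continue
        early_enough = case["infectious_from"] < target["symptom_onset"]
        linked = name in target["contacts"]
        if early_enough and (linked or not require_contact):
            sources.append(name)
    return sorted(sources)


# HYPOTHETICAL case records for illustration only.
cases = {
    "A": {"symptom_onset": date(2008, 1, 10), "infectious_from": date(2008, 1, 10),
          "contacts": {"B", "C"}},
    "B": {"symptom_onset": date(2008, 6, 1), "infectious_from": date(2008, 6, 1),
          "contacts": {"A"}},
    "C": {"symptom_onset": date(2008, 9, 15), "infectious_from": date(2008, 9, 15),
          "contacts": {"A"}},
}
print(plausible_sources(cases, "C"))                         # ['A']
print(plausible_sources(cases, "C", require_contact=False))  # ['A', 'B']
```

Without the contact information, both A and B remain candidate sources for C; adding it leaves only A, which is the point of the last bullet.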
67. WITHIN-HOST DIVERSITY
LESSON 3
68. SHELTER outbreak
[Figure labels: ~90-bed shelter & meal centre, friendship centre, drop-in centre, pub, day labour agency]
79. CONSIDER THE POSSIBILITY
• Infections with latent periods (e.g. TB), chronic infection
(e.g. H. pylori), asymptomatic carriage (e.g. Staph)
• In TB, expect diversity in patients who were
undiagnosed for a long time, immunosuppressed, or
non-compliant with treatment
• Consider the material that was sequenced - would it
capture this diversity? Could you sequence serial
isolates from a patient?
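If the sequenced material does capture a mixed population, one place it may show up is in the per-allele read counts (the `AD` field in a VCF). The sketch below flags positions where both alleles are well supported. The 10%/90% thresholds are assumptions, not values from the slides, and mixed support can also come from mapping or sequencing artefacts.

```python
def alt_fraction(ad_field):
    """Fraction of reads supporting the alternate allele, from a 'ref,alt' AD string."""
    ref_reads, alt_reads = (int(x) for x in ad_field.split(","))
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def is_mixed(ad_field, low=0.1, high=0.9):
    """True when both alleles are well supported (thresholds are ASSUMED)."""
    return low < alt_fraction(ad_field) < high


# "0,55" and "1,38" are AD values of SNVs in the VCF listing; "12,18" is hypothetical.
for ad in ("0,55", "1,38", "12,18"):
    print(ad, round(alt_fraction(ad), 3), is_mixed(ad))
```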
80. PUTTING IT TOGETHER
LESSON 4
87. RECAP #2
• A GOOD ANALYSIS BEGINS BEFORE THE SEQUENCER
• DON’T REINVENT THE BIOINFORMATICS WHEEL
• LOOK AT YOUR DATA. LOOK AT IT SOME MORE. NOW
LOOK AGAIN. NO SERIOUSLY, KEEP LOOKING.
• REMEMBER EVOLUTION AND YOUR ORGANISM’S
BIOLOGY
• WITHIN-HOST DIVERSITY CAN COMPLICATE
RECONSTRUCTIONS - CONSIDER THE POSSIBILITY
• USE ALL THE KNOWN EPIDEMIOLOGY YOU HAVE
88. jennifer.gardy@bccdc.ca
twitter.com/jennifergardy