[13.09.19] 16S workshop introduction
1. A practical introduction to handling 16S data
Mads Albertsen
Internal workshop 2013
CENTER FOR MICROBIAL COMMUNITIES
2. Agenda
• Introduction
• Case story: GAO Reactors
• Generating OTU tables (Hands on)
• Analyzing data in Excel (Hands on)
• Analyzing data in R (Hands on)
CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
4. Who - when, where and why?
http://phil.cdc.gov/phil/details.asp?pid=2226
http://en.wikipedia.org/wiki/File:EBPR_FISH_Floc.jpg
P. Larsen 2012
Accumulibacter Competibacter Bacillus anthracis
5. Taking advantage of evolution
The affinities of all the beings of the same class have sometimes been represented by a great tree... The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species.
C. Darwin, 1872
http://tolweb.org
Nothing in biology makes sense, except in the light of evolution.
T. Dobzhansky, 1973
6. Why do we use the 16S gene?
Ribosomes are universal
rRNA = Structural RNA
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
7. Why do we use the 16S gene?
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
Universal primer: 8F
8. Why do we use the 16S gene?
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
Ashelford et al. AEM. 2005;71:7724-7736
• Advantages:
  • Universal gene (No horizontal gene transfer)
  • Conserved regions
  • Variable regions
  • Great databases and alignments
• Problems:
  • Variable copy number
  • No universal (unbiased) primers
  • (Not directly correlated with activity)
  • (Lack of functional information)
9. Typical workflow
Sampling, Extraction, Sample prep, Sequencing, Bioinformatics
The focus of the workshop is bioinformatics. However, the preceding steps influence how we handle the data.
10. Typical workflow
• Standardisation, standardization, standardizasion..!
• Use biological replicates and evaluate your variation!
• Design a good experiment with realistic expectations of the outcome (most studies fail here!)
AAU activated sludge standard @ midasfieldguide.org
11. Typical workflow
[Figure: extraction parameters. eDNA removal: PMA, 650 W, 10 min. Input (mg): 1, 2, 4, 9, 22. Bead beating intensity (m s-1): 4, 6. Bead beating duration (s): 20, 40, 80, 160, 400. Storage: fresh, 24 h @ 4 °C, 24 h @ 20 °C.]
AAU activated sludge standard @ midasfieldguide.org
12. Typical workflow
[Figure: mean frequency of the most common residue in a 50 bp window (y-axis, 0.6 to 1.0) against position in the 16S gene (x-axis, 0 to 1500 bp), with the variable regions V1 to V9 and the amplicon regions V1.3, V3.4 and V4 marked]
AAU activated sludge standard @ midasfieldguide.org
Ashelford et al. AEM. 2005;71:7724-7736
13. Typical workflow
PCR with modified 16S primers
Forward primer (Illumina adapter, Pad, linker, 27F):
5’-AATGATACGGCGACCACCGAGATCTACAC GTACGTACG GT AGAGTTTGATCCTGGCTCAG-3’
Reverse primer (Illumina adapter, Barcode, Pad, linker, 534R):
5’-CAAGCAGAAGACGGCATACGAGAT TCCCTTGTCTCC ACGTACGTAC CCG ATTACCGCGGCTGCTGG-3’
[Figure: the target region after PCR cycles 1, 2 and 3]
AAU activated sludge standard @ midasfieldguide.org
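The layout of the two modified primers can be checked by assembling them from their parts. The following is a minimal Python sketch: the segment boundaries are read off the spacing on the slide, and the dictionary keys and the function name are illustrative labels only, not part of any AAU tool.

```python
# Primer segments, 5' to 3', split where the slide puts spaces.
# Key names are illustrative labels (an assumption), not official names.
FORWARD = {
    "illumina_adapter": "AATGATACGGCGACCACCGAGATCTACAC",
    "pad": "GTACGTACG",
    "linker": "GT",
    "27F": "AGAGTTTGATCCTGGCTCAG",
}
REVERSE = {
    "illumina_adapter": "CAAGCAGAAGACGGCATACGAGAT",
    "barcode": "TCCCTTGTCTCC",
    "pad": "ACGTACGTAC",
    "linker": "CCG",
    "534R": "ATTACCGCGGCTGCTGG",
}


def assemble(parts):
    """Join the primer segments 5' to 3' in the order they are listed."""
    return "".join(parts.values())


forward_primer = assemble(FORWARD)
reverse_primer = assemble(REVERSE)

for name, primer in (("forward", forward_primer), ("reverse", reverse_primer)):
    print(name, len(primer), "nt", primer)
```

Only the reverse primer carries a barcode, so in this design it is the barcode segment that would change from sample to sample.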
14. Typical workflow
Mardis, 2008 (PMID 18576944)
≈ 500 bp target amplicon
15. Typical workflow
After sequencing:
• Read 1: 300 bp
• Read 2: 300 bp
• Barcode
≈ 500 bp target amplicon
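Two 300 bp reads from opposite ends of a ≈ 500 bp amplicon must share some bases in the middle, and that shared stretch is what allows read 1 and read 2 to be merged later. A small Python sketch of the arithmetic, assuming full-length reads that start at the two ends of the amplicon (the function name is illustrative):

```python
def expected_overlap(read1_len, read2_len, amplicon_len):
    """Number of amplicon bases covered by both reads.

    Each read is capped at the amplicon length, because a read cannot
    cover more of the amplicon than the amplicon contains.
    """
    covered1 = min(read1_len, amplicon_len)
    covered2 = min(read2_len, amplicon_len)
    return max(0, covered1 + covered2 - amplicon_len)


print(expected_overlap(300, 300, 500))
```

With the numbers on the slide this gives 300 + 300 - 500 = 100 bp of overlap. Shorter reads on the same amplicon (for example 2 x 150 bp) would leave no overlap at all.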
16. Typical workflow
How many sequences are needed? It depends on your question!
(although 50,000 raw sequences per sample is usually fine)
AAU raw kit and chemical costs (DKK)
Item                                    Cost      Cost v2
DNA extraction                          105       70 (a)
Library preparation                     40        40
Sequencing (min 100k reads / sample)    190 (b)   70 (c)
Total                                   335       180
(a) Kits discounted
(b) 50 samples per run
(c) 150 samples per run (can run up to 300)
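The totals in the cost table are simple sums, which makes it easy to scale them to a whole project. A minimal Python sketch using only the per-sample figures from the table; the version labels "v1" and "v2" and the function names are illustrative:

```python
# Per-sample costs in DKK, copied from the table.
COSTS = {
    "v1": {"DNA extraction": 105, "Library preparation": 40, "Sequencing": 190},
    "v2": {"DNA extraction": 70, "Library preparation": 40, "Sequencing": 70},
}


def total_per_sample(version):
    """Sum the three cost items for one sample."""
    return sum(COSTS[version].values())


def project_cost(version, n_samples):
    """Raw kit and chemical cost for n_samples, ignoring partly filled runs."""
    return total_per_sample(version) * n_samples


for version in COSTS:
    print(version, total_per_sample(version), "DKK per sample")
```

Note that the sequencing price assumes a full run (50 or 150 samples per run according to the footnotes), so a partly filled run would cost more per sample than this sketch suggests.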
17. Typical workflow
Merge, cluster, then assign taxonomy (compare to database).
OTU table:
Count   Taxonomy
3       Accumulibacter
11      Unknown
3       Competibacter
1       Bacillus anthracis
18. Typical workflow
Merge, cluster, then assign taxonomy (compare to database). The barcode assigns each read to sample A or sample B (9 reads each in this example).
OTU table:
A   B   Taxonomy
2   1   Accumulibacter
3   8   Unknown
3   0   Competibacter
1   0   Bacillus anthracis
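The step from barcoded reads to an OTU table is essentially counting. The sketch below rebuilds the table above in Python from a toy list of (sample, OTU) pairs. It assumes clustering and taxonomy assignment have already happened, so each read simply carries its sample letter and OTU label; the data and function name are made up for illustration.

```python
from collections import Counter

# Toy input: one (sample, OTU) pair per merged read, chosen to match the slide.
reads = (
    [("A", "Accumulibacter")] * 2 + [("B", "Accumulibacter")] * 1
    + [("A", "Unknown")] * 3 + [("B", "Unknown")] * 8
    + [("A", "Competibacter")] * 3
    + [("A", "Bacillus anthracis")] * 1
)


def build_otu_table(reads):
    """Count reads per OTU and sample; absent combinations count as 0."""
    counts = Counter(reads)
    samples = sorted({sample for sample, _ in reads})
    otus = list(dict.fromkeys(otu for _, otu in reads))  # first-seen order
    return {otu: {s: counts[(s, otu)] for s in samples} for otu in otus}


otu_table = build_otu_table(reads)
for otu, row in otu_table.items():
    print(row["A"], row["B"], otu, sep="\t")
```

Each sample contributes 9 reads here, matching the 9 A and 9 B barcodes on the slide.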
19. Typical workflow
Sequence errors, chimeras and weird stuff...
[Figure: the chance of a perfect read as a function of the read length]
[Figure: chimeras]
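Why the chance of a perfect read drops with read length can be illustrated with a simple model. The Python sketch below assumes every base has the same, independent error probability, which is a simplification (real error rates vary along the read), and the 1% error rate is an illustrative value, not a figure from the workshop.

```python
def p_perfect_read(read_length, error_rate):
    """Probability that all bases are correct, assuming independent errors."""
    return (1.0 - error_rate) ** read_length


# Illustrative per-base error rate of 1% (an assumption).
for length in (50, 100, 300, 500):
    print(length, round(p_perfect_read(length, 0.01), 3))
```

Under this model only about 5% of 300 bp reads would be error-free at a 1% per-base error rate, which is why long amplicons produce so many slightly different sequences.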
20. Typical workflow
Merge, cluster, then assign taxonomy (compare to database).
OTU table:
Count   Taxonomy
3       Accumulibacter
11      Unknown
3       Competibacter
Removing unique sequences (those seen only once) makes the subsequent steps 10-100x faster and removes the majority of errors and chimeras.
The effect depends on sequencing depth and sample complexity, so be careful!
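Removing unique sequences amounts to counting identical reads and dropping those seen only once. A minimal Python sketch with made-up reads; the function name and the `min_count` parameter are illustrative and not taken from the AAU workflow script:

```python
from collections import Counter


def drop_singletons(sequences, min_count=2):
    """Keep only reads whose exact sequence occurs at least min_count times."""
    counts = Counter(sequences)
    return [seq for seq in sequences if counts[seq] >= min_count]


# Toy reads: "ACGA" occurs once and is treated as a likely error.
reads = ["ACGT", "ACGT", "ACGA", "TTGC", "TTGC", "TTGC"]
kept = drop_singletons(reads)
print(len(reads), "reads before,", len(kept), "after")
```

At low sequencing depth or in very diverse samples, genuine rare organisms also show up as single reads, so the same filter would remove real signal along with the errors.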
21. AAU workflow
16SAMP-145
16SAMP-146
16SAMP-147
16SAMP-148
16SAMP-149
16SAMP-150
16S.V13.workflow.sh
Find sample IDs on Google drive
OTU table (+ R version)
Plain text file
A   B   Taxonomy
2   1   Accumulibacter
3   8   Unknown
3   0   Competibacter
22. AAU workflow
What 16S.V13.workflow.sh does:
1. Find and unpack your samples
2. Optional subsampling
3. Remove potential phiX contamination (bowtie2)
4. Merge read 1 and read 2 (flash)
5. Remove reads outside length criteria
6. Optional removal of unique reads and subsampling to even depth
7. Format reads for QIIME
8. Cluster reads to OTUs (Uclust, QIIME)
9. Assign taxonomy (RDP classifier, QIIME + database: MiDAS, Greengenes or Silva)
10. Generate OTU table (QIIME)
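Steps 5 and 6 (length filtering and subsampling to even depth) are easy to picture in a few lines of Python. This is only a sketch of the idea, not the code in 16S.V13.workflow.sh: the function names, the length limits and the toy reads are all assumptions, and the real script's length criteria are not given here.

```python
import random


def filter_by_length(reads, min_len, max_len):
    """Keep merged reads whose length lies within [min_len, max_len]."""
    return [r for r in reads if min_len <= len(r) <= max_len]


def subsample_even(samples, depth, seed=0):
    """Randomly pick `depth` reads per sample; drop samples with fewer reads."""
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    return {
        name: rng.sample(reads, depth)
        for name, reads in samples.items()
        if len(reads) >= depth
    }


# Toy data with hypothetical length limits of 4 to 6 bases.
samples = {
    "A": ["ACGT", "ACGTA", "AC", "ACGTAC", "ACGTACGTAC"],
    "B": ["TTGCA", "TTGC", "T"],
}
filtered = {name: filter_by_length(r, 4, 6) for name, r in samples.items()}
even = subsample_even(filtered, depth=2)
print({name: len(reads) for name, reads in even.items()})
```

Subsampling to the same depth makes samples comparable, at the price of discarding reads from the deeper samples and whole samples that fall below the chosen depth.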