SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
These slides are from Meren’s pirate
presentation at the STAMPS 2017 course.
The purpose of this was to provide a very
broad introduction to the two essential
concepts behind automatic identification of
microbial genome bins in metagenomic
data.
If you have questions, let us know:
http://merenlab.org/people/
Recovering genomes from
metagenomes using short
sequencing reads can be
challenging. But its importance
pushes us to try harder.
Here are the major steps of the
assembly-based, genome-
resolved metagenomics:
Assembly and binning suffers
from many challenges, and we
often miss parts of genomes even
when environments we study are
not very complex.
… but we do much worse when
population abundances are not
even, which usually is the case.
Regardless, there are
tremendous benefits when we
can get population genomes
from metagenomes.
‘Binning’ is the step during
which we organize those
contigs in our assembly results
into population genomes.
But how do we do that when
we know almost nothing about
the origins of our contigs at the
end of the assembly?
?
There are two aspects of data
we commonly exploit to
identify contigs in our
assemblies that likely belong
to the same population
genome in the environment
The first one is the ‘sequence
composition’, which requires
no prior understanding of the
the likely origins of contigs
(and it is fascinating why this
works for multiple reasons)
Fine, but how do we even
compute k-mer frequencies?
The following example does it
for multiple sequences by
assuming k=2
Now this information can be
used to organize sequences
based on their compositional
similarities
Now you know what ‘di-
nucleotide composition space’
is (i.e. k-mer frequencies for
k=2), following papers will
probably make much more
sense
When k=4, we call it the
tetranucleotide frequency
(which you may have heard
many times before as it is the
de facto standard for
characterizing sequence
composition)
So, the somewhat-preserved
sequence signatures in
genomes is the first aspect of
data we use to organize
contigs into meaningful bins
Genomic signatures
But in most cases sequence
signatures are not enough to
resolve things accurately.
The second aspect of data that
improves the resolving power
of binning algorithms when
multiple samples are available
is the ‘differential coverage’ of
contigs across metagenomes.
But what is ‘coverage’?
Coverage is the average
number of short reads
mapping to each nucleotide
position throughout a contig:
Yeah. So if that is coverage, we
could use it the following way:
And it would have worked even
when we don’t know anything
about the contigs, or
distribution patterns of
individual populations they
belong to:
Modern algorithms often use
these two aspects of the data
to organize contigs into
genome bins automatically
Genomic signatures
Differential Coverage
Believe it or not, recovering
population genomes from
metagenomes is not a new
thing…
But luckily, there are many
algorithms to standardize the
way we can do automatic
binning.
That being said, you should
think twice before putting your
absolute trust in any genome
bin you get from automatic
binning tools.
Metagenomic data is complex, and
things will often work less than
optimal.
Here is a blog post you may find
relevant if you are interested in
exploring how to refine metagenomic
bins:
http://merenlab.org/2017/05/11/anvi-refine-by-veronika/

Contenu connexe

Tendances

BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITS
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignmentKubuldinho
 
Bioinformatics and functional genomics
Bioinformatics and functional genomicsBioinformatics and functional genomics
Bioinformatics and functional genomicsAisha Kalsoom
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES nadeem akhter
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysisDr. Naveen Gaurav srivastava
 
Transcriptome analysis
Transcriptome analysisTranscriptome analysis
Transcriptome analysisRamaJumwal2
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsAndrea Telatin
 
Metagenomics and it’s applications
Metagenomics and it’s applicationsMetagenomics and it’s applications
Metagenomics and it’s applicationsSham Sadiq
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSHEETHUMOLKS
 
Synthetic biology: Concepts and Applications
Synthetic biology: Concepts and ApplicationsSynthetic biology: Concepts and Applications
Synthetic biology: Concepts and ApplicationsUSTC, Hefei, PRC
 

Tendances (20)

BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.BITs: Genome browsers and interpretation of gene lists.
BITs: Genome browsers and interpretation of gene lists.
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignment
 
Bioinformatics and functional genomics
Bioinformatics and functional genomicsBioinformatics and functional genomics
Bioinformatics and functional genomics
 
BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES BIOLOGICAL SEQUENCE DATABASES
BIOLOGICAL SEQUENCE DATABASES
 
Tools and database of NCBI
Tools and database of NCBITools and database of NCBI
Tools and database of NCBI
 
Web based servers and softwares for genome analysis
Web based servers and softwares for genome analysisWeb based servers and softwares for genome analysis
Web based servers and softwares for genome analysis
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
Metagenomic
MetagenomicMetagenomic
Metagenomic
 
Transcriptome analysis
Transcriptome analysisTranscriptome analysis
Transcriptome analysis
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR Genomics
 
Metagenomics and it’s applications
Metagenomics and it’s applicationsMetagenomics and it’s applications
Metagenomics and it’s applications
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Synthetic biology: Concepts and Applications
Synthetic biology: Concepts and ApplicationsSynthetic biology: Concepts and Applications
Synthetic biology: Concepts and Applications
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Intro bioinfo
Intro bioinfoIntro bioinfo
Intro bioinfo
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
cDNA Library
cDNA LibrarycDNA Library
cDNA Library
 

Similaire à Intro to metagenomic binning

Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Keith Bradnam
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011c.titus.brown
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Keith Bradnam
 
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...Anthony Parziale
 
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Abdelrahman Hosny
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grcc.titus.brown
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERcscpconf
 
A Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its ApplicationsA Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its ApplicationsKaren Gomez
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxxRowlet
 
An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.Monica Munoz-Torres
 
metagenomicsanditsapplications-161222180924.pdf
metagenomicsanditsapplications-161222180924.pdfmetagenomicsanditsapplications-161222180924.pdf
metagenomicsanditsapplications-161222180924.pdfVisheshMishra20
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talkc.titus.brown
 

Similaire à Intro to metagenomic binning (20)

Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...
 
2013 duke-talk
2013 duke-talk2013 duke-talk
2013 duke-talk
 
U Florida / Gainesville talk, apr 13 2011
U Florida / Gainesville  talk, apr 13 2011U Florida / Gainesville  talk, apr 13 2011
U Florida / Gainesville talk, apr 13 2011
 
2012 oslo-talk
2012 oslo-talk2012 oslo-talk
2012 oslo-talk
 
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...Genome Assembly: the art of trying to make one BIG thing from millions of ver...
Genome Assembly: the art of trying to make one BIG thing from millions of ver...
 
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...
Confirming DNA Replication Origins of Saccharomyces Cerevisiae A Deep Learnin...
 
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
Confirming dna replication origins of saccharomyces cerevisiae a deep learnin...
 
2014 ucl
2014 ucl2014 ucl
2014 ucl
 
2014 marine-microbes-grc
2014 marine-microbes-grc2014 marine-microbes-grc
2014 marine-microbes-grc
 
CROP GENOME SEQUENCING
CROP GENOME SEQUENCINGCROP GENOME SEQUENCING
CROP GENOME SEQUENCING
 
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLERHPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
HPC-MAQ : A PARALLEL SHORT-READ REFERENCE ASSEMBLER
 
A Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its ApplicationsA Review On Genetic Algorithm And Its Applications
A Review On Genetic Algorithm And Its Applications
 
Molecular Biology Software Links
Molecular Biology Software LinksMolecular Biology Software Links
Molecular Biology Software Links
 
2014 naples
2014 naples2014 naples
2014 naples
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
Apolo Taller en BIOS
Apolo Taller en BIOS Apolo Taller en BIOS
Apolo Taller en BIOS
 
2014 sage-talk
2014 sage-talk2014 sage-talk
2014 sage-talk
 
An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.An introduction to Web Apollo for the Biomphalaria glabatra research community.
An introduction to Web Apollo for the Biomphalaria glabatra research community.
 
metagenomicsanditsapplications-161222180924.pdf
metagenomicsanditsapplications-161222180924.pdfmetagenomicsanditsapplications-161222180924.pdf
metagenomicsanditsapplications-161222180924.pdf
 
2012 hpcuserforum talk
2012 hpcuserforum talk2012 hpcuserforum talk
2012 hpcuserforum talk
 

Dernier

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 

Dernier (20)

Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 

Intro to metagenomic binning

  • 1. These slides are from Meren’s pirate presentation at the STAMPS 2017 course. The purpose of this was to provide a very broad introduction to the two essential concepts behind automatic identification of microbial genome bins in metagenomic data. If you have questions, let us know: http://merenlab.org/people/
  • 2. Recovering genomes from metagenomes using short sequencing reads can be challenging. But its importance pushes us to try harder. Here are the major steps of the assembly-based, genome- resolved metagenomics:
  • 3.
  • 4. Assembly and binning suffers from many challenges, and we often miss parts of genomes even when environments we study are not very complex. … but we do much worse when population abundances are not even, which usually is the case.
  • 5.
  • 6. Regardless, there are tremendous benefits when we can get population genomes from metagenomes. ‘Binning’ is the step during which we organize those contigs in our assembly results into population genomes.
  • 7.
  • 8. But how do we do that when we know almost nothing about the origins of our contigs at the end of the assembly?
  • 9. ?
  • 10. There are two aspects of data we commonly exploit to identify contigs in our assemblies that likely belong to the same population genome in the environment
  • 11. The first one is the ‘sequence composition’, which requires no prior understanding of the the likely origins of contigs (and it is fascinating why this works for multiple reasons)
  • 12.
  • 13.
  • 14.
  • 15. Fine, but how do we even compute k-mer frequencies? The following example does it for multiple sequences by assuming k=2
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28. Now this information can be used to organize sequences based on their compositional similarities
  • 29.
  • 30.
  • 31. Now you know what ‘di- nucleotide composition space’ is (i.e. k-mer frequencies for k=2), following papers will probably make much more sense
  • 32.
  • 33.
  • 34. When k=4, we call it the tetranucleotide frequency (which you may have heard many times before as it is the de facto standard for characterizing sequence composition)
  • 35.
  • 36.
  • 37. So, the somewhat-preserved sequence signatures in genomes is the first aspect of data we use to organize contigs into meaningful bins
  • 39. But in most cases sequence signatures are not enough to resolve things accurately.
  • 40. The second aspect of data that improves the resolving power of binning algorithms when multiple samples are available is the ‘differential coverage’ of contigs across metagenomes.
  • 41. But what is ‘coverage’? Coverage is the average number of short reads mapping to each nucleotide position throughout a contig:
  • 42.
  • 43. Yeah. So if that is coverage, we could use it the following way:
  • 44.
  • 45. And it would have worked even when we don’t know anything about the contigs, or distribution patterns of individual populations they belong to:
  • 46.
  • 47. Modern algorithms often use these two aspects of the data to organize contigs into genome bins automatically
  • 49. Believe it or not, recovering population genomes from metagenomes is not a new thing…
  • 50.
  • 51. But luckily, there are many algorithms to standardize the way we can do automatic binning.
  • 52.
  • 53. That being said, you should think twice before putting your absolute trust in any genome bin you get from automatic binning tools.
  • 54. Metagenomic data is complex, and things will often work less than optimal. Here is a blog post you may find relevant if you are interested in exploring how to refine metagenomic bins: http://merenlab.org/2017/05/11/anvi-refine-by-veronika/