Intro to metagenomic binning



Meren's pirate presentation at the STAMPS course on the basic concepts most binning algorithms use to group contigs into genome bins: sequence composition and differential coverage.


These slides are from Meren’s pirate presentation at the STAMPS 2017 course. Its purpose was to provide a very broad introduction to the two essential concepts behind the automatic identification of microbial genome bins in metagenomic data. If you have questions, let us know: http://merenlab.org/people/

Recovering genomes from metagenomes using short sequencing reads can be challenging, but its importance pushes us to try harder. Here are the major steps of assembly-based, genome-resolved metagenomics:

Assembly and binning suffer from many challenges, and we often miss parts of genomes even when the environments we study are not very complex… but we do much worse when population abundances are uneven, which is usually the case.

Regardless, there are tremendous benefits when we can recover population genomes from metagenomes. ‘Binning’ is the step in which we organize the contigs in our assembly results into population genomes.

But how do we do that when we know almost nothing about the origins of our contigs at the end of the assembly?

There are two aspects of the data we commonly exploit to identify contigs in our assemblies that likely belong to the same population genome in the environment.

The first one is ‘sequence composition’, which requires no prior understanding of the likely origins of contigs (and why this works is fascinating for multiple reasons).

Fine, but how do we even compute k-mer frequencies? The following example does it for multiple sequences, assuming k=2:
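The function below is a minimal sketch of that computation. It is written in Python, which the slides do not use. It slides a window of width k along each sequence, counts every overlapping k-mer, and converts the counts to relative frequencies. Several details are illustrative assumptions: the sequences, the function name `kmer_frequencies`, counting the forward strand only, and skipping windows that contain non-ACGT characters such as N.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequencies of all overlapping k-mers in seq, for every possible k-mer."""
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]  # 4^k k-mers, fixed order
    # Slide a window of width k along the sequence, skipping windows with non-ACGT characters
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(alphabet)
    )
    total = sum(counts.values())
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical example sequences
sequences = {
    "seq_1": "ATATATATGC",
    "seq_2": "GCGCGCGCAT",
    "seq_3": "ATATTTATAA",
}

for name, seq in sequences.items():
    freqs = kmer_frequencies(seq, k=2)
    # Show only the dinucleotides that actually occur in this sequence
    print(name, {kmer: round(f, 2) for kmer, f in freqs.items() if f > 0})
```

With k=2 every sequence becomes a vector of 16 numbers, one per dinucleotide, regardless of its length.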

This information can then be used to organize sequences based on their compositional similarities.
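One simple way to turn k-mer frequency vectors into a notion of compositional similarity is to measure the distance between them. The sketch below reports, for each contig, its closest neighbor in dinucleotide space. It assumes Euclidean distance and uses hypothetical contigs. Actual binning tools use a variety of distance measures and clustering methods.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Relative frequencies of overlapping k-mers, as a list in a fixed k-mer order."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(w for w in windows if set(w) <= set(alphabet))
    total = sum(counts.values()) or 1  # avoid division by zero for empty input
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical contigs: two AT-rich, one GC-rich
contigs = {
    "contig_A": "ATATTATAATTAATATAT",
    "contig_B": "TATAATATTATATAATTA",
    "contig_C": "GCGGCCGCGCGGCGCCGC",
}
vectors = {name: kmer_vector(seq) for name, seq in contigs.items()}

# For each contig, report the compositionally closest other contig
for name, vec in vectors.items():
    closest = min(
        (other for other in vectors if other != name),
        key=lambda other: euclidean(vec, vectors[other]),
    )
    print(name, "->", closest)
```

The two AT-rich contigs end up close to each other, while the GC-rich contig sits far from both.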

Now that you know what ‘dinucleotide composition space’ is (i.e., k-mer frequencies for k=2), the following papers will probably make much more sense.

When k=4, we call it the tetranucleotide frequency, which you may have heard of many times before, as it is the de facto standard for characterizing sequence composition.
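Below is a minimal sketch of tetranucleotide frequency. It assumes that each 4-mer is merged with its reverse complement, a convention many tools follow because a contig may represent either strand. Under that convention the 4^4 = 256 possible tetranucleotides collapse to 136 canonical ones: 120 reverse-complement pairs plus 16 palindromes. The function names and the example contig are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of an ACGT string."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


# All 256 tetranucleotides collapse to 136 canonical ones
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequency(seq):
    """Relative frequencies of canonical tetranucleotides (windows with non-ACGT characters skipped)."""
    seq = seq.upper()
    counts = Counter(
        canonical(seq[i:i + 4])
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in CANONICAL_TETRAMERS]


tnf = tetranucleotide_frequency("ATGCGTACGTTAGCNNATCG")  # hypothetical contig
print(len(CANONICAL_TETRAMERS), len(tnf))
```

Longer k-mers capture more of a genome's signature but need longer contigs for the counts to be stable. That trade-off is one reason k=4 is a common compromise.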

So, the somewhat-preserved sequence signatures in genomes are the first aspect of the data we use to organize contigs into meaningful bins.

But in most cases, sequence signatures alone are not enough to resolve things accurately.

The second aspect of the data that improves the resolving power of binning algorithms when multiple samples are available is the ‘differential coverage’ of contigs across metagenomes.

But what is ‘coverage’? Coverage is the average number of short reads mapping to each nucleotide position throughout a contig:
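To make the definition concrete, the sketch below computes per-position depth and mean coverage for a single contig. It assumes read alignments are already available as hypothetical (start, end) intervals with 0-based, half-open coordinates. In practice they would come from mapping the short reads of a metagenome back to the assembled contigs.

```python
def per_position_depth(contig_length, alignments):
    """Number of reads covering each position; alignments are 0-based, half-open (start, end)."""
    depth = [0] * contig_length
    for start, end in alignments:
        # Clip each alignment to the contig boundaries before counting
        for pos in range(max(start, 0), min(end, contig_length)):
            depth[pos] += 1
    return depth


def mean_coverage(contig_length, alignments):
    """Average number of reads covering each nucleotide position of the contig."""
    depth = per_position_depth(contig_length, alignments)
    return sum(depth) / contig_length


# Hypothetical: a 10 bp contig and three 5 bp reads mapped to it
reads = [(0, 5), (3, 8), (5, 10)]
print(per_position_depth(10, reads))
print(mean_coverage(10, reads))
```

Here 15 aligned bases spread over 10 positions give a mean coverage of 1.5.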

Yeah. So if that is coverage, we could use it the following way:

And it would work even when we know nothing about the contigs or the distribution patterns of the individual populations they belong to:
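Below is a minimal sketch of how differential coverage could be compared. Each contig gets a vector of mean coverages, one value per metagenome. Contigs whose coverage rises and falls together across samples likely come from the same population. The coverage values are hypothetical. Pearson correlation on log-transformed coverage is an illustrative choice, not the method of any particular tool.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical mean coverage of each contig in four metagenomes
coverage = {
    "contig_1": [10.0, 50.0, 2.0, 30.0],
    "contig_2": [12.0, 55.0, 3.0, 28.0],
    "contig_3": [40.0, 5.0, 35.0, 1.0],
}

# Log-transform (with a pseudocount of 1) so fold changes between samples drive the comparison
log_cov = {name: [math.log(c + 1) for c in vals] for name, vals in coverage.items()}

names = list(log_cov)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(pearson(log_cov[a], log_cov[b]), 2))
```

contig_1 and contig_2 track each other closely across samples. contig_3 follows the opposite pattern, which suggests it belongs to a different population even though nothing else about these contigs is known.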

Modern algorithms often use these two aspects of the data to organize contigs into genome bins automatically.

[Figure: genomic signatures + differential coverage]
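The sketch below illustrates why combining the two signals helps. It builds one feature vector per contig, made of dinucleotide frequencies followed by log coverage in each sample. It then groups contigs with single-linkage clustering under a distance threshold. Everything here is an assumption for illustration: the contigs, the threshold of 0.5, and the unweighted concatenation of features on different scales. Real binning tools handle normalization and clustering in their own ways. In this toy data, c3 has the same composition as c1 but a very different coverage pattern.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=2):
    """Relative frequencies of overlapping k-mers over ACGT, in a fixed order."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def features(seq, coverages):
    """Composition features followed by log-transformed coverage in each sample."""
    return kmer_vector(seq) + [math.log(c + 1) for c in coverages]


def single_linkage_bins(vectors, threshold):
    """Merge contigs whose Euclidean distance is within threshold (union-find)."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(vectors[a], vectors[b]) <= threshold:
                parent[find(a)] = find(b)

    bins = {}
    for n in names:
        bins.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in bins.values())


# Hypothetical contigs: (sequence, mean coverage in three samples)
contigs = {
    "c1": ("ATATTATAATTAATATAT", [10, 50, 2]),
    "c2": ("TATAATATTATATAATTA", [12, 55, 3]),
    "c3": ("ATATTATAATTAATATAT", [40, 5, 35]),
    "c4": ("GCGGCCGCGCGGCGCCGC", [40, 5, 35]),
}

composition_only = {n: kmer_vector(s) for n, (s, _) in contigs.items()}
combined = {n: features(s, cov) for n, (s, cov) in contigs.items()}

print("composition only:", single_linkage_bins(composition_only, 0.5))
print("composition + coverage:", single_linkage_bins(combined, 0.5))
```

With composition alone, c3 is lumped in with c1 and c2. Adding coverage separates it, while c4 stays apart because of its composition even though its coverage matches c3.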

Believe it or not, recovering population genomes from metagenomes is not a new thing…

But luckily, there are many algorithms that standardize the way we do automatic binning.

That being said, you should think twice before putting absolute trust in any genome bin you get from automatic binning tools.

Metagenomic data is complex, and things will often work less than optimally. Here is a blog post you may find relevant if you are interested in exploring how to refine metagenomic bins: http://merenlab.org/2017/05/11/anvi-refine-by-veronika/
