16S Classifier: a tool for fast and accurate
classification of 16S rRNA sequences
Ashok K. Sharma
Research Scholar
Metagenomics and Systems Biology Laboratory
Indian Institute of Science Education and Research, Bhopal
[Figure: overview of a metagenome (What, How, Who), illustrating species diversity and species richness with example genera Arcobacter, Paludibacter, Shewanella, Pseudomonas and unknown taxa]
• Knowledge of the microbial diversity of soil and other extreme environments is still limited
• Only 1-3% of soil microbes are culturable
• 1 g of soil is estimated to contain 4,000-5,000 different bacterial “genomic units”
• Bacteria and fungi play an important role in biogeochemical cycles, as well as in human health
Methods of studying microbial diversity
Biochemical
• Plate count
• Community-level physiological profiling (CLPP)
• Fatty acid methyl ester (FAME) analysis: fatty acids make up a constant proportion of cell biomass
Molecular
• G+C content
• Nucleic acid re-association and hybridization
• DNA microarray
• DNA cloning and sequencing-based methods
Metagenomic reads vs 16S rRNA for microbial diversity identification
Metagenome → DNA isolation, then either:
• Fragmentation of DNA → metagenomic reads → microbial diversity (tools: Kraken, PhyloPythiaS, Phymm, PhymmBL, MetaBin)
• Amplification of 16S rRNA → 16S rRNA from multiple species → microbial diversity
16S rRNA – a “gold standard” for microbial
molecular identification
• Universal
• Highly conserved
• Long enough (~1500 bp) to provide significant discrimination
between many species
• Structural information can guide alignment and phylogenetic
reconstruction
• Many species are now represented in reference databases
16S rRNA gene
sequencing
Earlier: by sequencing the whole gene
Now: by sequencing short variable regions
Limitations:
• Insufficient resolution and underestimated diversity
16S rRNA gene
16S rRNA: to understand microbial diversity
Community composition shifts over time as revealed
by 16S data
Software and tools available for the analysis of 16S rRNA data
• CloVR-16S
• QIIME – a Python-based workflow package for sequence processing and phylogenetic analysis, using methods that include the UniFrac phylogenetic distance metric, UCLUST, PyNAST and the RDP Bayesian classifier
• Mothur – a C++-based software package for 16S analysis
• Metastats and custom R scripts, used to generate additional statistical and graphical evaluations
• Most recent: 16S Classifier – a random forest-based standalone package designed specifically for short hypervariable regions
Materials and methods
• Greengenes database
• Random forest
• EMBOSS
• RDP Classifier
• BLAST
Input Data for Training
In 16S Classifier, we built separate models for different hypervariable regions (HVRs) of the 16S rRNA gene:
• Took the Greengenes 16S rRNA database
• Extracted individual HVRs, as well as combinations of two or more commonly used HVRs, using commonly used universal primers with the help of in-house Perl scripts and the EMBOSS software suite
• Discarded HVRs for which primer coverage was less than 50% of all sequences
• Removed highly similar sequences by clustering with CD-HIT at a threshold of 1 (100% identity)
Table 1. Summary of the number of HVR sequences which were
used for the training and testing of RF*.
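The HVR extraction and filtering steps above were done with in-house Perl scripts and EMBOSS, which are not shown in the slides. As a rough illustration only, the Python sketch below performs an in-silico primer search that supports IUPAC degenerate bases, keeps the region between a forward primer and the reverse complement of a reverse primer, applies the 50% primer-coverage cut-off, and collapses identical regions. The function names, the choice to exclude the primer sites from the extracted region, and the exact-duplicate collapse (a simplification of CD-HIT clustering at threshold 1) are all assumptions.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate IUPAC codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_region(seq, fwd_primer, rev_primer):
    """Return the sequence between the two primer sites, or None if either is missing.

    The reverse primer is written 5'->3' on the opposite strand, so its
    reverse complement is searched for downstream of the forward primer.
    """
    seq = seq.upper()
    fwd = primer_regex(fwd_primer).search(seq)
    if fwd is None:
        return None
    rev = primer_regex(reverse_complement(rev_primer)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]


def extract_hvr(sequences, fwd_primer, rev_primer, min_coverage=0.5):
    """Extract one HVR from a {name: sequence} dict.

    Returns (regions, coverage); regions is None when primer coverage is
    below min_coverage (the 50% cut-off described above).
    """
    regions = {}
    for name, seq in sequences.items():
        region = extract_region(seq, fwd_primer, rev_primer)
        if region:  # skip missing and empty amplicons
            regions[name] = region
    coverage = len(regions) / len(sequences) if sequences else 0.0
    return (regions if coverage >= min_coverage else None), coverage


def collapse_identical(regions):
    """Keep the first sequence name for each distinct region (exact duplicates only)."""
    first_name = {}
    for name, region in regions.items():
        first_name.setdefault(region, name)
    return {name: region for region, name in first_name.items()}
```

Real universal primer sequences for each HVR would be passed as `fwd_primer` and `rev_primer`; the taxonomy labels would then be carried along by sequence name.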
Parameter optimization
• Labeled each sequence with its taxonomic information down to the lowest known level, excluding species
• Used the V3 region for parameter optimization
• Calculated 2-mer, 3-mer, 4-mer, 5-mer and 6-mer nucleotide frequencies and tried each as feature input
• Tried various mtry values for each k-mer size to find the lowest out-of-bag (OOB) error
• Obtained the best results at k = 4, so 4-mer nucleotide frequencies were used to build the models, with ntree = 1000
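The feature encoding described above can be illustrated with a short Python sketch that turns a sequence into a vector of 4^k k-mer frequencies (256 features for k = 4). Normalizing by the number of valid windows and skipping windows that contain ambiguous bases are assumptions; the slides do not say how the frequencies were normalized. For reference, R's randomForest package defaults to mtry = floor(sqrt(p)) for classification, which would be 16 for 256 features; the slides report that several mtry values were tried.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequency of every possible DNA k-mer in `seq`.

    Features are ordered lexicographically (AAAA, AAAC, ..., TTTT for k = 4),
    giving 4**k values. Windows containing non-ACGT characters are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        # Sequence shorter than k, or only ambiguous windows.
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

One such vector per training sequence forms the feature matrix; in R it would then be passed to `randomForest()` together with the taxonomic labels, `ntree = 1000` and the chosen `mtry`.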
Figure 1. Optimization of parameters using hypervariable region V3
Variable selection
ntree optimization
OOB error for different HVRs
Input data for testing
• The first test dataset was obtained by randomly selecting ~10% of the sequences removed earlier during CD-HIT clustering. Random mutations (1%) were introduced into these sequences to mimic real-life sequencing errors
• The second dataset consisted of real metagenomic sequences available from the NCBI Sequence Read Archive (SRA)
• The performance of 16S Classifier was compared with that of the RDP Classifier in terms of both accuracy and computation time
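A minimal Python sketch of how the first test set might be built: sample ~10% of the sequences removed by CD-HIT and introduce random errors at a 1% rate. Treating "mutations" as base substitutions only (no insertions or deletions), mutating exactly round(1% of length) distinct positions, and the fixed random seed are assumptions made for illustration.

```python
import random


def mutate(seq, rate=0.01, rng=None):
    """Substitute round(rate * len(seq)) distinct, randomly chosen positions (at least one)."""
    rng = rng or random.Random()
    bases = list(seq.upper())
    if not bases:
        return ""
    n = max(1, round(rate * len(bases)))
    for pos in rng.sample(range(len(bases)), n):
        # Replace with a base different from the current one.
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def make_test_set(sequences, fraction=0.10, rate=0.01, seed=0):
    """Randomly pick ~`fraction` of a {name: sequence} dict and mutate each chosen sequence."""
    rng = random.Random(seed)
    names = sorted(sequences)
    if not names:
        return {}
    chosen = rng.sample(names, max(1, round(fraction * len(names))))
    return {name: mutate(sequences[name], rate, rng) for name in chosen}
```

With a 1,500 bp full-length 16S sequence, this sketch would introduce 15 substitutions; a short V3 amplicon of a few hundred bases would receive only a handful.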
Performance of different RF models on different HVRs and the complete 16S rRNA gene
Performance of RF models on the first test dataset
Comparison of 16S Classifier with RDP Classifier on real datasets
Advantages of 16S Classifier
• Extremely fast
• High sensitivity and specificity
• Consistent performance across various HVRs
• Freely available
• Easy to deploy and use
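The slides do not state how accuracy, sensitivity and specificity were computed. A common choice for multi-class taxonomic assignment is overall accuracy plus a one-vs-rest calculation per taxon at a given rank; the Python sketch below assumes those definitions and is only illustrative.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of sequences whose predicted taxon matches the true taxon."""
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        return float("nan")
    return sum(t == p for t, p in pairs) / len(pairs)


def sensitivity_specificity(true_labels, predicted_labels, taxon):
    """One-vs-rest sensitivity and specificity for a single taxon."""
    tp = fn = fp = tn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == taxon:
            if p == taxon:
                tp += 1  # correctly assigned to this taxon
            else:
                fn += 1  # member of this taxon, assigned elsewhere
        elif p == taxon:
            fp += 1      # wrongly assigned to this taxon
        else:
            tn += 1      # correctly kept out of this taxon
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

The labels would be, for example, genus names predicted by 16S Classifier or the RDP Classifier for each test sequence.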
How to use
• Download the zip file for a particular hypervariable region or for the complete 16S rRNA gene, freely available at http://metagenomics.iiserb.ac.in/16Sclassifier/download.html
• Extract the zip file, which contains a model file (*.Rdata), a script file (*.sh) and an executable (16Sclassifier.exe).
• Other dependencies: install R from http://cran.r-project.org/ and then install the randomForest R package.
## Command line usage ##
./16sclassifier.exe <queryfile> <modelname>
The query file must be in FASTA format, and the model name can be one of v2, v3, v4, v5, v6, v7, v8, v23, v34, v35, v45, v56, v67, v78 or complete.
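For batch runs, the command above could be wrapped in a script. The Python sketch below is hypothetical and not part of the 16S Classifier download: it checks the model name against the list on this slide, does a crude FASTA check (the first non-empty line must start with ">"), and calls the executable via subprocess. The default executable path follows the command-line example; the archive is described as containing 16Sclassifier.exe, so `exe` may need adjusting to the actual file name on case-sensitive file systems.

```python
import subprocess

# Model names listed on the "How to use" slide.
MODELS = frozenset({
    "v2", "v3", "v4", "v5", "v6", "v7", "v8",
    "v23", "v34", "v35", "v45", "v56", "v67", "v78", "complete",
})


def build_command(query_file, model, exe="./16sclassifier.exe"):
    """Return the argument list for one run, rejecting unknown model names."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    return [exe, str(query_file), model]


def looks_like_fasta(path):
    """Crude check: the first non-empty line must start with '>'."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False


def classify(query_file, model, exe="./16sclassifier.exe"):
    """Run the classifier on one FASTA file; raises CalledProcessError on failure."""
    if not looks_like_fasta(query_file):
        raise ValueError(f"{query_file} does not look like a FASTA file")
    return subprocess.run(build_command(query_file, model, exe), check=True)
```

For example, `classify("reads.fasta", "v3")` would run the V3 model on a hypothetical file `reads.fasta` from the directory where the archive was extracted.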
Speaker notes

  1. Species diversity consists of: (1) species richness, (2) the total number of species and (3) the distribution of species.
  2. Plate count is fast and cost-effective, but it cannot detect unculturable microbes and is biased towards fast-growing and fungal species. CLPP is fast, highly reproducible and inexpensive, and generates large amounts of data, but it represents only the culturable community and favours fast-growing organisms. FAME analysis needs no culturing, because fatty acids are extracted directly from soil, but it can be affected by external factors.