QIIME Workshop

Get started by opening: http://bit.ly/mbe-qiime2012
and read up at: www.qiime.org
Greg Caporaso
gregcaporaso@gmail.com
www.qiime.org
• Extract DNA and amplify marker gene with barcoded primers
• Pool amplicons and sequence
• Assign reads to samples
• Assign millions of sequences from thousands of samples to OTUs
• Compute UniFrac distances and compare samples
[Figure: example reads (e.g., >GCACCTGAGGACAGGCATGAGGAA…) shown alongside reference sequences RefSeq 1 to RefSeq 10]
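The "assign reads to samples" step can be illustrated with a toy demultiplexer. This is a minimal sketch, assuming every read starts with its sample barcode and that barcodes are matched exactly; the barcodes, sample names and the function name `demultiplex` are made up for illustration, and this is not how split_libraries.py is implemented.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the barcode found at the start of each read.

    Assumes all barcodes have the same length (hypothetical simplification).
    Returns (reads per sample with the barcode removed, unassigned reads).
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # no exact barcode match
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes and reads.
barcodes = {"ACGT": "gut.1", "TGCA": "palm.1"}
reads = ["ACGTGCACCTGAGG", "TGCATCACATGAAC", "ACGTCTACCGGAGG", "GGGGGAACCTTCAC"]
assigned, unassigned = demultiplex(reads, barcodes)
```

Real pipelines usually also tolerate barcode errors and apply quality filtering at this stage; the sketch leaves both out.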
>5000 samples in analysis pipeline
   •   Stream and lake water
   •   Marine water, sediment and reef
   •   Soil (forest, farm, peatland, tundra, …)
   •   Air
   •   Coalbed
   •   Arctic ice core
   •   Insect-associated
   •   Human-associated (gut, mouth, skin)



http://www.earthmicrobiome.org/
>5000 samples analyzed
to date
Alpha diversity by environment type
Where do we look for new diversity?




* As determined by no hit to Greengenes database.
www.QIIME.org workflow overview

Inputs
• Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
• Metadata: mapping file

'Upstream' steps
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 Data: PyroNoise, Denoiser
• Database Submission (In development)
• Pick OTUs and representative sequences
  – Reference based: BLAST, UCLUST, USEARCH
  – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build 'OTU table': i.e., sample by observation matrix
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut

'Downstream' steps
• OTU (or other sample by observation) table
• Phylogenetic Tree: evolutionary relationship between OTUs
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., Weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering.

Legend
• Currently supported for marker-gene data only (i.e., 'upstream' step)
• Currently supported for general sample by observation data (i.e., 'downstream' step)
• Required step or input
• Optional step or input
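Several of the non-phylogenetic diversity measures named in the workflow can be written in a few lines. The following is a minimal sketch, assuming each sample is a list of per-OTU counts in a shared OTU order; it uses the bias-corrected form of Chao1 and the presence/absence form of Jaccard, which may differ in detail from what QIIME computes. The counts are invented.

```python
def observed_species(counts):
    """Alpha diversity: number of OTUs with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    singles = sum(1 for c in counts if c == 1)
    doubles = sum(1 for c in counts if c == 2)
    return observed_species(counts) + singles * (singles - 1) / (2 * (doubles + 1))


def bray_curtis(a, b):
    """Beta diversity: sum of absolute count differences over total counts."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard(a, b):
    """Beta diversity on presence/absence: 1 - |shared OTUs| / |all OTUs seen|."""
    in_a = {i for i, x in enumerate(a) if x > 0}
    in_b = {i for i, x in enumerate(b) if x > 0}
    union = in_a | in_b
    if not union:
        return 0.0
    return 1 - len(in_a & in_b) / len(union)


# Hypothetical counts for five OTUs in two samples.
sample_a = [10, 1, 1, 2, 0]
sample_b = [5, 0, 1, 0, 4]
```

UniFrac and Phylogenetic Diversity additionally need the phylogenetic tree, so they are not covered by this sketch.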
http://analytics.google.com
Running QIIME
• Native installation on OS X or Linux (laptops through 16,416-core compute cluster*)
• Ubuntu Linux Virtual Box
• Amazon Web Services (EC2)

* http://ncar.janus.rc.colorado.edu/
IPython notebook
Moving Pictures of the Human Microbiome
• Two subjects sampled daily, one for six
  months, one for 18 months
• Four body sites: tongue, palm of left
  hand, palm of right hand, and gut (via fecal
  swabs).
Moving Pictures of the Human Microbiome
• Investigate the relative temporal variability of
  body sites.
• Is there a temporal core microbiome?
• Technical points: do we observe the same
  conclusions on 454 and Illumina data?
Moving Pictures of the Human Microbiome: QIIME tutorial
• A small subset of the full data set to facilitate
  short run time: ~0.1% of the full sequence
  collection.
• Sequenced across six Illumina GAIIx
  lanes, with a subset of the samples also
  sequenced on 454.
• The online tutorial contains details on all of
  the steps: go back and read that text.
Key QIIME files

• Mapping file: per-sample metadata, user-defined
• Input sequence file
• OTU table: sample x OTU matrix, central to
  downstream analyses [now in biom format]
• Parameters file: defines analyses, for use
  with the ‘workflow’ scripts (optional)
Mapping file
Mapping file: always run check_id_map.py
[legend marker] = required field
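To make the idea of a mapping file with required fields concrete, here is a minimal sketch of reading one. It assumes a tab-separated file whose header line starts with `#SampleID`; the required column names used here (`BarcodeSequence`, `LinkerPrimerSequence`, `Description`) are an assumption based on the QIIME 1 convention, `BodySite` is an invented user-defined column, and `parse_mapping` is a hypothetical helper. It does not replace check_id_map.py, which remains the authoritative check.

```python
import csv
import io

# Assumed required columns (QIIME 1 convention; treat as an assumption).
REQUIRED = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]


def parse_mapping(text):
    """Return {sample_id: {column: value}} from tab-separated mapping text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, records = rows[0], rows[1:]
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))
    samples = {}
    for fields in records:
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(header, fields))
        sample_id = record["#SampleID"]
        if sample_id in samples:
            raise ValueError("duplicate sample ID: " + sample_id)
        samples[sample_id] = record
    return samples


# Hypothetical two-sample mapping file.
mapping_text = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
    "gut.1\tACGT\tGTGCCAGC\tgut\tsubject 1 fecal swab\n"
    "palm.1\tTGCA\tGTGCCAGC\tleft palm\tsubject 1 left palm\n"
)
mapping = parse_mapping(mapping_text)
```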
Sequences file
>[sampleID_seqID] description

Barcodes have been removed!!
Sequences file: can be user-provided, or generated by split_libraries.py
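Because each header has the form `>sampleID_seqID description`, the sample of origin can be recovered from the sequences file alone. A minimal sketch, assuming the sequence ID is everything after the last underscore of the label; the function name and example records are hypothetical.

```python
from collections import Counter


def seqs_per_sample(fasta_lines):
    """Count sequences per sample from '>sampleID_seqID description' headers."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].split()[0]          # drop the description
            sample_id = label.rsplit("_", 1)[0]  # strip the trailing _seqID
            counts[sample_id] += 1
    return counts


# Hypothetical demultiplexed records (barcodes already removed).
fasta = [
    ">gut.1_0 original read id A",
    "GCACCTGAGGACAGG",
    ">gut.1_1 original read id B",
    "TCACATGAACCTAGG",
    ">palm.1_2 original read id C",
    "CTACCGGAGGACAGG",
]
per_sample = seqs_per_sample(fasta)
```

Counts like these show how uneven the sampling depth is before any rarefaction is applied.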
OTU table (classic format)
sample x OTU matrix
• OTU identifiers
• Sample identifiers
• Optional per OTU taxonomic information
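The three parts of a classic table (OTU identifiers, sample identifiers, optional taxonomy) can be pulled apart with a short parser. This is a minimal sketch, assuming a tab-separated layout with a `#OTU ID` header row and an optional final `Consensus Lineage` column; those header strings are an assumption about the classic format, and the table contents are invented.

```python
def parse_classic_otu_table(text):
    """Return (sample_ids, counts, taxonomy) from classic OTU table text.

    counts maps otu_id -> {sample_id: count}; taxonomy maps otu_id -> lineage.
    """
    sample_ids, has_taxonomy = None, False
    counts, taxonomy = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0] == "#OTU ID":
            has_taxonomy = fields[-1] == "Consensus Lineage"
            sample_ids = fields[1:-1] if has_taxonomy else fields[1:]
            continue
        if line.startswith("#"):
            continue  # other comment lines
        if sample_ids is None:
            raise ValueError("data row found before the '#OTU ID' header")
        otu_id = fields[0]
        values = fields[1:1 + len(sample_ids)]
        counts[otu_id] = {s: int(float(v)) for s, v in zip(sample_ids, values)}
        if has_taxonomy:
            taxonomy[otu_id] = fields[-1]
    return sample_ids, counts, taxonomy


# Hypothetical classic-format table.
otu_text = "\n".join([
    "# OTU table (hypothetical example)",
    "#OTU ID\tgut.1\tpalm.1\tConsensus Lineage",
    "0\t12\t0\tk__Bacteria; p__Firmicutes",
    "1\t3\t7\tk__Bacteria; p__Actinobacteria",
])
sample_ids, otu_counts, otu_taxonomy = parse_classic_otu_table(otu_text)
```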
OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later)
Google: “biom format”
http://biom-format.org
See convert_biom.py for translating between classic and biom OTU tables
sample x observation contingency matrix (cells hold observation counts)
• Marker gene (e.g., 16S) surveys: Samples x OTUs
• Marker gene (e.g., 16S) surveys: Samples x Taxa
• Comparative genomics: Genomes x Ortholog groups
• Metagenomics, metatranscriptomics: Metagenomes x Functions
• Metabolomics: Samples x Metabolites
• ...
The Biological Observation Matrix (BIOM) Format or: How I Learned To Stop Worrying and Love the Ome-ome
JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata
McDonald et al., GigaScience (2012).
http://www.biom-format.org
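A JSON contingency table with optional metadata can be sketched as follows. The field names are modelled on my reading of the BIOM 1.0 layout and should be treated as an assumption; several fields a complete file would carry (for example the generating software and date) are left out, so the output is illustrative rather than a validated .biom file. `to_biom_like` is a hypothetical helper.

```python
import json


def to_biom_like(observation_ids, sample_ids, counts, table_type="OTU table"):
    """Build a BIOM-like dict from a dense observation x sample count matrix."""
    # Sparse representation: one [row, column, value] triple per non-zero cell.
    sparse = [
        [r, c, value]
        for r, row in enumerate(counts)
        for c, value in enumerate(row)
        if value != 0
    ]
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "type": table_type,
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(observation_ids), len(sample_ids)],
        "rows": [{"id": o, "metadata": None} for o in observation_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": sparse,
    }


# Hypothetical two-OTU, three-sample table.
biom_like = to_biom_like(
    ["otu0", "otu1"],
    ["gut.1", "palm.1", "tongue.1"],
    [[5, 0, 2], [0, 3, 0]],
)
biom_json = json.dumps(biom_like, indent=2)
```

Because rows are generic "observations", the same structure would hold taxa, functions or ortholog groups just as well as OTUs.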
Comparative genomic (B) and metagenome
analysis (C) with QIIME
Working with OTU tables
• single_rarefaction.py: even sampling (very important if you
  have different numbers of seqs/sample!)
• filter_otus_from_otu_table.py
• filter_samples_from_otu_table.py
• per_library_stats.py
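Even sampling means drawing the same number of sequences from every sample. A minimal sketch of that idea, assuming subsampling without replacement and that samples with fewer sequences than the chosen depth are dropped (an assumption of this sketch); the `rarefy` helper and the counts are hypothetical and this is not the code behind single_rarefaction.py.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's {otu_id: count} to exactly `depth` sequences.

    Returns None when the sample has fewer than `depth` sequences.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per sequence, then draw without replacement.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Hypothetical table: one deep sample and one shallow sample.
table = {
    "gut.1": {"otu1": 50, "otu2": 30, "otu3": 20},
    "palm.1": {"otu1": 5, "otu3": 2},
}
even = {}
for sample_id, sample_counts in table.items():
    sub = rarefy(sample_counts, depth=10)
    if sub is not None:
        even[sample_id] = sub
```

Here `palm.1` has only 7 sequences, so it is excluded at a depth of 10; choosing the depth is a trade-off between keeping samples and keeping sequences.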
OTU picking: terminology
OTU picking
• De Novo
  – Reads are clustered based on similarity to one
    another.
• Reference-based
  – Closed reference: any reads which don’t hit a
    reference sequence are discarded
  – Open reference: any reads which don’t hit a
    reference sequence are clustered de novo
De novo OTU picking
• Pros
  – All reads are clustered
• Cons
  – Not parallelizable
  – OTUs may be defined by erroneous reads
Closed-reference OTU picking
• Pros
  – Built-in quality filter
  – Easily parallelizable
  – OTUs are defined by high-quality, trusted
    sequences
• Cons
  – Reads that don’t hit reference dataset are
    excluded, so you can never observe new OTUs
Percentage of reads that do not hit the reference collection, by environment type.
Open-reference OTU picking
• Pros
  – All reads are clustered
  – Partially parallelizable
• Cons
  – Only partially parallelizable
  – Mix of high quality sequences defining OTUs
    (i.e., the database sequences) and possible low
    quality sequences defining OTUs (i.e., the
    sequencing reads)
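The three OTU picking strategies can be compared on a toy example. This is a minimal sketch, assuming a greedy centroid approach and a naive position-by-position identity as a stand-in for real sequence alignment; it is not how UCLUST, USEARCH or BLAST work, and every name and sequence below is invented.

```python
def identity(a, b):
    """Fraction of matching positions (toy similarity, no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def best_hit(read, centroids, threshold):
    """Return the ID of the most similar centroid at or above threshold, or None."""
    best_id, best_score = None, -1.0
    for otu_id, seq in centroids.items():
        score = identity(read, seq)
        if score >= threshold and score > best_score:
            best_id, best_score = otu_id, score
    return best_id


def pick_de_novo(reads, threshold):
    """Cluster reads against each other; the first read of a cluster defines it."""
    centroids, otus = {}, {}
    for read in reads:
        hit = best_hit(read, centroids, threshold)
        if hit is None:
            hit = "denovo%d" % len(centroids)
            centroids[hit] = read
            otus[hit] = []
        otus[hit].append(read)
    return otus


def pick_closed_reference(reads, reference, threshold):
    """Assign reads to reference sequences; reads without a hit are set aside."""
    otus, failures = {}, []
    for read in reads:
        hit = best_hit(read, reference, threshold)
        if hit is None:
            failures.append(read)
        else:
            otus.setdefault(hit, []).append(read)
    return otus, failures


def pick_open_reference(reads, reference, threshold):
    """Closed-reference first, then de novo clustering of the failures."""
    otus, failures = pick_closed_reference(reads, reference, threshold)
    otus.update(pick_de_novo(failures, threshold))
    return otus


# Hypothetical reference collection and reads.
reference = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
reads = ["AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGA", "AAAAAAAAAA"]
closed, failures = pick_closed_reference(reads, reference, threshold=0.9)
opened = pick_open_reference(reads, reference, threshold=0.9)
de_novo = pick_de_novo(reads, threshold=0.9)
```

In this toy run the closed-reference mode discards the two G-rich reads, the open-reference mode keeps them as a new cluster, and the de novo mode ends up with a cluster defined by the read `AAAAAAAAAT`, which illustrates how an erroneous read can define an OTU.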
Considerations in analysis
Variation in sampling depth is an important consideration
• Human skin, colored by individual, at 500 sequences/sample
• Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample

Image/analysis credit: Justin Kuczynski

Data reference:
Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R.
Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
How deep is deep enough?
It depends on the question…
  – Differences between community types: not many
    sequences.
  – Rare biosphere: more (but be careful about
    sequencing noise!)
How deep is deep enough?
[Figure: PCoA plots at three sampling depths]
• 100 sequences/sample: PC1 (8.6%), PC2 (8.4%), PC3 (6.2%)
• 10 sequences/sample: PC1 (13%), PC2 (11%), PC3 (8.1%)
• 1 sequence/sample: PC1 (24%), PC2 (17%), PC3 (9.7%)
Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
[Figure 1: panels (A), (B), (C); panel labels 100, 10, 1]
Can we get accurate taxonomic assignment from short reads?
Extra slides
Elizabeth K. Costello, et al. Science 2009.
Bacterial Community Variation in Human Body Habitats Across Space and Time.
This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a
copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to
Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these slides, but please credit me by placing the following attribution
information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.

Caporaso sloan qiime_workshop_slides_18_oct2012

  • 1. QIIME Workshop Get started by opening: http://bit.ly/mbe-qiime2012 and read up at: www.qiime.org Greg Caporaso gregcaporaso@gmail.com
  • 2. www.qiime.org Extract DNA and amplify marker gene with barcoded primers Pool amplicons and sequence RefSeq 1 >GCACCTGAGGACAGGCATGAGGAA… >GCACCTGAGGACAGGGGAGGAGGA… RefSeq 2 >TCACATGAACCTAGGCAGGACGAA… RefSeq 3 RefSeq 4 >CTACCGGAGGACAGGCATGAGGAT… >TCACATGAACCTAGGCAGGAGGAA… RefSeq 5 RefSeq 6 >GCACCTGAGGACACGCAGGACGAC… >CTACCGGAGGACAGGCAGGAGGAA… RefSeq 7 >CTACCGGAGGACACACAGGAGGAA… RefSeq 8 RefSeq 9 >GAACCTTCACATAGGCAGGAGGAT… >TCACATGAACCTAGGGGCAAGGAA… RefSeq 10 >GCACCTGAGGACAGGCAGGAGGAA… Assign millions of Compute UniFrac distances Assign reads to samples sequences from thousands and compare samples of samples to OTUs
  • 3. >5000 samples in analysis pipeline • Stream and lake water • Marine water, sediment and reef • Soil (forest, farm, peatland, tundra, …) • Air • Coalbed • Arctic ice core • Insect-associated • Human-associated (gut, mouth, skin) http://www.earthmicrobiome.org/
  • 5. Alpha diversity by environment type
  • 6. Where do we look for new diversity? * As determined by no hit to Greengenes database.
  • 7. Sequencing output Metadata (454, Illumina, Sanger) fastq, fasta, qual, or sff/trace files mapping file www.QIIME.org Phylogenetic Tree OTU (or other sample by Pre-processing observation) table Evolutionary relationship e.g., remove primer(s), demultiplex, between OTUs quality filter Denoise 454 Data Database Submission α-diversity and rarefaction β-diversity and rarefaction PyroNoise, Denoiser e.g., Phylogenetic e.g., Weighted and (In development) Diversity, Chao1, unweighted UniFrac, Bray- Observed Species Curtis, Jaccard Pick OTUs and representative sequences Reference based De novo Interactive visualizations BLAST, UCLUST, e.g., UCLUST, CD-HIT, USEARCH MOTHUR, USEARCH e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering. Assign taxonomy Align sequences e.g., PyNAST, Legend BLAST, RDP Currently supported for INFERNAL, MUSCLE, Currently supported for Classifier general sample by MAFFT marker-gene data only observation data (i.e., 'upstream' step) (i.e., 'downstream' step) Build 'OTU table' Build phylogenetic tree i.e., sample by observation e.g., FastTree, RAxML, Required step or input Optional step or input matrix ClearCut
  • 9. Running QIIME Native installation on OS X or Linux (laptops through 16,416-core compute cluster*) Ubuntu Linux Virtual Box Amazon Web Services (EC2) * http://ncar.janus.rc.colorado.edu/
  • 11. Moving Pictures of the Human Microbiome • Two subjects sampled daily, one for six months, one for 18 months • Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs).
  • 12. Moving Pictures of the Human Microbiome • Investigate the relative temporal variability of body sites. • Is there a temporal core microbiome? • Technical points: do we observe the same conclusions on 454 and Illumina data?
  • 13. Moving Pictures of the Human Microbiome: QIIME tutorial • A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection. • Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454. • The online tutorial contains details on all of the steps: go back and read that text.
  • 14. Key QIIME files • Mapping file: per sample meta-data, user- defined • Input sequence file • OTU table: sample x OTU matrix, central to downstream analyses [now in biom format] • Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)
  • 16. Mapping file: always run check_id_map.py = required field
  • 20. Sequences file: can be user-provided, or generated by split_libraries.py
  • 21. OTU table (classic format) sample x OTU matrix
  • 22. OTU table (classic format) sample x OTU matrix OTU identifiers
  • 23. OTU table (classic format) sample x OTU matrix Sample identifiers
  • 24. OTU table (classic format) sample x OTU matrix Optional per OTU taxonomic information
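The parts of the classic OTU table named on slides 21 to 24 (OTU identifiers, sample identifiers, the count matrix, optional per-OTU taxonomy) can be pulled apart with a few lines of Python. This is a minimal sketch, not QIIME's own parser: it assumes a tab-separated file whose header row starts with `#OTU ID` and whose optional last column is named `Consensus Lineage`, and the function name `parse_classic_otu_table` and the toy data are hypothetical.

```python
import io


def parse_classic_otu_table(lines):
    """Split a classic-format OTU table into sample IDs, OTU IDs, counts, taxonomy."""
    sample_ids, otu_ids, counts, taxonomy = [], [], [], []
    has_taxonomy = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Assumed header layout: "#OTU ID", sample IDs, optional "Consensus Lineage"
            if fields[0] == "#OTU ID":
                has_taxonomy = fields[-1] == "Consensus Lineage"
                sample_ids = fields[1:-1] if has_taxonomy else fields[1:]
            continue  # other comment lines are skipped
        otu_ids.append(fields[0])
        if has_taxonomy:
            counts.append([float(x) for x in fields[1:-1]])
            taxonomy.append(fields[-1])
        else:
            counts.append([float(x) for x in fields[1:]])
    return sample_ids, otu_ids, counts, taxonomy


# Toy table invented for illustration (not from the tutorial data set)
EXAMPLE = (
    "# toy classic OTU table\n"
    "#OTU ID\tS1\tS2\tConsensus Lineage\n"
    "0\t5\t0\tk__Bacteria; p__Firmicutes\n"
    "1\t2\t7\tk__Bacteria; p__Bacteroidetes\n"
)
sample_ids, otu_ids, counts, taxonomy = parse_classic_otu_table(io.StringIO(EXAMPLE))
```

Each row of `counts` lines up with one entry of `otu_ids`, and each column with one entry of `sample_ids`, which is the sample x OTU layout the slides describe.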
  • 25. OTU tables are now in Biological Observation Matrix (.biom) format (QIIME 1.4.0-dev and later). Google: “biom format”, or see http://biom-format.org. See convert_biom.py for translating between classic and biom OTU tables.
  • 26. sample x observation contingency matrix Samples OTUs Observation counts
  • 27. sample x observation contingency matrix Samples Taxa Observation counts
  • 28. sample x observation contingency matrix Metagenomes Functions Observation counts
  • 29. sample x observation contingency matrix. Examples by study type: Samples x OTUs and Samples x Taxa (marker gene, e.g., 16S, surveys); Genomes x Ortholog groups (comparative genomics); Metagenomes x Functions (metagenomics); Samples x Metabolites (metabolomics); also metatranscriptomics, ...
  • 30. The Biological Observation Matrix (BIOM) Format, or: How I Learned To Stop Worrying and Love the Ome-ome. A JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata. McDonald et al., GigaScience (2012). http://www.biom-format.org
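To make "JSON-based sample x observation table" concrete, here is a rough sketch that packs a small count matrix into a BIOM-style dictionary and serializes it with the standard library. The field names follow my reading of the BIOM 1.0 layout and should be checked against the specification at http://biom-format.org; the helper `to_sparse_biom`, the table id, the `generated_by` string, and the date are placeholders invented for illustration. For real tables, use the biom-format tools or convert_biom.py rather than this code.

```python
import json


def to_sparse_biom(table_id, otu_ids, sample_ids, counts):
    """Build a BIOM-style dict; rows are observations (OTUs), columns are samples."""
    # Sparse representation: one [row, column, value] triple per non-zero count
    data = [
        [r, c, value]
        for r, row in enumerate(counts)
        for c, value in enumerate(row)
        if value != 0
    ]
    return {
        "id": table_id,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "illustrative sketch",  # placeholder value
        "date": "2012-01-01T00:00:00",          # placeholder value
        "rows": [{"id": otu, "metadata": None} for otu in otu_ids],
        "columns": [{"id": sample, "metadata": None} for sample in sample_ids],
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        "data": data,
    }


biom = to_sparse_biom("toy-table", ["0", "1"], ["S1", "S2"], [[5, 0], [2, 7]])
biom_json = json.dumps(biom, indent=2)
```

The optional metadata mentioned on the slide would go in the `metadata` entries of `rows` (for example, per-OTU taxonomy) and `columns` (per-sample information); they are left as `None` here.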
  • 31. Comparative genomic (B) and metagenome analysis (C) with QIIME
  • 32. Working with OTU tables • single_rarefaction.py: even sampling (very important if you have different numbers of seqs/sample!) • filter_otus_from_otu_table.py • filter_samples_from_otu_table.py • per_library_stats.py
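Even sampling means drawing the same number of sequences from every sample before comparing them. The sketch below illustrates the idea with random subsampling without replacement using only the standard library; it is not the implementation in single_rarefaction.py, and the function name `rarefy`, the fixed seed, and the toy counts are assumptions. Samples with fewer sequences than the requested depth are dropped here, which is my understanding of how QIIME handles them as well.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's per-OTU counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads.
    """
    if sum(counts) < depth:
        return None
    # Expand counts into one entry per read, labelled by OTU index
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


# Toy per-sample OTU counts (invented): 100, 500, and 4 reads respectively
samples = {"S1": [50, 30, 20], "S2": [400, 90, 10], "S3": [3, 1, 0]}
rarefied = {sid: rarefy(c, depth=100) for sid, c in samples.items()}
even = {sid: c for sid, c in rarefied.items() if c is not None}
```

After this step every remaining sample has exactly 100 sequences, so differences between samples are no longer confounded by how deeply each one was sequenced.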
  • 34. OTU picking • De Novo – Reads are clustered based on similarity to one another. • Reference-based – Closed reference: any reads which don’t hit a reference sequence are discarded – Open reference: any reads which don’t hit a reference sequence are clustered de novo
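The three strategies differ only in where cluster centroids come from and what happens to reads that fail to hit one. The toy sketch below makes that explicit. It is not how UCLUST, USEARCH, or QIIME's scripts work internally: `identity` is a naive position-by-position comparison standing in for real alignment, the clustering is a simple greedy pass, and all names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a toy stand-in for alignment-based identity."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def best_hit(seq, centroids, threshold):
    """Return the ID of the most similar centroid at or above threshold, else None."""
    best_id, best_score = None, threshold
    for centroid_id, centroid_seq in centroids.items():
        score = identity(seq, centroid_seq)
        if score >= best_score:
            best_id, best_score = centroid_id, score
    return best_id


def pick_otus(reads, reference=None, threshold=0.9, cluster_failures=True):
    """Greedy OTU picking.

    de novo:          reference=None, cluster_failures=True
    closed reference: reference given, cluster_failures=False
    open reference:   reference given, cluster_failures=True
    """
    centroids = dict(reference or {})
    otus, discarded = {}, []
    for read_id, seq in reads.items():
        hit = best_hit(seq, centroids, threshold)
        if hit is None:
            if not cluster_failures:
                discarded.append(read_id)  # closed reference: read is dropped
                continue
            hit = "denovo_" + read_id      # the read seeds a new OTU
            centroids[hit] = seq
        otus.setdefault(hit, []).append(read_id)
    return otus, discarded


# Toy data (invented): r1/r2 resemble ref1, r3/r4 resemble nothing in the reference
reference = {"ref1": "ACGTACGTAC"}
reads = {
    "r1": "ACGTACGTAC",
    "r2": "ACGTACGTAA",
    "r3": "TTTTGGGGCC",
    "r4": "TTTTGGGGCA",
}
de_novo, _ = pick_otus(reads)
closed, closed_discarded = pick_otus(reads, reference, cluster_failures=False)
open_ref, open_discarded = pick_otus(reads, reference)
```

With this toy input, closed-reference picking discards r3 and r4, while open-reference picking keeps them in a new OTU seeded by r3, which mirrors the trade-offs discussed on the following slides.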
  • 35. De novo OTU picking • Pros – All reads are clustered • Cons – Not parallelizable – OTUs may be defined by erroneous reads
  • 36. Closed-reference OTU picking • Pros – Built-in quality filter – Easily parallelizable – OTUs are defined by high-quality, trusted sequences • Cons – Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs
  • 37. Percentage of reads that do not hit the reference collection, by environment type.
  • 38. Open-reference OTU picking • Pros – All reads are clustered – Partially parallelizable • Cons – Only partially parallelizable – Mix of high quality sequences defining OTUs (i.e., the database sequences) and possible low quality sequences defining OTUs (i.e., the sequencing reads)
  • 40. Variation in sampling depth is an important consideration. Human skin, colored by individual, at 500 sequences/sample. Image/analysis credit: Justin Kuczynski. Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 41. Variation in sampling depth is an important consideration Human skin, colored by sampling depth, at either 50 or 500 sequences/sample Image/analysis credit: Justin Kuczynski Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 42. Variation in sampling depth is an important consideration Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample Image/analysis credit: Justin Kuczynski Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 43. How deep is deep enough? It depends on the question… – Differences between community types: not many sequences. – Rare biosphere: more (but be careful about sequencing noise!)
  • 44. How deep is deep enough? [Figure: PCoA plots at 100 sequences/sample, 10 sequences/sample, and 1 sequence/sample. Axis labels: PC1 (24%), PC1 (13%), PC1 (8.6%); PC2 (8.4%), PC2 (11%), PC2 (17%); PC3 (9.7%), PC3 (8.1%), PC3 (6.2%).] Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
  • 45. [Figure 1: panels (A), (B), (C); panel labels 10, 100, 1.]
  • 46. Can we get accurate taxonomic assignment from short reads?
  • 50. Elizabeth K. Costello, et al. Science 2009. Bacterial Community Variation in Human Body Habitats Across Space and Time.
  • 53. This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.