QIIME Workshop

   Get started by opening:
http://bit.ly/mbe-qiime2012
       and read up at:
       www.qiime.org
        Greg Caporaso
   gregcaporaso@gmail.com
www.qiime.org
[Figure: overview of a marker-gene survey, showing example reads and reference sequences RefSeq 1 to RefSeq 10]
• Extract DNA and amplify marker gene with barcoded primers
• Pool amplicons and sequence
• Assign reads to samples
• Assign millions of sequences from thousands of samples to OTUs
• Compute UniFrac distances and compare samples
>5000 samples in analysis pipeline
   •   Stream and lake water
   •   Marine water, sediment and reef
   •   Soil (forest, farm, peatland, tundra, …)
   •   Air
   •   Coalbed
   •   Arctic ice core
   •   Insect-associated
   •   Human-associated (gut, mouth, skin)



http://www.earthmicrobiome.org/
>5000 samples analyzed
to date
Alpha diversity by environment type
Where do we look for new diversity?




* As determined by no hit to Greengenes database.
[Figure: QIIME workflow diagram, www.QIIME.org]
• Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
• Metadata: mapping file
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 data: PyroNoise, Denoiser
• Pick OTUs and representative sequences
  – Reference based: BLAST, UCLUST, USEARCH
  – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build 'OTU table': i.e., sample by observation matrix
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
• Database submission (in development)
• OTU (or other sample by observation) table
• Phylogenetic tree: evolutionary relationship between OTUs
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering

Legend
• Currently supported for marker-gene data only (i.e., 'upstream' step)
• Currently supported for general sample by observation data (i.e., 'downstream' step)
• Required step or input
• Optional step or input
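
Some of the non-phylogenetic diversity measures named in the workflow can be written in a few lines. The sketch below is illustrative only and is not QIIME's code: the sample counts are made up, Chao1 uses the bias-corrected form, and Jaccard is computed on presence/absence.

```python
def observed_species(counts):
    """Number of OTUs with at least one sequence in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_species(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def jaccard(u, v):
    """Presence/absence Jaccard distance: 1 - |shared OTUs| / |all observed OTUs|."""
    present_u = {i for i, c in enumerate(u) if c > 0}
    present_v = {i for i, c in enumerate(v) if c > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)


# Hypothetical OTU counts for two samples (same OTU order in both).
sample_a = [10, 1, 1, 2, 0]
sample_b = [5, 0, 3, 2, 4]
```

Observed species and Chao1 describe a single sample (α-diversity), while Bray-Curtis and Jaccard compare two samples (β-diversity). UniFrac and Phylogenetic Diversity also need the phylogenetic tree and are not shown.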
http://analytics.google.com
Running QIIME
• Native installation on OS X or Linux (laptops through 16,416-core compute cluster*)
• Ubuntu Linux VirtualBox
• Amazon Web Services (EC2)

* http://ncar.janus.rc.colorado.edu/
IPython notebook
Moving Pictures of the Human Microbiome
• Two subjects sampled daily, one for six
  months, one for 18 months
• Four body sites: tongue, palm of left
  hand, palm of right hand, and gut (via fecal
  swabs).
Moving Pictures of the Human Microbiome
• Investigate the relative temporal variability of
  body sites.
• Is there a temporal core microbiome?
• Technical points: do we observe the same
  conclusions on 454 and Illumina data?
Moving Pictures of the Human Microbiome: QIIME tutorial
• A small subset of the full data set to facilitate
  short run time: ~0.1% of the full sequence
  collection.
• Sequenced across six Illumina GAIIx
  lanes, with a subset of the samples also
  sequenced on 454.
• The online tutorial contains details on all of
  the steps: go back and read that text.
Key QIIME files

• Mapping file: per-sample metadata, user-defined
• Input sequence file
• OTU table: sample x OTU matrix, central to downstream analyses [now in biom format]
• Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)
Mapping file
Mapping file: always run check_id_map.py

[Legend: marked fields are required]
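
The kind of check that check_id_map.py automates can be sketched as follows. This is a simplified stand-in, not the script itself. It assumes the QIIME 1 mapping file convention of a tab-separated header that starts with #SampleID, BarcodeSequence, LinkerPrimerSequence and ends with Description. The `check_mapping_header` helper and the example rows are hypothetical.

```python
import csv
import io

# Assumed QIIME 1 convention for the leading header columns.
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]


def check_mapping_header(text):
    """Return a list of problems found in the header line of a tab-separated mapping file."""
    header = next(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    if header[:3] != REQUIRED_LEADING:
        problems.append("first three columns must be " + ", ".join(REQUIRED_LEADING))
    if header[-1] != "Description":
        problems.append("last column must be Description")
    if len(set(header)) != len(header):
        problems.append("duplicate column names")
    return problems


# Hypothetical mapping files: one well-formed, one with two header problems.
good = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
    "S1\tACGT\tGTGCCAGC\tgut\tsubject 1 gut\n"
)
bad = "SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\n"
```

The real script checks much more (for example, barcode and sample ID validity), so treat this only as a picture of what "required field" means.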
Sequences file
>[sampleID_seqID] description

Barcodes have been removed!!
Sequences file: can be user-provided, or
    generated by split_libraries.py
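
Because each header has the form `>[sampleID_seqID] description`, the sample can be recovered from the sequence label alone. A minimal sketch, with made-up sample names and descriptions, and assuming the sequence ID is the part after the last underscore:

```python
from collections import Counter


def sequences_per_sample(fasta_lines):
    """Count sequences per sample from '>sampleID_seqID description' header lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            seq_label = line[1:].split()[0]          # 'sampleID_seqID'
            sample_id = seq_label.rsplit("_", 1)[0]  # split on the last underscore
            counts[sample_id] += 1
    return counts


# Hypothetical demultiplexed sequences file.
fasta = [
    ">gut.day1_0 original_read_17",
    "GCACCTGAGGACAGGCATGAGGAA",
    ">gut.day1_1 original_read_42",
    "TCACATGAACCTAGGCAGGACGAA",
    ">tongue.day1_2 original_read_99",
    "CTACCGGAGGACAGGCATGAGGAT",
]
counts = sequences_per_sample(fasta)
```

Splitting on the last underscore keeps any underscore inside the sample ID out of the sequence ID.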
OTU table (classic format)
sample x OTU matrix
• OTU identifiers
• Sample identifiers
• Optional per-OTU taxonomic information
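
The three parts of a classic OTU table (OTU identifiers, sample identifiers, optional taxonomy) can be pulled apart with a short parser. This sketch assumes the usual classic layout: a tab-separated file whose header row starts with "#OTU ID" and whose optional last column is "Consensus Lineage". The table contents and the `parse_classic_otu_table` helper are hypothetical.

```python
def parse_classic_otu_table(text):
    """Parse a tab-separated classic OTU table into sample IDs, OTU IDs, counts, and taxonomy."""
    sample_ids, otu_ids, counts, taxonomy = [], [], [], []
    has_taxonomy = False
    for line in text.splitlines():
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            # Header row: sample identifiers, plus an optional taxonomy column.
            has_taxonomy = fields[-1] == "Consensus Lineage"
            sample_ids = fields[1:-1] if has_taxonomy else fields[1:]
        elif line and not line.startswith("#"):
            # Data row: OTU identifier, one count per sample, optional taxonomy.
            otu_ids.append(fields[0])
            values = fields[1:-1] if has_taxonomy else fields[1:]
            counts.append([int(float(v)) for v in values])
            taxonomy.append(fields[-1] if has_taxonomy else None)
    return sample_ids, otu_ids, counts, taxonomy


# Hypothetical two-OTU, two-sample table.
table_text = (
    "# QIIME OTU table (hypothetical example)\n"
    "#OTU ID\tgut.day1\ttongue.day1\tConsensus Lineage\n"
    "OTU_0\t12\t0\tk__Bacteria; p__Firmicutes\n"
    "OTU_1\t3\t7\tk__Bacteria; p__Bacteroidetes\n"
)
samples, otus, counts, taxa = parse_classic_otu_table(table_text)
```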
OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later)
Google: “biom format”
http://biom-format.org
See convert_biom.py for translating between classic and biom OTU tables
sample x observation contingency matrix
Each cell holds an observation count. Examples by data type:
• OTUs x samples: marker gene (e.g., 16S) surveys
• Taxa x samples: marker gene (e.g., 16S) surveys
• Ortholog groups x genomes: comparative genomics
• Functions x metagenomes: metagenomics, metatranscriptomics
• Metabolites x samples: metabolomics
• ...
The Biological Observation Matrix (BIOM) Format
or: How I Learned To Stop Worrying and Love the Ome-ome

JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata

McDonald et al., GigaScience (2012).
http://www.biom-format.org
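
To make "JSON-based" concrete, here is a small hand-written table laid out with the BIOM 1.0 field names as I understand them (rows are observations, columns are samples, and sparse data is stored as [row, column, value] triples). The IDs, metadata, and date are invented for illustration, and the `to_dense` helper is hypothetical rather than part of the biom-format package.

```python
import json

# Hand-written example table; all values are hypothetical.
biom_table = {
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "hand-written example",
    "date": "2012-10-18T00:00:00",
    "rows": [
        {"id": "OTU_0", "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "OTU_1", "metadata": None},
    ],
    "columns": [
        {"id": "gut.day1", "metadata": {"BodySite": "gut"}},
        {"id": "tongue.day1", "metadata": {"BodySite": "tongue"}},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    "data": [[0, 0, 12], [1, 0, 3], [1, 1, 7]],
}


def to_dense(table):
    """Expand sparse [row, column, value] triples into a dense list of rows."""
    n_rows, n_cols = table["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in table["data"]:
        dense[row][col] = value
    return dense


# Serialize to JSON and back, as a file on disk would be.
round_tripped = json.loads(json.dumps(biom_table))
dense = to_dense(round_tripped)
```

The optional metadata sits next to each row and column ID, which is what lets one file carry both the counts and, for example, per-OTU taxonomy.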
Comparative genomic (B) and metagenome
analysis (C) with QIIME
Working with OTU tables
• single_rarefaction.py: even sampling (very important if you
  have different numbers of seqs/sample!)
• filter_otus_from_otu_table.py
• filter_samples_from_otu_table.py
• per_library_stats.py
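
Even sampling means drawing the same number of sequences from every sample. The sketch below shows the idea with subsampling without replacement. It is not the single_rarefaction.py implementation: the counts, the `rarefy` helper, and the choice to drop samples below the requested depth are assumptions of this example.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` sequences without replacement from one sample's OTU counts.

    Returns None when the sample has fewer than `depth` sequences.
    """
    if sum(counts) < depth:
        return None
    # One pool entry per sequence, labeled with the index of its OTU.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    rng = random.Random(seed)
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


# Hypothetical per-sample OTU counts with very different sequencing depths.
otu_counts = {
    "gut.day1": [120, 30, 0, 50],
    "tongue.day1": [10, 5, 5, 0],
    "palm.day1": [3, 1, 0, 0],
}
even = {sample: rarefy(counts, depth=20) for sample, counts in otu_counts.items()}
even = {sample: counts for sample, counts in even.items() if counts is not None}
```

Every remaining sample now has exactly 20 sequences, so differences between samples are no longer driven by how deeply each one was sequenced.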
OTU picking: terminology
OTU picking
• De Novo
  – Reads are clustered based on similarity to one
    another.
• Reference-based
  – Closed reference: any reads which don’t hit a
    reference sequence are discarded
  – Open reference: any reads which don’t hit a
    reference sequence are clustered de novo
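
The three strategies differ only in where OTU centroids come from and in what happens to reads that match nothing. The toy sketch below makes that explicit. It uses greedy clustering with a naive position-by-position identity on equal-length sequences, which is a stand-in for tools such as UCLUST and not how they work internally. All sequences and the `pick_otus` helper are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus(reads, references=(), threshold=0.9, cluster_failures=True):
    """Greedy OTU picking.

    No references                       -> de novo
    References, cluster_failures=False  -> closed reference (misses are discarded)
    References, cluster_failures=True   -> open reference (misses are clustered de novo)
    """
    centroids = list(references)
    otus, discarded = {}, []
    for read in reads:
        for index, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus.setdefault(index, []).append(read)
                break
        else:
            if cluster_failures:
                centroids.append(read)  # the read seeds a new OTU
                otus[len(centroids) - 1] = [read]
            else:
                discarded.append(read)
    return otus, discarded


# Hypothetical reference sequences and reads.
refs = ["AAAAAAAAAA", "CCCCCCCCCC"]
reads = ["AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGT"]

de_novo, _ = pick_otus(reads)
closed, dropped = pick_otus(reads, refs, cluster_failures=False)
open_ref, _ = pick_otus(reads, refs, cluster_failures=True)
```

With these inputs the two G-rich reads are discarded under closed reference but form a new OTU under open reference, and de novo picking builds all three OTUs from the reads themselves.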
De novo OTU picking
• Pros
  – All reads are clustered
• Cons
  – Not parallelizable
  – OTUs may be defined by erroneous reads
Closed-reference OTU picking
• Pros
  – Built-in quality filter
  – Easily parallelizable
  – OTUs are defined by high-quality, trusted
    sequences
• Cons
  – Reads that don’t hit reference dataset are
    excluded, so you can never observe new OTUs
Percentage of reads that do not hit the reference collection, by environment type.
Open-reference OTU picking
• Pros
  – All reads are clustered
  – Partially parallelizable
• Cons
  – Only partially parallelizable
  – Mix of high quality sequences defining OTUs
    (i.e., the database sequences) and possible low
    quality sequences defining OTUs (i.e., the
    sequencing reads)
Considerations in analysis
Variation in sampling depth is an important consideration

[Figure: Human skin, colored by individual, at 500 sequences/sample]
[Figure: Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample]

Image/analysis credit: Justin Kuczynski

Data reference:
Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R.
Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
How deep is deep enough?
It depends on the question…
  – Differences between community types: not many
    sequences.
  – Rare biosphere: more (but be careful about
    sequencing noise!)
How deep is deep enough?

[Figure: PCoA plots at 100, 10, and 1 sequence(s)/sample. Axis labels as extracted: PC1 (8.6%, 13%, 24%), PC2 (8.4%, 11%, 17%), PC3 (6.2%, 8.1%, 9.7%).]

Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
[Figure 1: panels (A), (B), and (C), with labels 100, 10, and 1]
Can we get accurate taxonomic assignment from short reads?
Extra slides
Elizabeth K. Costello, et al. Science 2009.
Bacterial Community Variation in Human Body Habitats Across Space and Time.
This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a
copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to
Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these slides, but please credit me by placing the following attribution
information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.

Caporaso sloan qiime_workshop_slides_18_oct2012

  • 1. QIIME Workshop Get started by opening: http://bit.ly/mbe-qiime2012 and read up at: www.qiime.org Greg Caporaso gregcaporaso@gmail.com
  • 2. www.qiime.org Extract DNA and amplify marker gene with barcoded primers Pool amplicons and sequence RefSeq 1 >GCACCTGAGGACAGGCATGAGGAA… >GCACCTGAGGACAGGGGAGGAGGA… RefSeq 2 >TCACATGAACCTAGGCAGGACGAA… RefSeq 3 RefSeq 4 >CTACCGGAGGACAGGCATGAGGAT… >TCACATGAACCTAGGCAGGAGGAA… RefSeq 5 RefSeq 6 >GCACCTGAGGACACGCAGGACGAC… >CTACCGGAGGACAGGCAGGAGGAA… RefSeq 7 >CTACCGGAGGACACACAGGAGGAA… RefSeq 8 RefSeq 9 >GAACCTTCACATAGGCAGGAGGAT… >TCACATGAACCTAGGGGCAAGGAA… RefSeq 10 >GCACCTGAGGACAGGCAGGAGGAA… Assign millions of Compute UniFrac distances Assign reads to samples sequences from thousands and compare samples of samples to OTUs
  • 3. >5000 samples in analysis pipeline • Stream and lake water • Marine water, sediment and reef • Soil (forest, farm, peatland, tundra, …) • Air • Coalbed • Arctic ice core • Insect-associated • Human-associated (gut, mouth, skin) http://www.earthmicrobiome.org/
  • 5. Alpha diversity by environment type
  • 6. Where do we look for new diversity? * As determined by no hit to Greengenes database.
  • 7. Sequencing output Metadata (454, Illumina, Sanger) fastq, fasta, qual, or sff/trace files mapping file www.QIIME.org Phylogenetic Tree OTU (or other sample by Pre-processing observation) table Evolutionary relationship e.g., remove primer(s), demultiplex, between OTUs quality filter Denoise 454 Data Database Submission α-diversity and rarefaction β-diversity and rarefaction PyroNoise, Denoiser e.g., Phylogenetic e.g., Weighted and (In development) Diversity, Chao1, unweighted UniFrac, Bray- Observed Species Curtis, Jaccard Pick OTUs and representative sequences Reference based De novo Interactive visualizations BLAST, UCLUST, e.g., UCLUST, CD-HIT, USEARCH MOTHUR, USEARCH e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering. Assign taxonomy Align sequences e.g., PyNAST, Legend BLAST, RDP Currently supported for INFERNAL, MUSCLE, Currently supported for Classifier general sample by MAFFT marker-gene data only observation data (i.e., 'upstream' step) (i.e., 'downstream' step) Build 'OTU table' Build phylogenetic tree i.e., sample by observation e.g., FastTree, RAxML, Required step or input Optional step or input matrix ClearCut
  • 9. Running QIIME Native installation on OS X or Linux (laptops through 16,416-core compute cluster*) Ubuntu Linux Virtual Box Amazon Web Services (EC2) * http://ncar.janus.rc.colorado.edu/
  • 11. Moving Pictures of the Human Microbiome • Two subjects sampled daily, one for six months, one for 18 months • Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs).
  • 12. Moving Pictures of the Human Microbiome • Investigate the relative temporal variability of body sites. • Is there a temporal core microbiome? • Technical points: do we observe the same conclusions on 454 and Illumina data?
  • 13. Moving Pictures of the Human Microbiome: QIIME tutorial • A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection. • Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454. • The online tutorial contains details on all of the steps: go back and read that text.
  • 14. Key QIIME files • Mapping file: per sample meta-data, user- defined • Input sequence file • OTU table: sample x OTU matrix, central to downstream analyses [now in biom format] • Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)
  • 16. Mapping file: always run check_id_map.py = required field
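The kind of check that check_id_map.py performs can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: it assumes the usual QIIME mapping-file layout (tab-separated, with #SampleID, BarcodeSequence, and LinkerPrimerSequence as the first three columns and Description as the last), and the function name `check_mapping_header` is hypothetical.

```python
# Assumed required columns of a QIIME mapping file (see lead-in).
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping_header(text):
    """Return a list of problems found in a tab-separated mapping file."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["mapping file is empty"]

    header = lines[0].split("\t")
    if header[:3] != REQUIRED_LEADING:
        problems.append("first three columns must be " + ", ".join(REQUIRED_LEADING))
    if header[-1] != REQUIRED_LAST:
        problems.append("last column must be " + REQUIRED_LAST)

    seen = set()
    for ln in lines[1:]:
        if ln.startswith("#"):
            continue  # skip comment lines after the header
        fields = ln.split("\t")
        if len(fields) != len(header):
            problems.append("row %r has %d fields, expected %d"
                            % (fields[0], len(fields), len(header)))
        if fields[0] in seen:
            problems.append("duplicate sample ID: " + fields[0])
        seen.add(fields[0])
    return problems
```

An empty list means the sketch found nothing wrong; the real script checks considerably more (for example, barcode and primer characters), so it should still be run on every mapping file.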
  • 20. Sequences file: can be user-provided, or generated by split_libraries.py
  • 21. OTU table (classic format) sample x OTU matrix
  • 22. OTU table (classic format) sample x OTU matrix OTU identifiers
  • 23. OTU table (classic format) sample x OTU matrix Sample identifiers
  • 24. OTU table (classic format) sample x OTU matrix Optional per OTU taxonomic information
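The classic layout described on the slides above (OTU identifiers down the side, sample identifiers across the top, optional per-OTU taxonomy at the end) can be read with a few lines of Python. This is a sketch under assumptions: it expects a tab-separated file whose header line starts with "#OTU ID" and whose optional last column is named "Consensus Lineage"; the function name `parse_classic_otu_table` is hypothetical.

```python
def parse_classic_otu_table(text):
    """Parse a classic tab-separated OTU table.

    Returns (sample_ids, otu_ids, counts, lineages), where counts[i][j] is
    the count of OTU i in sample j and lineages[i] is None when the table
    has no taxonomy column.
    """
    sample_ids, otu_ids, counts, lineages = [], [], [], []
    has_lineage = False
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0] == "#OTU ID":
            # Header row: sample identifiers, plus optional taxonomy column.
            has_lineage = fields[-1] == "Consensus Lineage"
            sample_ids = fields[1:-1] if has_lineage else fields[1:]
            continue
        if line.startswith("#"):
            continue  # other comment lines
        otu_ids.append(fields[0])
        values = fields[1:-1] if has_lineage else fields[1:]
        counts.append([int(float(v)) for v in values])
        lineages.append(fields[-1] if has_lineage else None)
    return sample_ids, otu_ids, counts, lineages
```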
  • 25. OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later). Google: “biom format”, or see http://biom-format.org. See convert_biom.py for translating between classic and biom OTU tables.
  • 26. sample x observation contingency matrix Samples OTUs Observation counts
  • 27. sample x observation contingency matrix Samples Taxa Observation counts
  • 28. sample x observation contingency matrix Metagenomes Functions Observation counts
  • 29. sample x observation contingency matrix, by study type • Marker gene (e.g., 16S) surveys: samples x OTUs • Marker gene (e.g., 16S) surveys: samples x taxa • Comparative genomics: genomes x ortholog groups • Metagenomics: metagenomes x functions • Metabolomics: samples x metabolites • Metatranscriptomics • ...
  • 30. The Biological Observation Matrix (BIOM) Format or: How I Learned To Stop Worrying and Love the Ome-ome • JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata. McDonald et al., GigaScience (2012). http://www.biom-format.org
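To make "JSON-based sample x observation table" concrete, here is a minimal sketch that writes a small count matrix as sparse BIOM-style JSON using only the Python standard library. The field names follow my reading of the BIOM 1.0 specification and should be checked against http://biom-format.org before relying on them; the function name `to_biom_json` and the "generated_by" value are made up for illustration. In practice, convert_biom.py is the supported route.

```python
import json
from datetime import datetime


def to_biom_json(sample_ids, otu_ids, counts):
    """Build a sparse BIOM 1.0-style JSON string from a dense count matrix.

    counts[i][j] is the count of observation (OTU) i in sample j.
    """
    # Sparse representation: one [row, column, value] triple per non-zero cell.
    data = [[i, j, value]
            for i, row in enumerate(counts)
            for j, value in enumerate(row)
            if value != 0]
    table = {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "hand-written sketch",  # placeholder value
        "date": datetime.now().isoformat(),
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        # Rows are observations, columns are samples; metadata is optional.
        "rows": [{"id": o, "metadata": None} for o in otu_ids],
        "columns": [{"id": s, "metadata": None} for s in sample_ids],
        "data": data,
    }
    return json.dumps(table)
```

Only the "type" field and the row and column metadata would change for the other table types (taxa, functions, ortholog groups, metabolites); the matrix itself is stored the same way.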
  • 31. Comparative genomic (B) and metagenome analysis (C) with QIIME
  • 32. Working with OTU tables • single_rarefaction.py: even sampling (very important if you have different numbers of seqs/sample!) • filter_otus_from_otu_table.py • filter_samples_from_otu_table.py • per_library_stats.py
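Even sampling means drawing the same number of sequences from every sample, without replacement. The idea can be sketched as below; this illustrates the technique and is not the code in single_rarefaction.py. The function name `rarefy` is hypothetical, and returning None for samples with too few sequences (so the caller can drop them) is an assumption of this sketch.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample one sample's OTU counts to `depth` sequences.

    counts maps OTU id -> number of sequences observed in this sample.
    Sampling is without replacement. Returns None when the sample has
    fewer than `depth` sequences.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in sorted(counts.items()) for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    result = {}
    for otu in picked:
        result[otu] = result.get(otu, 0) + 1
    return result
```

For example, `rarefy({"otu1": 90, "otu2": 10}, 50, seed=1)` returns counts that sum to exactly 50, so every sample in the rarefied table can be compared at the same depth.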
  • 34. OTU picking • De Novo – Reads are clustered based on similarity to one another. • Reference-based – Closed reference: any reads which don’t hit a reference sequence are discarded – Open reference: any reads which don’t hit a reference sequence are clustered de novo
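The three strategies differ only in what happens to a read that fails to hit the reference. A toy sketch of that logic follows, with loud caveats: the similarity measure (fraction of matching positions between equal-length reads) and the greedy clustering stand in for real tools such as UCLUST, and every name here (`identity`, `cluster_de_novo`, `pick_otus`) is hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; a toy measure for equal-length reads."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / float(len(a))


def cluster_de_novo(reads, threshold=0.97):
    """Greedy clustering: a read joins the first centroid it matches,
    otherwise it becomes the centroid of a new OTU."""
    otus = {}  # centroid sequence -> list of member reads
    for read in reads:
        for centroid in otus:
            if identity(read, centroid) >= threshold:
                otus[centroid].append(read)
                break
        else:
            otus[read] = [read]
    return otus


def pick_otus(reads, reference, threshold=0.97, open_reference=False):
    """Reference-based picking. `reference` maps reference id -> sequence.

    Closed reference: reads that hit nothing are returned as discarded.
    Open reference: those reads are clustered de novo instead.
    Returns (otus, discarded).
    """
    otus, failures = {}, []
    for read in reads:
        for ref_id, ref_seq in reference.items():
            if identity(read, ref_seq) >= threshold:
                otus.setdefault(ref_id, []).append(read)
                break
        else:
            failures.append(read)
    if open_reference:
        new_otus = cluster_de_novo(failures, threshold)
        for i, members in enumerate(new_otus.values()):
            otus["denovo%d" % i] = members
        failures = []
    return otus, failures
```

In the reference step each read is compared only against the fixed reference, independently of every other read, which is why that step splits easily across processors. In `cluster_de_novo` the centroids depend on the reads seen so far, which is why the de novo step does not.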
  • 35. De novo OTU picking • Pros – All reads are clustered • Cons – Not parallelizable – OTUs may be defined by erroneous reads
  • 36. Closed-reference OTU picking • Pros – Built-in quality filter – Easily parallelizable – OTUs are defined by high-quality, trusted sequences • Cons – Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs
  • 37. Percentage of reads that do not hit the reference collection, by environment type.
  • 38. Open-reference OTU picking • Pros – All reads are clustered – Partially parallelizable • Cons – Only partially parallelizable – Mix of high quality sequences defining OTUs (i.e., the database sequences) and possible low quality sequences defining OTUs (i.e., the sequencing reads)
  • 40. Variation in sampling depth is an important consideration Human skin, colored by individual, at 500 sequences/sample Image/analysis credit: Justin Kuczynski Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 41. Variation in sampling depth is an important consideration Human skin, colored by sampling depth, at either 50 or 500 sequences/sample Image/analysis credit: Justin Kuczynski Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 42. Variation in sampling depth is an important consideration Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample Image/analysis credit: Justin Kuczynski Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
  • 43. How deep is deep enough? It depends on the question… – Differences between community types: not many sequences. – Rare biosphere: more (but be careful about sequencing noise!)
  • 44. How deep is deep enough? [Figure: PCoA plots (axes PC1, PC2, PC3) at 100 sequences/sample, 10 sequences/sample, and 1 sequence/sample.] Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
  • 46. Can we get accurate taxonomic assignment from short reads?
  • 50. Elizabeth K. Costello, et al. Science 2009. Bacterial Community Variation in Human Body Habitats Across Space and Time.
  • 53. This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.