Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
EVE 161: Microbial Phylogenomics
Lecture #18: Era IV: Metagenomics Case Study
UC Davis, Winter 2014
Instructor: Jonathan Eisen
• Next week: student presentations
• Each student gets 10 minutes total
• 8 minutes to present and 2 minutes for questions
• Possible presentation timing
  - 2 minutes Overview and Methods
  - 4 minutes R & D
  - 2 minutes Conclusions and Future Ideas
  - 2 minutes Questions
• Contact Holly Ganz (hhganz@ucdavis.edu) for non-Eisen guidance
ARTICLES
A human gut microbial gene catalogue
established by metagenomic sequencing
Junjie Qin1*, Ruiqiang Li1*, Jeroen Raes2,3, Manimozhiyan Arumugam2, Kristoffer Solvsten Burgdorf4, Chaysavanh Manichanh5, Trine Nielsen4, Nicolas Pons6, Florence Levenez6, Takuji Yamada2, Daniel R. Mende2, Junhua Li1,7, Junming Xu1, Shaochuan Li1, Dongfang Li1,8, Jianjun Cao1, Bo Wang1, Huiqing Liang1, Huisong Zheng1, Yinlong Xie1,7, Julien Tap6, Patricia Lepage6, Marcelo Bertalan9, Jean-Michel Batto6, Torben Hansen4, Denis Le Paslier10, Allan Linneberg11, H. Bjørn Nielsen9, Eric Pelletier10, Pierre Renault6, Thomas Sicheritz-Ponten9, Keith Turner12, Hongmei Zhu1, Chang Yu1, Shengting Li1, Min Jian1, Yan Zhou1, Yingrui Li1, Xiuqing Zhang1, Songgang Li1, Nan Qin1, Huanming Yang1, Jian Wang1, Søren Brunak9, Joel Doré6, Francisco Guarner5, Karsten Kristiansen13, Oluf Pedersen4,14, Julian Parkhill12, Jean Weissenbach10, MetaHIT Consortium†, Peer Bork2, S. Dusko Ehrlich6 & Jun Wang1,13
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here
we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant
microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set,
~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent)
microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The
genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire
cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are
also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of
functions present in all individuals and most bacteria, respectively.
Vol 464|4 March 2010|doi:10.1038/nature08821
THAT'S A LOT OF AUTHORS
METHODS
Human faecal samples were collected, frozen immediately and DNA was
purified by standard methods22. For all 124 individuals, paired-end
libraries were constructed with different clone insert sizes and subjected
to Illumina GA sequencing. All reads were assembled using
SOAPdenovo19, with specific parameter ‘-M 3’ for metagenomics data.
MetaGene was used for gene prediction. A non-redundant gene set was
constructed by pair-wise comparison of all genes, using BLAT36 under
the criteria of identity >95% and overlap >90%. Gene taxonomic
assignments were made on the basis of BLASTP37 search (e-value < $1 \times 10^{-5}$)
of the NCBI-NR database and 126 known gut bacteria genomes.
Gene functional annotations were made by BLASTP search (e-value < $1 \times 10^{-5}$)
with eggNOG and KEGG (v48.2) databases. The total and shared
number of orthologous groups and/or gene families were computed using
a random combination of n individuals (with n = 2 to 124, 100 replicates
per bin).
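The resampling in the last sentence of the Methods can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the function name `total_and_shared`, the toy `profiles` data and the fixed random seed are all assumptions made for the example.

```python
import random
from statistics import mean


def total_and_shared(profiles, n, replicates=100, seed=0):
    """Mean union (total) and intersection (shared) sizes of gene families
    over random combinations of n individuals.

    profiles: dict mapping a sample label to the set of gene families
    (or orthologous groups) detected in that individual (hypothetical input).
    """
    rng = random.Random(seed)
    names = sorted(profiles)
    totals, shared = [], []
    for _ in range(replicates):
        chosen = rng.sample(names, n)            # n distinct individuals
        sets = [profiles[name] for name in chosen]
        totals.append(len(set.union(*sets)))     # families seen in any of them
        shared.append(len(set.intersection(*sets)))  # families seen in all
    return mean(totals), mean(shared)


# Toy data: three individuals with invented family labels.
profiles = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}
curve = {n: total_and_shared(profiles, n) for n in range(2, len(profiles) + 1)}
```

Plotting the two means against n would give curves of the kind described: the total rises as individuals are added, while the shared count falls.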
As part of the MetaHIT (Metagenomics of the Human Intestinal Tract)
project, we collected faecal specimens from 124 healthy, overweight
and obese individual human adults, as well as inflammatory bowel
disease (IBD) patients, from Denmark and Spain (Supplementary Table 1).
Supplementary Tables
Table 1 | DNA sample information.
All Danish individuals in the present subsample were originally recruited from a larger
population-based sample of middle-aged people living in the northern part of
Copenhagen region and sampled from the centralized personal number register. At the
original recruitment the individuals included in the present study had normal fasting
plasma glucose and normal 2 hour plasma glucose following an oral glucose tolerance
test. At the time of fecal sampling all were examined in the fasting state and had
non-diabetic fasting plasma glucose levels below 7.0 mmol/l. All of the IBD patients
were in clinical remission at the time of fecal sampling. N refers to no IBD, CD & UC
to Crohn’s disease and ulcerative colitis, respectively.
Sample Name Country Gender Age BMI IBD
MH0001 Denmark female 49 25.55 N
MH0002 Denmark female 59 27.28 N
MH0003 Denmark male 69 33.19 N
MH0004 Denmark male 59 31.18 N
MH0005 Denmark male 64 21.68 N
MH0006 Denmark female 59 22.38 N
MH0007 Denmark male 69 33.60 N
MH0008 Denmark male 59 24.35 N
MH0009 Denmark male 64 29.04 N
MH0010 Denmark male 64 33.27 N
MH0011 Denmark female 0 22.31 N
MH0012 Denmark female 42 32.10 N
MH0013 Denmark male 54 20.46 N
MH0014 Denmark female 54 38.49 N
MH0015 Denmark male 59 25.47 N
MH0016 Denmark female 49 30.50 N
MH0017 Denmark male 64 21.81 N
MH0018 Denmark male 49 31.37 N
MH0019 Denmark female 44 20.01 N
MH0020 Denmark female 63 33.23 N
MH0021 Denmark female 49 25.42 N
MH0022 Denmark male 64 24.42 N
MH0023 Denmark male 69 31.74 N
MH0024 Denmark female 59 22.72 N
MH0025 Denmark female 49 34.20 N
MH0026 Denmark female 49 37.32 N
MH0027 Denmark female 59 23.07 N
MH0028 Denmark female 44 22.70 N
MH0030 Denmark male 59 35.21 N
MH0031 Denmark male 69 22.34 N
MH0032 Denmark male 69 35.28 N
MH0033 Denmark female 59 31.95 N
METHODS & RESULTS (mixed)
Total DNA was extracted from the faecal specimens18 and an average of
4.5 Gb (ranging between 2 and 7.3 Gb) of sequence was generated for
each sample, allowing us to capture most of the novelty (see Methods
and Supplementary Table 2). In total, we obtained 576.7 Gb of sequence
(Supplementary Table 3).
Table 2 | Summary of Sanger reads. The reads were sequenced by 3730xl.
Low-quality sequences at both ends with phred score less than 20 were trimmed. Very
short reads with length less than 100 bp were filtered.
Sample ID # Sanger reads Average length (bp) Total length (bp)
MH0006 237,567 660.65 156,949,306
MH0012 230,768 670.26 154,675,458
Table 3 | Summary of Illumina GA reads. We constructed libraries with three
different insert sizes of about 135 bp, 200 bp, and 400 bp. The insert sizes of each
library were estimated by re-aligning the paired-end reads on the assembled contigs.
Sample ID Paired-end insert size (bp) Read length (bp) # of reads Data (Gb) Human reads (%) # of high-quality reads
MH0047 136/378 75 35,355,400 2.65 0.18 26,932,064
MH0021 134/354 75 36,454,400 2.73 0.12 26,258,326
MH0079 135/360 75 38,011,600 2.85 0.40 27,418,899
MH0078 146/373 75 38,038,200 2.85 1.56 26,051,537
MH0052 141/367 75 39,538,000 2.97 0.08 28,575,036
MH0049 134/343 75 40,444,200 3.03 0.06 30,654,842
MH0076 134/409 75 40,697,000 3.05 0.42 30,650,106
MH0051 143/374 75 41,911,800 3.14 0.32 25,963,104
MH0048 143/349 75 42,923,600 3.22 0.26 26,972,970
O2.UC-14 141/355 75 43,343,000 3.25 0.06 26,942,750
MH0015 235 44 44,671,400 1.97 0.04 33,014,675
MH0018 233 44 45,081,400 1.98 2.14 36,609,695
MH0027 238 44 45,190,000 1.99 0.09 32,377,390
MH0017 223 44 45,557,200 2.00 0.04 36,154,362
MH0022 256 44 46,415,000 2.04 0.21 37,112,508
MH0023 237 44 48,598,400 2.14 0.04 37,782,998
MH0019 249 44 49,229,400 2.17 0.06 38,856,780
MH0026 156/398 75 49,812,000 3.74 0.05 37,484,066
MH0013 238 44 50,257,200 2.21 1.63 40,028,120
MH0005 237 44 50,704,800 2.23 0.23 39,407,333
MH0007 195 44 50,719,800 2.23 0.31 36,956,284
MH0008 219 44 51,411,000 2.26 0.10 38,156,496
V1.UC-7 141/356 75 51,911,400 3.89 14.67 36,788,540
MH0010 220 44 52,218,200 2.30 0.08 39,169,850
V1.CD-12 148/361 75 53,519,400 4.01 0.02 40,609,134
O2.UC-20 141/362 75 53,637,200 4.02 0.03 38,376,747
V1.CD-15 143/351 75 53,938,600 4.05 2.85 40,560,446
O2.UC-19 133/352 75 54,537,600 4.09 0.01 38,459,550
MH0004 218 44 55,829,800 2.46 0.95 40,288,492
MH0062 144/357 75 57,128,400 4.28 14.32 36,809,224
MH0066 147/429 75 57,234,200 4.29 0.05 36,114,997
O2.UC-21 142/362 75 57,856,000 4.34 0.03 34,832,308
V1.CD-13 139/352 75 58,145,800 4.36 0.04 42,560,831
MH0080 140/376 75 58,220,800 4.37 0.13 46,590,749
V1.UC-13 131/352 75 58,381,400 4.38 9.77 38,553,580
MH0032 142/370 75 58,822,400 3.93 0.39 50,110,067
O2.UC-12 153/384 75 58,927,800 4.42 0.12 36,908,526
MH0001 214 44 59,239,200 2.61 0.06 45,016,612
V1.UC-6 142/376 75 59,270,800 4.45 0.41 43,150,856
MH0060 142/367 75 60,156,000 4.51 0.07 41,112,227
MH0053 137/416 75 60,788,600 4.56 0.07 43,283,564
MH0002 139/370 75 61,077,000 4.58 0.15 46,570,095
O2.UC-11 142/377 75 61,253,800 4.59 0.14 38,507,042
MH0059 136/370 75 61,574,600 4.62 0.18 41,025,606
Wanting to generate an extensive catalogue of microbial genes from the
human gut, we first assembled the short Illumina reads into longer
contigs, which could then be analysed and annotated by standard
methods. Using SOAPdenovo19, a de Bruijn graph-based tool specially
designed for assembling very short reads, we performed de novo
assembly for all of the Illumina GA sequence data. Because a high
diversity between individuals is expected8,16,17, we first assembled each
sample independently (Supplementary Fig. 3). As much as 42.7% of the
Illumina GA reads was assembled into a total of 6.58 million contigs of a
length >500 bp, giving a total contig length of 10.3 Gb, with an N50
length of 2.2 kb (Supplementary Fig. 4) and the range of 12.3 to 237.6
Mb (Supplementary Table 4). Almost 35% of reads from any one sample
could be mapped to contigs from other samples, indicating the existence
of a common sequence core.
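For readers unfamiliar with the N50 statistic quoted above, here is a minimal sketch of how it is usually computed. The contig lengths are invented, and treating exactly 500 bp as passing the length filter is an assumption (the text says ">500 bp", the table caption says "below 500 bp were excluded").

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Invented contig lengths; short contigs are dropped before computing N50.
contigs = [300, 600, 800, 1200, 2500, 4000]
kept = [c for c in contigs if c >= 500]   # boundary handling is an assumption
print(n50(kept))  # prints 2500
```

Here the kept contigs total 9,100 bp, and the two longest (4,000 + 2,500 bp) already exceed half of that, so the N50 is 2,500 bp.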
Figure 3 | Flowchart of human gut microbiome data analysis process. We performed
de novo short reads assembly for each sample independently, then all the unassembled
reads were pooled for another round of assembly. ORFs were predicted in each
contig set and merged by removing redundancy. The non-redundant gene set was
used in all further analysis.
Figure 4 | Length distribution of assembled contigs. The number of contigs in
different length bins for each individual was computed, and the data from all 124
individuals were pooled. Boxes denote the 25th and 75th percentiles, the red line
corresponds to the median, and the “whiskers” indicate interquartile range from either
or both ends of the box.
Table 4 | Summary of de novo assembly results. Assembled sequences with length
below 500 bp were excluded from the contig set.
Sample ID # of contigs Contig N50 (bp) Total length (Mb) % reads assembled Unassembled reads (Gb)
MH0001 14,301 1,618 19.69 46.34 1.06
MH0002 65,392 1,680 88.77 45.31 1.91
MH0003 68,658 2,640 119.59 54.40 1.72
MH0004 23,793 1,681 31.92 41.54 1.05
MH0005 14,339 1,684 19.62 40.22 1.04
MH0006 144,440 2,025 217.77 52.39 5.24
MH0007 28,108 1,270 32.00 29.15 1.16
MH0008 26,506 1,768 37.24 43.53 0.95
MH0009 70,014 2,440 112.96 44.14 2.45
MH0010 25,674 1,815 36.52 48.77 0.88
MH0011 86,201 2,158 134.25 46.09 2.37
MH0012 140,991 2,478 237.58 42.77 7.99
MH0013 20,495 2,332 32.20 41.22 1.05
MH0014 66,724 2,957 120.54 50.90 2.08
MH0015 25,933 1,645 34.46 35.53 0.94
MH0016 64,124 2,915 114.03 53.89 1.88
MH0017 24,948 1,679 34.06 39.57 0.96
MH0018 13,247 1,619 17.73 35.23 1.07
MH0019 28,786 1,977 41.95 46.76 0.91
MH0020 44,930 4,708 98.78 56.81 1.49
MH0021 54,101 1,608 70.67 46.49 1.06
MH0022 21,872 1,773 30.00 35.04 1.06
MH0023 16,214 2,100 25.57 35.80 1.07
MH0024 43,145 1,512 54.45 33.21 2.41
MH0025 76,287 1,968 111.48 42.24 2.11
MH0026 33,408 3,769 69.38 43.97 1.58
MH0027 20,369 985 19.72 19.96 1.14
MH0028 61,004 2,630 104.54 49.65 1.80
MH0030 39,267 2,828 66.60 36.73 2.15
MH0031 53,292 1,878 75.84 36.41 2.33
MH0032 37,287 1,921 54.46 20.72 2.60
MH0033 61,782 2,616 102.04 49.87 1.69
MH0034 24,508 2,107 37.63 14.53 2.40
MH0035 68,287 2,075 102.94 48.22 1.93
MH0036 58,690 2,330 94.35 49.44 1.81
MH0037 48,356 2,526 80.44 48.21 1.63
MH0038 50,381 2,921 90.35 47.75 1.79
MH0039 66,509 2,087 104.17 47.66 1.68
MH0040 73,068 2,225 115.15 49.53 1.68
To assess the quality of the Illumina GA-based assembly we mapped the
contigs of samples MH0006 and MH0012 to the Sanger reads from the
same samples (Supplementary Table 2). A total of 98.7% of the contigs
that map to at least one Sanger read were collinear over 99.6% of the
mapped regions. This is comparable to the contigs that were generated by
454 sequencing for one of the two samples (MH0006) as a control, of
which 97.9% were collinear over 99.5% of the mapped regions. We
estimate assembly errors to be 14.2 and 20.7 per megabase (Mb) of
Illumina- and 454-based contigs, respectively (see Methods and
Supplementary Fig. 5), indicating that the short- and long-read-based
assemblies have comparable accuracies.
Figure 5 | Validating Illumina contigs using Sanger reads. Illumina/454 contigs from
samples MH0006 and MH0012 were mapped to Sanger reads from the same samples.
Aligned regions were scanned for breakage of collinearity, and each unique break is
counted as an error. a. number of errors per Mb of Illumina/454 contigs mapped to
Sanger reads. b. percentage of collinear Illumina/454 contigs and collinear basepairs in
those contigs.
To complete the contig set we pooled the unassembled reads from all 124
samples, and repeated the de novo assembly process. About 0.4 million
additional contigs were thus generated, having a length of 370 Mb and an
N50 length of 939 bp. The total length of our final contig set was thus
10.7 Gb. Some 80% of the 576.7 Gb of Illumina GA sequence could be
aligned to the contigs at a threshold of 90% identity, allowing for
accommodation of sequencing errors and strain variability in the gut
(Fig. 1), almost twice the 42.7% of sequence that was assembled into
contigs by SOAPdenovo, because assembly uses more stringent criteria.
This indicates that a vast majority of the Illumina sequence is represented
by our contigs.
Figure 1 | Coverage of human gut microbiome. The three human microbial
sequencing read sets, namely Illumina GA reads generated from 124 individuals in
this study (black; n = 124), Roche/454 reads from 18 human twins and their
mothers (grey; n = 18) and Sanger reads from 13 Japanese individuals
(white; n = 13), were aligned to each of the reference sequence sets. Mean
values ± s.e.m. are plotted.
[Bar chart. Y-axis: Coverage of sequencing reads (%). Reference sequence sets: Assembled contig set; Known human gut bacteria; GenBank bacteria.]
To compare the representation of the human gut microbiome in our
contigs with that from previous work, we aligned them to the reads from
the two largest published gut metagenome studies (1.83 Gb of Roche/454
sequencing reads from 18 US adults8, and 0.79 Gb of Sanger reads from
13 Japanese adults and infants17), using the 90% identity threshold. A
total of 70.1% and 85.9% of the reads from the Japanese and US
samples, respectively, could be aligned to our contigs (Fig. 1), showing
that the contigs include a high fraction of sequences from previous
studies. In contrast, 85.7% and 69.5% of our contigs were not covered by
the reads from the Japanese and US samples, respectively, highlighting
the novelty we captured.
Only 31.0–48.8% of the reads from the two previous studies and the
present study could be aligned to 194 public human gut bacterial
genomes (Supplementary Table 5), and 7.6–21.2% to the bacterial
genomes deposited in GenBank (Fig. 1). This indicates that the reference
gene set obtained by sequencing genomes of isolated bacterial strains is
still of a limited scale.
A gene catalogue of the human gut microbiome	

To establish a non-redundant human gut microbiome gene set we first
used the MetaGene20 program to predict ORFs in our contigs and found
14,048,045 ORFs longer than 100 bp (Supplementary Table 6). They
occupied 86.7% of the contigs, comparable to the value found for fully
sequenced genomes (~86%). Two-thirds of the ORFs appeared
incomplete, possibly due to the size of our contigs (N50 of 2.2 kb). We
next removed the redundant ORFs, by pair-wise comparison, using a
very stringent criterion of 95% identity over 90% of the shorter ORF
length, which can fuse orthologues but avoids inflation of the data set due
to possible sequencing errors (see Methods). Yet, the final non-redundant
gene set contained as many as 3,299,822 ORFs with an average length of
704 bp (Supplementary Table 7).
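The redundancy criterion quoted above (95% identity over 90% of the shorter ORF) can be written as a small predicate plus a greedy filter. This is a sketch under stated assumptions, not the authors' BLAT-based procedure: the function names, the `hits` dictionary format, the inclusive thresholds and the choice to keep the longer ORF of a redundant pair are all hypothetical.

```python
def is_redundant(identity, aligned_len, len_a, len_b,
                 min_identity=0.95, min_overlap=0.90):
    """True if two ORFs align with at least `min_identity` identity over
    at least `min_overlap` of the shorter ORF's length."""
    shorter = min(len_a, len_b)
    return identity >= min_identity and aligned_len / shorter >= min_overlap


def non_redundant(orf_lengths, hits):
    """Greedy sketch: visit ORFs from longest to shortest and keep an ORF
    unless it is redundant with one already kept.

    orf_lengths: dict ORF id -> length in bp.
    hits: dict frozenset({id_a, id_b}) -> (identity, aligned_len);
          an assumed format for parsed pairwise alignments.
    """
    kept = []
    for orf in sorted(orf_lengths, key=orf_lengths.get, reverse=True):
        for rep in kept:
            hit = hits.get(frozenset((orf, rep)))
            if hit and is_redundant(hit[0], hit[1],
                                    orf_lengths[orf], orf_lengths[rep]):
                break  # redundant with a kept representative: drop it
        else:
            kept.append(orf)
    return kept


# Invented example: orf2 is nearly contained in orf1; orf3 only partly overlaps.
orf_lengths = {"orf1": 900, "orf2": 600, "orf3": 700}
hits = {
    frozenset(("orf1", "orf2")): (0.97, 580),  # 580/600 of the shorter ORF
    frozenset(("orf1", "orf3")): (0.99, 400),  # 400/700 of the shorter ORF
}
result = non_redundant(orf_lengths, hits)  # ["orf1", "orf3"]
```

Measuring overlap against the shorter ORF is what lets an incomplete ORF from a short contig merge with a longer copy of the same gene.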
Table 7 | Non-redundant genes. Genes were compared at a 95% identity cut-off. Those
that overlapped over 90% of their length were considered redundant and removed.
Common and rare genes were present in >50% and <20% of individuals, respectively.
# of genes Total length (bp) Mean length (bp)
Non-redundant gene set 3,299,822 2,323,171,095 704.03
Common 294,110 292,960,308 996.09
Rare 2,375,655 1,510,527,924 635.84
We term the genes of the non-redundant set ‘prevalent genes’, as they are
encoded on contigs assembled from the most abundant reads (see
Methods). The minimal relative abundance of the prevalent genes was
$\sim 6 \times 10^{-7}$, as estimated from the minimum sequence coverage of the
unique genes (close to 3), and the total Illumina sequence length
generated for each individual (on average, 4.5 Gb), assuming the average
gene length of 0.85 kb (that is, $3 \times 0.85 \times 10^{3} / (4.5 \times 10^{9})$).
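The arithmetic behind that estimate is easy to check. The variable names below are illustrative; the three input values are the ones quoted in the passage.

```python
min_coverage = 3              # minimum sequence coverage of the unique genes
gene_length_bp = 0.85e3       # assumed average gene length (0.85 kb)
bases_per_individual = 4.5e9  # average Illumina sequence per individual (4.5 Gb)

# Bases needed to cover one gene 3x, as a fraction of all bases sequenced.
min_relative_abundance = min_coverage * gene_length_bp / bases_per_individual
print(f"{min_relative_abundance:.1e}")  # prints 5.7e-07
```

The result, about 5.7 × 10^-7, rounds to the roughly 6 × 10^-7 stated in the text.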
We mapped the 3.3 million gut ORFs to the 319,812 genes (target genes)
of the 89 frequent reference microbial genomes in the human gut. At a
90% identity threshold, 80% of the target genes had at least 80% of their
length covered by a single gut ORF (Fig. 2b). This indicates that the gene
set includes most of the known human gut bacterial genes.
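The coverage test described above (a target gene counts as covered if a single gut ORF aligns at 90% identity or more over at least 80% of the gene's length) might be tallied as follows. The function name, the input format and the toy numbers are assumptions for illustration only.

```python
def fraction_covered(target_lengths, hits, min_identity=0.90, min_cover=0.80):
    """Fraction of target genes with at least one single-ORF hit meeting
    both the identity and the length-coverage thresholds.

    target_lengths: dict target gene id -> length in bp.
    hits: dict target gene id -> list of (identity, aligned_len) tuples,
          one tuple per gut ORF hit (assumed format).
    """
    covered = 0
    for gene, length in target_lengths.items():
        if any(ident >= min_identity and aligned / length >= min_cover
               for ident, aligned in hits.get(gene, [])):
            covered += 1
    return covered / len(target_lengths)


# Invented example with five target genes.
target_lengths = {"g1": 1000, "g2": 500, "g3": 800, "g4": 1200, "g5": 600}
hits = {
    "g1": [(0.95, 900)],               # covered
    "g2": [(0.92, 300), (0.85, 500)],  # too short, then identity too low
    "g3": [(0.99, 700)],               # covered
    "g5": [(0.91, 590)],               # covered; g4 has no hits at all
}
print(fraction_covered(target_lengths, hits))  # prints 0.6
```

Note that the test is per ORF: two partial hits that together span a gene, as for g2, do not count.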
We examined the number of prevalent genes identified across all
individuals as a function of the extent of sequencing, demanding at least
two supporting reads for a gene call (Fig. 2a). The incidence-based
coverage richness estimator (ICE), determined at 100 individuals (the
highest number the EstimateS21 program could accommodate), indicates
that our catalogue captures 85.3% of the prevalent genes. Although this is
probably an underestimate, it nevertheless indicates that the catalogue
contains an overwhelming majority of the prevalent genes of the cohort.

Figure 2 | Predicted ORFs in the human gut microbiome. a, Number of
unique genes as a function of the extent of sequencing. The gene accumulation
curve corresponds to the Sobs (Mao Tau) values (number of observed genes),
calculated using EstimateS (ref. 21; version 8.2.0) on 100 randomly chosen
samples (due to memory limitation). b, Coverage of genes from 89 frequent gut
microbial species (Supplementary Table 12). c, Number of functions captured
by number of samples investigated, based on known (well characterized)
orthologous groups (OGs; bottom), known plus unknown orthologous
groups (including, for example, putative, predicted, conserved hypothetical
functions; middle) and orthologous groups plus novel gene families (>20
proteins) recovered from the metagenome (top). Boxes denote the
interquartile range (IQR) between the first and third quartiles (25th and 75th
percentiles, respectively) and the line inside denotes the median. Whiskers
denote the lowest and highest values within 1.5 times IQR from the first and
third quartiles, respectively. Circles denote outliers beyond the whiskers.
Plot axes: a, number of samples against number of non-redundant genes (×10^6);
b, fraction of gene length covered against number and fraction of target genes
covered, with curves labelled 85%, 90% and 95%; c, number of individuals
sampled against number of orthologous groups/gene families (×10^3).
NATURE|Vol 464|4 March 2010

Each individual carried 536,112 ± 12,167 (mean ± s.e.m.) prevalent genes
(Supplementary Fig. 6b), indicating that most of the 3.3 million gene
pool must be shared. However, most of the prevalent genes were found in
only a few individuals: 2,375,655 were present in less than 20%, whereas
294,110 were found in at least 50% of individuals (we term these
‘common’ genes). These values depend on the sampling depth;
sequencing of MH0006 and MH0012 revealed more of the catalogue
genes, present at a low abundance (Supplementary Fig. 7). Nevertheless,
even at our routine sampling depth, each individual harboured
204,056 ± 3,603 (mean ± s.e.m.) common genes, indicating that about
38% of an individual’s total gene pool is shared. Interestingly, the IBD
patients harboured, on average, 25% fewer genes than the individuals not
suffering from IBD (Supplementary Fig. 8), consistent with the
observation that the former have lower bacterial diversity than the
latter (ref. 22).
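The prevalence classes used above (present in less than 20% of individuals, or in at least 50%, the ‘common’ genes) and the 38% figure can be illustrated with a short sketch. The function name, the data layout and the toy carriers are assumptions; only the two gene counts in the last line come from the text.

```python
def count_by_prevalence(gene_carriers, n_individuals,
                        rare_below=0.20, common_from=0.50):
    """Count genes found in < rare_below and in >= common_from of individuals.

    gene_carriers: dict mapping gene id -> set of individuals carrying it.
    """
    rare = common = 0
    for carriers in gene_carriers.values():
        prevalence = len(carriers) / n_individuals
        if prevalence < rare_below:
            rare += 1
        if prevalence >= common_from:
            common += 1
    return rare, common


# Toy cohort of 10 individuals (invented data).
gene_carriers = {
    "gA": {0},                 # 10% -> rare
    "gB": {0, 1, 2, 3, 4},     # 50% -> common
    "gC": set(range(10)),      # 100% -> common
    "gD": {1, 2, 3},           # 30% -> neither class
}
rare, common = count_by_prevalence(gene_carriers, n_individuals=10)

# Share of an individual's prevalent genes that are common genes,
# using the two means quoted in the text.
shared_fraction = 204_056 / 536_112   # about 0.38
```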
Supplementary Figure 7 | Number of unique genes identified with increase of sequencing depth in
samples MH0006 and MH0012.
Supplementary Figure 8 | Distribution of non-redundant bacterial genes in IBD patients and
healthy controls. The proportion of individuals having a given number of genes
(classes of 100 thousand genes were used) is shown. The average gene number for IBD
patients and individuals not suffering from IBD was 425,397 ± 126,685 (s.d.; n=25) and
564,070 ± 121,962 (s.d.; n=99), respectively; p < 10^-6 (one-tailed Student t test).
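The summary statistics in this caption are enough to recompute the test statistic. The sketch below assumes the classical equal-variance (pooled) form of Student's t test; the function name is illustrative, and the p-value itself is not computed because that needs a t-distribution CDF from a statistics library.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student t statistic with pooled variance, plus degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Means, standard deviations and group sizes as quoted in the caption.
t_stat, dof = pooled_t_statistic(564_070, 121_962, 99,    # not suffering from IBD
                                 425_397, 126_685, 25)    # IBD patients

# Relative deficit of genes in IBD patients (the "25% fewer genes" of the text).
relative_deficit = 1 - 425_397 / 564_070
```

With these inputs the statistic is close to 5 on 122 degrees of freedom, which is in line with the very small one-tailed p-value reported.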
Common bacterial core	

Deep metagenomic sequencing provides the opportunity to explore the
existence of a common set of microbial species (common core) in the
cohort. For this purpose, we used a non-redundant set of 650 sequenced
bacterial and archaeal genomes (see Methods). We aligned the Illumina
GA reads of each human gut microbial sample onto the genome set,
using a 90% identity threshold, and determined the proportion of the
genomes covered by the reads that aligned onto only a single position in
the set. At a 1% coverage, which for a typical gut bacterial genome
corresponds to an average length of about 40 kb, some 25-fold more than
that of the 16S gene generally used for species identification, we detected
18 species in all individuals, 57 in ≥90% and 75 in ≥50% of individuals
(Supplementary Table 8). At 10% coverage, requiring ~10-fold higher
abundance in a sample, we still found 13 of the above species in ≥90% of
individuals and 35 in ≥50%.
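The coverage criterion, the proportion of a genome covered by uniquely aligned reads, amounts to merging read intervals and dividing by the genome length. A minimal sketch follows; the half-open interval convention, the function name and the toy reads are assumptions, and the 4 Mb genome length is simply the size implied by "1% is about 40 kb".

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of aligned-read intervals.

    intervals: iterable of (start, end) positions, half-open, one per
               uniquely aligned read (hypothetical layout).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


genome_length = 4_000_000          # 1% of this is the 40 kb quoted in the text
reads = [(0, 30_000), (20_000, 45_000), (1_000_000, 1_000_075)]
breadth = covered_fraction(reads, genome_length)
detected_at_1_percent = breadth >= 0.01
```

Overlapping reads are counted once, so the measure reflects breadth of coverage rather than depth.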
When the cumulated sequence length increased from 3.96 Gb to 8.74 Gb
and from 4.41 Gb to 11.6 Gb, for samples MH0006 and MH0012,
respectively, the number of strains common to the two at the 1%
coverage threshold increased by 25%, from 135 to 169. This indicates the
existence of a significantly larger common core than the one we could
observe at the sequence depth routinely used for each individual.
The variability of abundance of microbial species in individuals can
greatly affect identification of the common core. To visualize this
variability, we compared the number of sequencing reads aligned to
different genomes across the individuals of our cohort. Even for the most
common 57 species present in ≥90% of individuals with genome
coverage >1% (Supplementary Table 8), the inter-individual variability
was between 12- and 2,187-fold (Fig. 3). As expected (refs 10, 23),
Bacteroidetes and Firmicutes had the highest abundance.

Figure 3 | Relative abundance of 57 frequent microbial genomes among
individuals of the cohort. See Fig. 2c for definition of box and whisker plot.
See Methods for computation.
Plot axes: relative abundance (log10) against genome name, listed from
Blautia hansenii (top) to Bacteroides uniformis (bottom).

A complex pattern of species relatedness, characterized by clusters at the
genus and family levels, emerges from the analysis of the network based
on the pair-wise Pearson correlation coefficients of 155 species present in
at least one individual at ≥1% coverage (Supplementary Fig. 9).
Prominent clusters include some of the most abundant gut species, such
as members of the Bacteroidetes and Dorea/Eubacterium/Ruminococcus
groups and also bifidobacteria, Proteobacteria and streptococci/lactobacilli
groups. These observations indicate that similar
constellations of bacteria may be present in different individuals of our
cohort, for reasons that remain to be established.
Supplementary Figure 9 | Relations between the most abundant bacterial species. The network was
deduced from the analysis of 155 bacterial species present in at least 1 individual at a
genome coverage of 1%. Size of the nodes (circles) indicates species abundance over
the cohort, width of the edges (lines connecting the circles) indicates the value of the
Pearson correlation coefficient (only the 342 values above 0.4 or below -0.4 out of a
total of 11,935 were used for the network).
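The network construction described in this caption can be sketched as: compute the Pearson correlation for every species pair and keep the pairs whose coefficient is above 0.4 or below -0.4. The function names and the toy abundance vectors are assumptions for illustration; the only figure taken from the caption is the pair count, which is the number of ways to choose 2 species out of 155.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(abundance, threshold=0.4):
    """Return (species_a, species_b, r) for every pair with |r| > threshold.

    abundance: dict mapping species name -> abundances across individuals.
    """
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        r = pearson(abundance[a], abundance[b])
        if abs(r) > threshold:
            edges.append((a, b, r))
    return edges


# Number of species pairs for 155 species, matching the total in the caption.
n_pairs = math.comb(155, 2)   # 11935

# Toy abundances across four individuals (invented data).
abundance = {
    "sp1": [1, 2, 3, 4],
    "sp2": [2, 4, 6, 8.5],    # rises with sp1
    "sp3": [4, 3, 2, 1],      # falls as sp1 rises
    "sp4": [1, 3, 3, 1],      # unrelated to the others
}
edges = correlation_edges(abundance)
```

In the toy data three of the six pairs pass the threshold, two of them through a strong negative correlation.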
The above result indicates that the Illumina-based bacterial profiling
should reveal differences between the healthy individuals and patients.
To test this hypothesis we compared the IBD patients and healthy
controls (Supplementary Table 1), as it was previously reported that the
two have different microbiota (ref. 22). The principal component analysis,
based on the same 155 species, clearly separates patients from healthy
individuals and the ulcerative colitis from the Crohn’s disease patients
(Fig. 4), confirming our hypothesis.

Figure 4 | Bacterial species abundance differentiates IBD patients and
healthy individuals. Principal component analysis with health status as
instrumental variables, based on the abundance of 155 species with ≥1%
genome coverage by the Illumina reads in at least 1 individual of the cohort,
was carried out with 14 healthy individuals and 25 IBD patients (21 ulcerative
colitis and 4 Crohn’s disease) from Spain (Supplementary Table 1). The first two
components (PC1 and PC2) were plotted and represented 7.3% of whole
inertia. Individuals (represented by points) were clustered and centre of
gravity computed for each class; P-value of the link between health status and
species abundance was assessed using a Monte-Carlo test (999 replicates).
Plot axes: PC1 and PC2; classes: Healthy, Crohn’s disease, Ulcerative colitis;
P value: 0.031.

Functions encoded by the prevalent gene set	

We classified the predicted genes by aligning them to the integrated NCBI-NR database of
non-redundant protein sequences, the genes in the KEGG (Kyoto Encyclopedia of Genes and
Genomes; ref. 24) pathways, and COG (Clusters of Orthologous Groups; ref. 25) and eggNOG
(ref. 26) databases. There were 77.1% genes classified into phylotypes, 57.5% to eggNOG clusters,
47.0% to KEGG orthology and 18.7% genes assigned to KEGG pathways, respectively (Supplementary
Table 9). Almost all (99.96%) of the phylogenetically assigned genes belonged to the Bacteria and
Archaea, reflecting their predominance in the gut. Genes that were not mapped to orthologous
groups were clustered into gene families (see Methods). To investigate the functional content of
the prevalent gene set we computed the total number of orthologous groups and/or gene families
present in any combination of n individuals (with n = 2–124; see Fig. 2c). This rarefaction
analysis shows that the ‘known’ functions (annotated in eggNOG or KEGG) quickly saturate (a value
of 5,569 groups was observed): when sampling any subset of 50 individuals, most have been
detected. However, three-quarters of the prevalent gut functionalities consist of uncharacterized
orthologous groups and/or completely novel gene families (Fig. 2c). When including these
groups, the rarefaction curve only starts to plateau at the very end, at a much higher level (19,338
groups were detected), confirming that the extensive sampling of a large number of individuals
was necessary to capture this considerable amount of novel/unknown functionality.
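The rarefaction described here counts the distinct orthologous groups or gene families present in a combination of n individuals. A minimal sketch, assuming random draws of n individuals rather than an exhaustive enumeration of combinations; the function name and toy data are illustrative, and only the two group totals in the last line come from the text.

```python
import random


def rarefaction(groups_per_individual, n, n_draws=100, seed=0):
    """Mean number of distinct groups in the union of n randomly drawn individuals.

    groups_per_individual: list of sets, one set of group ids per individual.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        subset = rng.sample(groups_per_individual, n)   # n individuals, no repeats
        total += len(set().union(*subset))
    return total / n_draws


# Toy cohort of four individuals (invented data).
individuals = [
    {"OG1", "OG2"},
    {"OG1", "OG3"},
    {"OG2", "OG3", "OG4"},
    {"OG1", "OG2", "OG3"},
]
rarefaction_points = [rarefaction(individuals, n) for n in range(1, 5)]

# Share of groups outside the 'known' set, from the totals quoted in the text.
uncharacterized_share = 1 - 5_569 / 19_338   # about 0.71
```

The share of roughly 0.71 is what the text rounds to "three-quarters".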

Figure 2c (detail). Plot axes: number of individuals sampled against number of
orthologous groups/gene families (×10^3); curves, top to bottom: OGs + novel
gene families, known + unknown OGs, known OGs.

Bacterial functions important for life in the gut	

The extensive non-redundant catalogue of the bacterial genes from the
human intestinal tract provides an opportunity to identify bacterial
functions important for life in this environment. There are functions
necessary for a bacterium to thrive in a gut context (that is, the ‘minimal
gut genome’) and those involved in the homeostasis of the whole
ecosystem, encoded across many species (the ‘minimal gut
metagenome’). The first set of functions is expected to be present in most
or all gut bacterial species; the second set in most or all individuals’ gut
samples.
To identify the functions encoded by the minimal gut genome we use the fact that they
should be present in most or all gut bacterial species and therefore appear in the gene
catalogue at a frequency above that of the functions present in only some of the gut
bacterial species. The relative frequency of different functions can be deduced from the
number of genes recruited to different eggNOG clusters, after normalization for gene
length and copy number (Supplementary Fig. 10a, b). We ranked all the clusters by gene
frequencies and determined the range that included the clusters specifying well-known
essential bacterial functions, such as those determined experimentally for a
well-studied firmicute, Bacillus subtilis (ref. 27), hypothesizing that additional clusters
in this range are equally important. As expected, the range that included most of B. subtilis
essential clusters (86%) was at the very top of the ranking order (Fig. 5). Some 76% of the
clusters with essential genes of Escherichia coli (ref. 28) were within this range, confirming
the validity of our approach. This suggests that 1,244 metagenomic clusters found
within the range (Supplementary Table 10; termed ‘range clusters’ hereafter) specify
functions important for life in the gut.
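One way to picture the ranking step is shown below: order the clusters by normalized gene frequency and find how far down the ranking one must go to include a given share of the essential clusters. How the authors actually delimited the range is not spelled out in this passage, so the prefix rule, the function name and the toy frequencies are all assumptions.

```python
def essential_range(cluster_freq, essential, target=0.86):
    """Smallest top-ranked prefix holding at least `target` of the essential clusters.

    cluster_freq: dict mapping cluster id -> normalized gene frequency.
    essential: set of cluster ids that contain essential genes.
    Returns (prefix_length, clusters_in_prefix).
    """
    ranked = sorted(cluster_freq, key=cluster_freq.get, reverse=True)
    needed = target * len(essential)
    found = 0
    for k, cluster in enumerate(ranked, start=1):
        if cluster in essential:
            found += 1
        if found >= needed:
            return k, ranked[:k]
    return len(ranked), ranked   # target not reachable: whole ranking


# Toy data (invented): five clusters, three of them essential.
cluster_freq = {"c1": 9.0, "c2": 7.5, "c3": 6.0, "c4": 2.0, "c5": 0.5}
essential = {"c1", "c3", "c5"}

k_60, range_60 = essential_range(cluster_freq, essential, target=0.6)
k_86, range_86 = essential_range(cluster_freq, essential)
```

Non-essential clusters that fall inside the prefix (c2 in the toy data) are the analogue of the additional ‘range clusters’ that the text hypothesizes to be equally important.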

Figure 5 | Clusters that contain the B. subtilis essential genes. The clusters
were ranked by the number of genes they contain, normalized by average
length and copy number (see Supplementary Fig. 10), and the proportion of
clusters with the essential B. subtilis genes was determined for successive
groups of 100 clusters. Range indicates the part of the cluster distribution
that contains 86% of the B. subtilis essential genes.
Plot axes: cluster rank (ticks from 1 to 10,001) against cluster (%) (ticks from
0 to 40).

We found two types of functions among the range clusters: those required
in all bacteria (housekeeping) and those potentially specific for the gut.
Among many examples of the first category are the functions that are part
of main metabolic pathways (for example, central carbon metabolism,
amino acid synthesis), and important protein complexes (RNA and DNA
polymerase, ATP synthase, general secretory apparatus). Not surprisingly,
projection of the range clusters on the KEGG metabolic pathways gives a
highly integrated picture of the global gut cell metabolism (Fig. 6a).

The putative gut-specific functions include those involved in adhesion
to the host proteins (collagen, fibrinogen, fibronectin) or in harvesting
sugars of the globoseries glycolipids, which are carried on blood and
epithelial cells. Furthermore, 15% of range clusters encode functions that
are present in <10% of the eggNOG genomes (see Supplementary Fig. 11)
and are largely (74.3%) not defined (Fig. 6b). Detailed studies of these
should lead to a deeper comprehension of bacterial life in the gut.

Figure 6 | Characterization of the minimal gut genome and metagenome. a,
Projection of the minimal gut genome on the KEGG pathways using the iPath
tool (ref. 38). b, Functional composition of the minimal gut genome and
metagenome. Rare and frequent refer to the presence in sequenced eggNOG
genomes. c, Estimation of the minimal gut metagenome size. Known orthologous
groups (red), known plus unknown orthologous groups (blue) and orthologous
groups plus novel gene families (>20 proteins; grey) are shown (see Fig. 2c for
definition of box and whisker plot). The inset shows composition of the gut
minimal microbiome. Large circle: classification in the minimal metagenome
according to orthologous group occurrence in STRING7 (ref. 39) bacterial
genomes. Common (25%), uncommon (35%) and rare (45%) refer to functions that
are present in >50%, <50% but >10%, and <10% of STRING bacteria genomes,
respectively. Small circle: composition of the rare orthologous groups. Unknown
(80%) have no annotation or are poorly characterized, whereas known bacterial
(19%) and phage-related (1%) orthologous groups have functional description.
Panel b categories: General function - R, Translation - J, Amino acids - E,
DNA - L, Unknown - S, Envelope - M, Carbohydrates - G, Energy - C,
Transcription - K, Coenzymes - H, Nucleotides - F, Inorganic - P, Protein
turnover - O, Lipids - I, Signal transduction - T, Secretion - U, Cell cycle - D,
Defence - V, Second metabolites - Q, Cell motility - N, RNA - A, Chromatin - B,
Extracellular - W, Nuclear structure - Y, Cytoskeleton - Z; series: rare minimal
genome, rare minimal metagenome, frequent minimal genome, frequent minimal
metagenome. Panel c axes: number of individuals sampled against minimal
metagenome size; inset classes: Common, Uncommon, Rare; Unknown, Known,
Phage-associated.

To identify the functions encoded by the minimal gut metagenome, we
computed the orthologous groups that are shared by individuals of our
cohort. This minimal set, of 6,313 functions, is much larger than the one
estimated in a previous study8. There are only 2,069 functionally
annotated orthologous groups, showing that they gravely underestimate
the true size of the common functional complement among individuals
(Fig. 6c). The minimal gut metagenome includes a considerable fraction
of functions (~45%) that are present in <10% of the sequenced bacterial
genomes (Fig. 6c, inset). These otherwise rare functionalities that are
found in each of the 124 individuals may be necessary for the gut
ecosystem. Eighty per cent of these orthologous groups contain genes
with at best poorly characterized function, underscoring our limited
knowledge of gut functioning.
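The computation described above, taking the orthologous groups shared by every individual of the cohort, amounts to a set intersection. The following is a minimal Python sketch, assuming each individual's metagenome has already been reduced to a set of orthologous-group identifiers. The function name, the sample labels and the toy identifiers are illustrative and do not come from the paper.

```python
def minimal_metagenome(groups_by_individual):
    """Return the orthologous groups present in every individual.

    groups_by_individual maps a (hypothetical) sample label to the set of
    orthologous-group identifiers detected in that sample.
    """
    group_sets = [set(groups) for groups in groups_by_individual.values()]
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


# Toy cohort of three individuals; identifiers are made up for illustration.
toy_cohort = {
    "individual_1": {"OG_a", "OG_b", "OG_c"},
    "individual_2": {"OG_a", "OG_c", "OG_d"},
    "individual_3": {"OG_a", "OG_b", "OG_c", "OG_e"},
}
shared = minimal_metagenome(toy_cohort)
print(sorted(shared))
```

Under this definition a group that goes undetected in a single individual drops out of the set, so the size of the result is sensitive to sampling depth.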
Of the known fraction, about 5% codes for (pro)phage-related proteins,
implying a universal presence and possibly important ecological role of
bacteriophages in gut homeostasis. The most striking secondary
metabolism that seems crucial for the minimal metagenome relates, not
unexpectedly, to biodegradation of complex sugars and glycans harvested
from the host diet and/or intestinal lining. Examples include degradation
and uptake pathways for pectin (and its monomer, rhamnose) and
sorbitol, sugars which are omnipresent in fruits and vegetables, but
which are poorly absorbed, if at all, by humans. As some gut
microorganisms were found to degrade both of them29,30, this capacity
seems to be selected for by the gut ecosystem as a non-competitive
source of energy. Besides these, capacity to ferment, for example,
mannose, fructose, cellulose and sucrose is also part of the minimal
metagenome. Together, these emphasize the strong dependence of the gut
ecosystem on complex sugar degradation for its functioning.
Functional complementarities of the genome and metagenome

Detailed analysis of the complementarities between the gut metagenome
and the human genome is beyond the scope of the present work. To
provide an overview, we considered two factors: conservation of the
functions in the minimal metagenome and presence/absence of functions
in one or the other (Supplementary Table 11). Gut bacteria use
mostly fermentation to generate energy, converting sugars, in part, to
short-chain fatty acids, which are used by the host as an energy source.
Acetate is important for muscle, heart and brain cells31, propionate is
used in host hepatic neoglucogenic processes, whereas, in addition,
butyrate is important for enterocytes32. Beyond short-chain fatty acids,
a number of amino acids are indispensable to humans33 and can be provided
by bacteria34. Similarly, bacteria can contribute certain vitamins3 (for
example, biotin, phylloquinone) to the host. All of the steps of
biosynthesis of these molecules are encoded by the minimal metagenome.

Gut bacteria seem to be able to degrade numerous xenobiotics,
including non-modified and halogenated aromatic compounds
(Supplementary Table 11), even if the steps of most pathways are not part
of the minimal metagenome and are found in a fraction of individuals only.
A particularly interesting example is that of benzoate, which is a common
food supplement, known as E211. Its degradation by the coenzyme-A
ligation pathway, encoded in the minimal metagenome, leads to
pimeloyl-coenzyme-A, which is a precursor of biotin, indicating that
this food supplement can have a potentially beneficial role for human
health.
Discussion

We have used extensive Illumina GA short-read-based sequencing of
total faecal DNA from a cohort of 124 individuals of European (Nordic
and Mediterranean) origin to establish a catalogue of non-redundant
human intestinal microbial genes. The catalogue contains 3.3 million
microbial genes, 150-fold more than the human gene complement, and
includes an overwhelming majority (>86%) of prevalent genes harboured
by our cohort. The catalogue probably contains a large majority of
prevalent intestinal microbial genes in the human population, for the
following reasons: (1) over 70% of the metagenomic reads from three
previous studies, including American and Japanese individuals8,16,17,
can be mapped on our contigs; (2) about 80% of the microbial genes
from 89 frequent gut reference genomes are present in our set. This result
represents a proof of principle that short-read sequencing can be used to
characterize complex microbiomes.

The full bacterial gene complement of each individual was not sampled
in our work. Nevertheless, we have detected some 536,000 prevalent
unique genes in each, out of the total of 3.3 million carried by our cohort.
Inevitably, the individuals largely share the genes of the common pool.
At the present depth of sequencing, we found that almost 40% of the
genes from each individual are shared with at least half of the individuals
of the cohort. Future studies of world-wide span, envisaged within the
International Human Microbiome Consortium, will complete, as
necessary, our gene catalogue and establish boundaries to the proportion
of shared genes.

Essentially all (99.1%) of the genes of our catalogue are of bacterial
origin, the remainder being mostly archaeal, with only 0.1% of
eukaryotic and viral origins. The gene catalogue is therefore equivalent
to that of some 1,000 bacterial species with an average-sized genome,
encoding about 3,364 non-redundant genes. We estimate that no more than
15% of prevalent genes of our cohort may be missing from the catalogue,
and suggest that the cohort harbours no more than ~1,150 bacterial species
abundant enough to be detected by our sampling. Given the large overlap
between microbial sequences in this and previous studies we suggest that
the number of abundant intestinal bacterial species may be not much
higher than that observed in our cohort. Each individual of our cohort
harbours at least 160 such bacterial species, as estimated by the average
prevalent gene number, and many must thus be shared.
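The species estimates quoted above are consistent with simple ratios of the quoted gene counts. The short Python check below is one reading of that arithmetic, not the authors' stated procedure. In particular, dividing by 0.85 to reach the upper bound of about 1,150 species is an assumption about how the 15% of possibly missing genes enters the estimate.

```python
CATALOGUE_GENES = 3_300_000           # non-redundant genes in the catalogue
GENES_PER_GENOME = 3_364              # average-sized bacterial genome, as quoted
PREVALENT_GENES_PER_PERSON = 536_000  # prevalent unique genes per individual
MAX_MISSING_FRACTION = 0.15           # "no more than 15%" may be missing

# Catalogue size expressed in average-genome equivalents ("some 1,000").
species_equivalents = CATALOGUE_GENES / GENES_PER_GENOME

# Assumed correction: if up to 15% of genes are missing, scale up by 1/0.85.
upper_bound_species = species_equivalents / (1 - MAX_MISSING_FRACTION)

# Per-individual estimate (quoted as "at least 160").
species_per_person = PREVALENT_GENES_PER_PERSON / GENES_PER_GENOME

print(round(species_equivalents))
print(round(upper_bound_species))
print(round(species_per_person))
```

The three printed values land close to the figures in the text (roughly 1,000, 1,150 and 160), which suggests the estimates are gene-count ratios rather than direct species counts.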
We assigned about 12% of the reference set genes (404,000) to the 194
sequenced intestinal bacterial genomes, and can thus associate them with
bacterial species. Sequencing of at least 1,000 human-associated
bacterial genomes is foreseen within the International Human
Microbiome Consortium, via the Human Microbiome Project and
MetaHIT. This is commensurate with the number of dominant species in
our cohort and expected more broadly in human gut, and should enable a
much more extensive gene to species assignment. Nevertheless, we
used the presently available sequenced genomes to explore further the
concept of largely shared species among our cohort and identified 75
species common to >50% of individuals and 57 species common to >90%.
These numbers are likely to increase with the number of sequenced
reference strains and a deeper sampling. Indeed, a 2–3-fold increase in
sequencing depth raised by 25% the number of species that we could
detect as shared between two individuals. A large number of shared
species supports the view that the prevalent human microbiome is of a
finite and not overly large size.

How can this view be reconciled with that of a considerable
interpersonal diversity of innumerable bacterial species in the gut, arising
from most previous studies using the 16S rRNA marker gene4,8,10,11?
Possibly the depth of sampling of these studies was insufficient to reveal
common species when present at low abundance, and emphasized the
difference in the composition of a relatively few dominant species. We
found a very high variability of abundance (12- to 2,200-fold) for the 57
most common species across the individuals of our cohort. Nevertheless,
a recent 16S rRNA-based study concluded that a common bacterial
species ‘core’, shared among at least 50% of individuals under study,
exists35.

Detailed comparisons of bacterial genes across the individuals of our cohort will be
carried out in the future, within the context of the ongoing MetaHIT clinical studies of
which they are part. Nevertheless, clustering of the genes in families allowed us to
capture a virtually full functional potential of the prevalent gene set and revealed a
considerable novelty, extending the functional categories by some 30% in regard to
previous work8. Similarly, this analysis has revealed a functional core, conserved in
each individual of the cohort, which reflects the full minimal human gut metagenome,
encoded across many species and probably required for the proper functioning of the
gut ecosystem. The size of this minimal metagenome exceeds several-fold that of the
core metagenome reported previously8. It includes functions known to be important to
the host–bacterial interaction, such as degradation of complex polysaccharides,
synthesis of short-chain fatty acids, indispensable amino acids and vitamins. Finally, we
also identified functions that we attribute to a minimal gut bacterial genome, likely to be
required by any bacterium to thrive in this ecosystem. Besides general housekeeping
functions, the minimal genome encompasses many genes of unknown function, rare in
sequenced genomes and possibly specifically required in the gut.

Beyond providing the global view of the human gut microbiome, the
extensive gene catalogue we have established enables future studies of
association of the microbial genes with human phenotypes and, even
more broadly, human living habits, taking into account the environment,
including diet, from birth to old age. We anticipate that these studies will
lead to a much more complete understanding of human biology than the
one we presently have.
LETTERS
A core gut microbiome in obese and lean twins
Peter J. Turnbaugh1, Micah Hamady3, Tanya Yatsunenko1, Brandi L. Cantarel5, Alexis Duncan2, Ruth E. Ley1,
Mitchell L. Sogin6, William J. Jones7, Bruce A. Roe8, Jason P. Affourtit9, Michael Egholm9, Bernard Henrissat5,
Andrew C. Heath2, Rob Knight4 & Jeffrey I. Gordon1
The human distal gut harbours a vast ensemble of microbes (the
microbiota) that provide important metabolic capabilities, including
the ability to extract energy from otherwise indigestible dietary
polysaccharides1–6. Studies of a few unrelated, healthy adults have
revealed substantial diversity in their gut communities, as measured
by sequencing 16S rRNA genes6–8, yet how this diversity
relates to function and to the rest of the genes in the collective
genomes of the microbiota (the gut microbiome) remains obscure.
Studies of lean and obese mice suggest that the gut microbiota
affects energy balance by influencing the efficiency of calorie harvest
from the diet, and how this harvested energy is used and
stored3–5. Here we characterize the faecal microbial communities
of adult female monozygotic and dizygotic twin pairs concordant
for leanness or obesity, and their mothers, to address how host
genotype, environmental exposure and host adiposity influence
the gut microbiome. Analysis of 154 individuals yielded 9,920 near
full-length and 1,937,461 partial bacterial 16S rRNA sequences,
plus 2.14 gigabases from their microbiomes. The results reveal that
the human gut microbiome is shared among family members, but
that each person’s gut microbial community varies in the specific
bacterial lineages present, with a comparable degree of co-variation
between adult monozygotic and dizygotic twin pairs. However,
there was a wide array of shared microbial genes among sampled
individuals, comprising an extensive, identifiable ‘core microbiome’
at the gene, rather than at the organismal lineage, level.
Obesity is associated with phylum-level changes in the microbiota,
reduced bacterial diversity and altered representation of bacterial
genes and metabolic pathways. These results demonstrate that a
diversity of organismal assemblages can nonetheless yield a core
microbiome at a functional level, and that deviations from this core
are associated with different physiological states (obese compared

leanness (BMI = 18.5–24.9 kg m^-2) (one twin pair was lean/overweight
(overweight defined as BMI ≥ 25 and < 30) and six pairs were
overweight/obese). They had not taken antibiotics for at least
5.49 ± 0.09 months. Each participant completed a detailed medical,
lifestyle and dietary questionnaire: study enrolees were broadly
representative of the overall Missouri population for BMI, parity,
education and marital status (see Supplementary Results).
Although all were born in Missouri, they currently live throughout
the USA: 29% live in the same house, but some live more than 800 km
apart. Because faecal samples are readily attainable and representative
of interpersonal differences in gut microbial ecology7, they were
collected from each individual and frozen immediately. The collection
procedure was repeated again with an average interval between
sampling of 57 ± 4 days.

To characterize the bacterial lineages present in the faecal microbiotas
of these 154 individuals, we performed 16S rRNA sequencing,
targeting the full-length gene with an ABI 3730xl capillary sequencer.
Additionally, we performed multiplex pyrosequencing with a 454
FLX instrument to survey the gene’s V2 variable region13 and its
V6 hypervariable region14 (Supplementary Tables 1–3).

Complementary phylogenetic and taxon-based methods were
used to compare 16S rRNA sequences among faecal communities
(see Methods). No matter which region of the gene was examined,
individuals from the same family (a twin and her co-twin, or twins
and their mother) had a more similar bacterial community structure
than unrelated individuals (Fig. 1a and Supplementary Fig. 1a, b),
and shared significantly more species-level phylotypes (16S rRNA
sequences with ≥97% identity comprise each phylotype)
(G = 55.2, P < 10^-12 (V2); G = 12.3, P < 0.001 (V6); G = 11.3,
P < 0.001 (full-length)). No significant correlation was seen between
the degree of physical separation of family members’ current homes

Vol 457|22 January 2009|doi:10.1038/nature07540
in coverage, all analyses were performed on an equal number of
randomly selected sequences (200 full-length, 1,000 V2 and 10,000
V6). At this level of coverage, there was little overlap between the
sampled faecal communities. Moreover, the number of 16S rRNA
gene sequences belonging to each phylotype varied greatly between
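The equal-depth comparison described above can be sketched as random subsampling without replacement. The Python example below is illustrative only: the function name, the fixed seed and the toy phylotype labels are assumptions, and the depths quoted in the text (200, 1,000 and 10,000 sequences) would replace the toy depth.

```python
import random
from collections import Counter


def rarefy(sequence_labels, depth, seed=0):
    """Draw `depth` sequences at random, without replacement, and count
    how many fall into each phylotype."""
    if depth > len(sequence_labels):
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)
    return Counter(rng.sample(sequence_labels, depth))


# Toy samples: one phylotype label per sequence (made-up data).
sample_a = ["phylotype_1"] * 50 + ["phylotype_2"] * 30 + ["phylotype_3"] * 20
sample_b = ["phylotype_1"] * 5 + ["phylotype_4"] * 15

depth = 20  # every sample is compared at the same number of sequences
counts_a = rarefy(sample_a, depth)
counts_b = rarefy(sample_b, depth)

# Phylotypes observed in both subsamples at this depth.
shared_phylotypes = set(counts_a) & set(counts_b)
print(sorted(shared_phylotypes))
```

Subsampling to a common depth keeps a deeply sequenced sample from appearing more diverse simply because more of its sequences were read.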
Analysis of 16S rRNA data sets produced by the three P
methods, plus shotgun sequencing of community DNA (se
revealed a lower proportion of Bacteroidetes and a higher p
of Actinobacteria in obese compared with lean individua
ancestries (Supplementary Table 9). Combining the ind
values across these independent analyses using Fisher’s me
closed significantly fewer Bacteroidetes (P = 0.003
Actinobacteria (P = 0.002) but no significant diffe
Firmicutes (P = 0.09). These findings agree with previ
showing comparable differences in both taxa in mice2 and a
ive increase in the representation of Bacteroidetes when 12 u
obese humans lost weight after being placed on one of two
calorie diets6.
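Fisher's method, mentioned above for combining P values across independent analyses, sums the logarithms of the individual P values and compares the result with a chi-square distribution. The Python sketch below uses only the standard library and relies on the closed-form chi-square tail for even degrees of freedom. The function name and the example P values are made up for illustration and are not the study's inputs.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail is exp(-X/2) * sum_{j<k} (X/2)**j / j!.
    """
    if not p_values:
        raise ValueError("need at least one P value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0   # (X/2)**j / j!, starting at j = 0
    tail = 1.0
    for j in range(1, k):
        term *= half_x / j
        tail += term
    return math.exp(-half_x) * tail


# Made-up P values from three hypothetical independent analyses.
toy_p_values = [0.04, 0.10, 0.20]
combined = fisher_combined_p(toy_p_values)
print(combined)
```

With a single input the function returns that P value unchanged, which is a quick sanity check on the formula.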
Across all methods, obesity was associated with a s
decrease in the level of diversity (Fig. 1b and Suppleme
1c–f). This reduced diversity suggests an analogy: the
microbiota is not like a rainforest or reef, which are adapte
energy flux and are highly diverse; rather, it may be m
fertilizer runoff where a reduced-diversity microbial co
blooms with abnormal energy input16.
We subsequently characterized the microbial lineage
content of the faecal microbiomes of 18 individuals rep
six of the families (three lean and three obese European
monozygotic twin pairs and their mothers) through shotg
sequencing (Supplementary Tables 4 and 5) and BLASTX
isons against several databases (KEGG17 (version 44) and S
plus a custom database of 44 reference human gut microbia
(Supplementary Figs 7–10 and Supplementary Results). Ou
parameters were validated using control data sets compr
domly fragmented microbial genes with annotations in t
database17 (Supplementary Fig. 11 and Supplementary M
We also tested how technical advances that produce lon
might improve these assignments by sequencing faecal co
samples from one twin pair using Titanium pyrosequencing
(average read length of 341 ± 134 nucleotides (s.d.) versus
nucleotides for the standard FLX method). Supplementa
shows that the frequency and quality of sequence assign
improved as read length increases from 200 to 350 nucleo
The 18 microbiomes were searched to identify sequences
domains from experimentally validated carbohydr
enzymes (CAZymes). Sequences matching 156 total CAZ
were found within at least one human gut microbiome, inc
glycoside hydrolase, 21 carbohydrate-binding module, 35
transferase, 12 polysaccharide lyase and 11 carbohydrat
families (Supplementary Table 10). On average, 2.62 ±
Figure 1 panels (images not reproduced). Panel a: UniFrac distance (0.66 to
0.82; lower is more similar) for self, twin–twin (monozygotic, dizygotic),
twin–mother (monozygotic, dizygotic) and unrelated comparisons. Panel b:
phylogenetic diversity (2 to 122) against number of sequences (0 to 10,000)
for lean and obese individuals.
Figure 1 | 16S rRNA gene surveys reveal familial similarity and reduced
diversity of the gut microbiota in obese individuals. a, Average unweighted
UniFrac distance (a measure of differences in bacterial community
structure) between individuals over time (self), twin pairs, twins and their
mother, and unrelated individuals (1,000 sequences per V2 data set;
Student’s t-test with Monte Carlo; *P < 10^-5; **P < 10^-14; ***P < 10^-41;
mean ± s.e.m.). b, Phylogenetic diversity curves for the microbiota of lean
and obese individuals (based on 1–10,000 sequences per V6 data set;
mean ± 95% confidence intervals shown).
and taxonomic similarity (see Supplementary Methods) disclosed a
significant association: individuals with similar taxonomic profiles
of β-diversity with respect to the relative abundance of bacte
(Fig. 3a), analysis of the relative abundance of broad functi
egories of genes and metabolic pathways (KEGG) revealed a
consistent pattern regardless of the sample surveyed (Fig
Supplementary Table 11): the pattern is also consistent wi
we obtained from a meta-analysis of previously published gu
biome data sets from nine adults20,21 (Supplementary Fig.
consistency is not simply due to the broad level of these ann
as a similar analysis of Bacteroidetes and Firmicutes refere
omes revealed substantial variation in the relative abundanc
category (see Supplementary Fig. 17). Furthermore, pairw
parisons of metabolic profiles obtained from the 18 microb
this study revealed an average value of R2 of 0.97 ± 0.002
indicating a high level of functional similarity.
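The pairwise comparison of metabolic profiles mentioned above can be sketched as follows, assuming that R2 means the squared Pearson correlation between two relative-abundance profiles (the excerpt does not spell out the definition). The sample labels follow the nomenclature used in the figures, but the abundance values and function names are invented for illustration.

```python
from itertools import combinations


def r_squared(profile_x, profile_y):
    """Squared Pearson correlation between two equal-length profiles."""
    n = len(profile_x)
    if n != len(profile_y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(profile_x) / n
    mean_y = sum(profile_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(profile_x, profile_y))
    sxx = sum((x - mean_x) ** 2 for x in profile_x)
    syy = sum((y - mean_y) ** 2 for y in profile_y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def pairwise_r_squared(profiles):
    """R2 for every unordered pair of samples in a dict of profiles."""
    return {
        (a, b): r_squared(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Made-up relative abundances (%) of four pathways in three samples.
toy_profiles = {
    "F1T1Le": [12.0, 8.0, 5.0, 1.0],
    "F1T2Le": [11.5, 8.5, 4.5, 1.5],
    "F1MOv": [13.0, 7.0, 5.5, 0.5],
}
pairs = pairwise_r_squared(toy_profiles)
mean_r2 = sum(pairs.values()) / len(pairs)
print(mean_r2)
```

Averaging the values over all sample pairs gives a single summary number of the kind reported in the text.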
Overall functional diversity was compared using the
index22, a measurement that combines diversity (the numb
ferent metabolic pathways) and evenness (the relative abun
each pathway). The human gut microbiomes surveyed had
and high Shannon index value (4.63 ± 0.01), close to the m
possible level of functional diversity (5.54; see Suppl
Methods). Despite the presence of a small number of abunda
bolic pathways (listed in Supplementary Table 11), the ove
tional profile of each gut microbiome is quite even (Shannon
of 0.84 ± 0.001 on a scale of 0–1), demonstrating that most m
pathways are found at a similar level of abundance. Interest
level of functional diversity in each microbiome was sig
linked to the relative abundance of the Bacteroidetes (R
P < 10^-6); microbiomes enriched for Firmicutes/Actinobac
a lower level of functional diversity. This observation is consis
an analysis of simulated metagenomic reads generated from e
Bacteroidetes and Firmicutes genomes (Supplementary Fig
average, the Bacteroidetes genomes have a significantly high
both functional diversity and evenness (Mann–Whitne
P < 0.01).
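The Shannon index and evenness values quoted above can be related with a short calculation. The Python sketch below assumes natural logarithms and defines evenness as the Shannon index divided by its maximum, the logarithm of the number of categories. Both choices are assumptions, since the Supplementary Methods are not part of this excerpt, and the toy abundances are invented.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def evenness(abundances):
    """H divided by its maximum, ln(number of categories); ranges 0 to 1."""
    n = len(abundances)
    if n < 2:
        raise ValueError("need at least two categories")
    return shannon_index(abundances) / math.log(n)


# Made-up pathway abundances for one hypothetical microbiome.
toy_abundances = [30, 25, 20, 15, 10]
print(shannon_index(toy_abundances), evenness(toy_abundances))

# Ratio of the quoted Shannon index to the quoted maximum.
quoted_ratio = 4.63 / 5.54
print(round(quoted_ratio, 2))
```

Under these assumptions the ratio of the quoted index (4.63) to the quoted maximum (5.54) comes out near the reported evenness of 0.84, and a maximum of 5.54 would correspond to roughly 250 equally abundant pathways.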
At a finer level, 26–53% of ‘enzyme’-level functiona
(KEGG/CAZy/STRING) were shared across all 18 micr
whereas 8–22% of the groups were unique to a single mic
(Supplementary Fig. 19a–c). The ‘core’ functional groups p
all microbiomes were also highly abundant, representing 93
the total sequences. Given the higher relative abundance of th
groups, more than 95% were found after 26.11 ± 2.02 meg
sequence were collected from a given microbiome, whereas
able’ groups continued to increase substantially with each additional
megabase of sequence. Of course, any estimate of the total size of the
core microbiome will depend on sequencing effort, especially for
Figure 2 panels (images not reproduced). Panel a: matrix of pairwise R2
values (0.97 to 1.00; lower values shown as <0.97) among the 18 samples
F1T1Le, F1T2Le, F1MOv, F2T1Le, F2T2Le, F2MOb, F3T1Le, F3T2Le, F3MOv,
F4T1Ob, F4T2Ob, F4MOb, F5T1Ob, F5T2Ob, F5MOv, F6T1Ob, F6T2Ob and F6MOb,
grouped into high Bacteroidetes and high Firmicutes/Actinobacteria
clusters. Panel b: Bacteroidetes (%) (0 to 100) against PC1 (20%)
(–0.6 to 0.6), R2 = 0.96. Panel c: functional similarity (R2) (0.94 to 1)
for twin versus twin, twin versus mother and unrelated pairs.
Figure 2 | Metabolic-pathway-based clustering and analysis of the human
gut microbiome of monozygotic twins. a, Clustering of functional profiles
based on the relative abundance of KEGG metabolic pathways. All pairwise
comparisons were made of the profiles by calculating each R2 value. Sample
identifier nomenclature: family number, twin number or mother, and BMI
category (Le, lean; Ov, overweight; Ob, obese; for example, F1T1Le stands
for family 1, twin 1, lean). b, The relative abundance of Bacteroidetes as a
function of the first principal component derived from an analysis of KEGG
metabolic profiles. c, Comparisons of functional similarity between twin
pairs, between twins and their mother, and between unrelated individuals.
Asterisk indicates significant differences (Student’s t-test with Monte Carlo;
P < 0.01; mean ± s.e.m.).
Figure 3 panels (images not reproduced). Panel a: bacterial phylum, relative
abundance (%) (0 to 100) of Actinobacteria, Bacteroidetes, Firmicutes,
Proteobacteria and Other in each of the 18 samples (F1T1Le to F6MOb).
Panel b: COG categories, relative abundance (%) of categories [A] to [Z] in
the same 18 samples.
Figure 3 | Comparison of taxonomic and functional variations in the human
gut microbiome. a, Relative abundance of major phyla across 18 faecal
microbiomes from monozygotic twins and their mothers, based on BLASTX
comparisons of microbiomes and the National Center for Biotechnology
Information non-redundant database. b, Relative abundance of categories of
genes across each sampled gut microbiome (letters correspond to categories
in the COG database).
including those involved in transcription and translation (Fig. 4).
Metabolic profile-based clustering indicated that the representation
of ‘core’ functional groups was highly consistent across samples
(Supplementary Fig. 20), and included several pathways that are
To identify metabolic pathways associated with obesity, o
core associated (variable) functional groups were included i
parison of the gut microbiomes of lean versus obese twin
bootstrap analysis23 was used to identify metabolic pathw
were enriched or depleted in the variable obese gut micr
For example, similar to a mouse model of diet-induced
the obese human gut microbiome was enriched for phospho
ase systems involved in microbial processing of carbo
(Supplementary Table 12). All gut microbiome sequences w
pared with the custom database of 44 human gut genomes:
ratio analysis revealed 383 genes that were significantly
between the obese and lean gut microbiome (q value < 0
enriched and 110 depleted in the obese micr
Supplementary Tables 13 and 14). By contrast, only 49 ge
consistently enriched or depleted between all twin p
Supplementary Methods).
These obesity-associated genes were representative of t
nomic differences described above: 75% of the obesity-
genes were from Actinobacteria (compared with 0% of lean-
genes; the other 25% are from Firmicutes) whereas 42% of
enriched genes were from Bacteroidetes (compared with 0
obesity-enriched genes). Their functional annotation indic
many are involved in carbohydrate, lipid and amino-acid
ism (Supplementary Tables 13 and 14). Together, they com
initial set of microbial biomarkers of the obese gut microbi
Our finding that the gut microbial community structures
monozygotic twin pairs had a degree of similarity that was
able to that of dizygotic twin pairs, and only slightly mor
than that of their mothers, is consistent with an earlier finger
study of adult twins24, and with a recent microarray-based
which revealed that gut community assembly during the fir
life followed a more similar pattern in a pair of dizygotic tw
12 unrelated infants25. Intriguingly, another fingerprinting
monozygotic and dizygotic twins in childhood showed a
reduced similarity profile in dizygotic twins26. Thus, compr
time-course studies, comparing monozygotic and dizygo
pairs from birth through adulthood, as well as intergen
analyses of their families’ microbiotas, will be key to dete
the relative contributions of host genotype and environmen
sures to (gut) microbial ecology.
The hypothesis that there is a core human gut microbiom
able by a set of abundant microbial organismal lineages th
share, may be incorrect: by adulthood, no single bacterial p
was detectable at an abundant frequency in the guts o
sampled humans. Instead, it appears that a core gut mic
exists at the level of shared genes, including an important com
involved in various metabolic functions. This conservation s
high degree of redundancy in the gut microbiome and sup
ecological view of each individual as an ‘island’ inhabited b
[Figure 4: horizontal bar chart. x-axis: Relative abundance (percentage of KEGG assignments), 0 to 14. y-axis: KEGG category. Series: Core, Variable. Categories: Transcription; Translation; Nucleotide metabolism; Amino-acid metabolism; Biosynthesis of secondary metabolites; Replication and repair; Metabolism of other amino acids; Glycan biosynthesis and metabolism; Carbohydrate metabolism; Lipid metabolism; Biosynthesis of polyketides; Cell growth and death; Metabolism of cofactors and vitamins; Energy metabolism; Xenobiotics biodegradation and metabolism; Genetic information processing protein families; Metabolism protein families; Metabolism unclassified; Membrane transport; Folding, sorting, and degradation; Cellular processes and signalling protein families; Cellular processes and signalling unclassified; Signal transduction; Poorly characterized unclassified; Genetic information processing unclassified; Cell motility; Signalling molecules and interaction. Asterisks beside bars mark significance levels.]
Figure 4 | KEGG categories enriched or depleted in the core versus variable
components of the gut microbiome. Sequences from each of the 18 faecal
microbiomes were binned into the ‘core’ or ‘variable’ microbiome based on
the co-occurrence of KEGG orthologous groups (core groups were found in
all 18 microbiomes whereas variable groups were present in fewer (<18)
microbiomes; see Supplementary Fig. 19a). Asterisks indicate significant
differences (Student’s t-test, *P < 0.05, **P < 0.001, ***P < 10^-5;
mean ± s.e.m.).
Macmillan Publishers Limited. All rights reserved©2009
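The core/variable binning described in the Figure 4 caption can be illustrated with a small set-based sketch. This is only a toy illustration under stated assumptions, not the authors' pipeline: the function name `split_core_variable`, the sample names and the KEGG-style identifiers are hypothetical, and each microbiome is assumed to be already reduced to the set of orthologous groups detected in it.

```python
from typing import Dict, Set, Tuple


def split_core_variable(
    groups_by_sample: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split orthologous groups into 'core' (present in every sample)
    and 'variable' (present in at least one but not all samples)."""
    if not groups_by_sample:
        return set(), set()
    sample_sets = list(groups_by_sample.values())
    core = set.intersection(*sample_sets)        # found in all samples
    variable = set.union(*sample_sets) - core    # found in fewer than all
    return core, variable


# Hypothetical toy input: three microbiomes instead of the 18 in the figure.
toy = {
    "microbiome_A": {"K00001", "K00002", "K00003"},
    "microbiome_B": {"K00001", "K00002"},
    "microbiome_C": {"K00001", "K00003", "K00004"},
}
core, variable = split_core_variable(toy)
print(sorted(core))      # ['K00001']
print(sorted(variable))  # ['K00002', 'K00003', 'K00004']
```

With real data the same split would be applied across all 18 microbiomes before comparing the relative abundance of each KEGG category between the two bins.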
UC Davis EVE161 Lecture 18 by @phylogenomics

  • 1. Slides for UC Davis EVE161 Course Taught by Jonathan Eisen, Winter 2014. EVE 161: Microbial Phylogenomics. Lecture #18: Era IV: Metagenomics Case Study. UC Davis, Winter 2014. Instructor: Jonathan Eisen
  • 2. Next week: student presentations. Each student gets 10 minutes total: eight minutes to present and 2 minutes for questions. Possible presentation timing: 2 minutes Overview and Methods; 4 minutes R & D; 2 minutes Conclusions and Future Ideas; 2 minutes Questions. Contact Holly Ganz (hhganz@ucdavis.edu) for non-Eisen guidance.
  • 3. ARTICLES. A human gut microbial gene catalogue established by metagenomic sequencing. Junjie Qin1*, Ruiqiang Li1*, Jeroen Raes2,3, Manimozhiyan Arumugam2, Kristoffer Solvsten Burgdorf4, Chaysavanh Manichanh5, Trine Nielsen4, Nicolas Pons6, Florence Levenez6, Takuji Yamada2, Daniel R. Mende2, Junhua Li1,7, Junming Xu1, Shaochuan Li1, Dongfang Li1,8, Jianjun Cao1, Bo Wang1, Huiqing Liang1, Huisong Zheng1, Yinlong Xie1,7, Julien Tap6, Patricia Lepage6, Marcelo Bertalan9, Jean-Michel Batto6, Torben Hansen4, Denis Le Paslier10, Allan Linneberg11, H. Bjørn Nielsen9, Eric Pelletier10, Pierre Renault6, Thomas Sicheritz-Ponten9, Keith Turner12, Hongmei Zhu1, Chang Yu1, Shengting Li1, Min Jian1, Yan Zhou1, Yingrui Li1, Xiuqing Zhang1, Songgang Li1, Nan Qin1, Huanming Yang1, Jian Wang1, Søren Brunak9, Joel Doré6, Francisco Guarner5, Karsten Kristiansen13, Oluf Pedersen4,14, Julian Parkhill12, Jean Weissenbach10, MetaHIT Consortium, Peer Bork2, S. Dusko Ehrlich6 & Jun Wang1,13. To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively. Vol 464 | 4 March 2010 | doi:10.1038/nature08821
  • 4. Same title page as slide 3, with the annotation: THAT'S A LOT OF AUTHORS
  • 5. METHODS
  • 6. Human faecal samples were collected, frozen immediately and DNA was purified by standard methods [22]. For all 124 individuals, paired-end libraries were constructed with different clone insert sizes and subjected to Illumina GA sequencing. All reads were assembled using SOAPdenovo [19], with specific parameter ‘-M 3’ for metagenomics data. MetaGene was used for gene prediction. A non-redundant gene set was constructed by pair-wise comparison of all genes, using BLAT [36] under the criteria of identity >95% and overlap >90%. Gene taxonomic assignments were made on the basis of BLASTP [37] search (e-value < 1 × 10^-5) of the NCBI-NR database and 126 known gut bacteria genomes. Gene functional annotations were made by BLASTP search (e-value < 1 × 10^-5) with eggNOG and KEGG (v48.2) databases. The total and shared number of orthologous groups and/or gene families were computed using a random combination of n individuals (with n = 2 to 124, 100 replicates per bin).
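The random-combination step at the end of the methods slide can be sketched roughly as follows. This is a hedged illustration rather than the authors' code: the function name `rarefy`, the fixed seed and the toy group identifiers are assumptions, and each individual is assumed to be already reduced to the set of orthologous groups (or gene families) detected in that sample.

```python
import random
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def rarefy(
    groups_by_individual: Dict[str, Set[str]],
    sizes: Iterable[int],
    replicates: int = 100,
    seed: int = 0,
) -> Dict[int, Tuple[float, float]]:
    """For each subset size n, draw `replicates` random combinations of
    n individuals and average the total (union) and shared (intersection)
    number of groups across the draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    individuals = sorted(groups_by_individual)
    curve = {}
    for n in sizes:
        totals, shared = [], []
        for _ in range(replicates):
            subset = rng.sample(individuals, n)  # n distinct individuals
            sets = [groups_by_individual[name] for name in subset]
            totals.append(len(set.union(*sets)))
            shared.append(len(set.intersection(*sets)))
        curve[n] = (mean(totals), mean(shared))
    return curve


# Hypothetical toy cohort of three individuals (the study used n = 2 to 124).
toy = {
    "ind_1": {"OG1", "OG2", "OG3"},
    "ind_2": {"OG1", "OG2"},
    "ind_3": {"OG1", "OG3", "OG4"},
}
curve = rarefy(toy, sizes=range(2, 4))
for n, (total, common) in curve.items():
    print(n, total, common)
```

As n grows, the averaged total count can only rise and the averaged shared count can only fall, which is the shape such curves are expected to take.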
  • 7. As part of the MetaHIT (Metagenomics of the Human Intestinal Tract) project, we collected faecal specimens from 124 healthy, overweight and obese individual human adults, as well as inflammatory bowel disease (IBD) patients, from Denmark and Spain (Supplementary Table 1).
  • 9. Supplementary Tables. Table 1 | DNA sample information. All Danish individuals in the present subsample were originally recruited from a larger population-based sample of middle-aged people living in the northern part of Copenhagen region and sampled from the centralized personal number register. At the original recruitment the individuals included in the present study had normal fasting plasma glucose and normal 2-hour plasma glucose following an oral glucose tolerance test. At the time of fecal sampling all were examined in the fasting state and had non-diabetic fasting plasma glucose levels below 7.0 mmol/l. All of the IBD patients were in clinical remission at the time of fecal sampling. N refers to no IBD, CD & UC to Crohn’s disease and ulcerative colitis, respectively.

      Sample Name  Country  Gender  Age  BMI    IBD
      MH0001       Denmark  female  49   25.55  N
      MH0002       Denmark  female  59   27.28  N
      MH0003       Denmark  male    69   33.19  N
      MH0004       Denmark  male    59   31.18  N
      MH0005       Denmark  male    64   21.68  N
      MH0006       Denmark  female  59   22.38  N
      MH0007       Denmark  male    69   33.60  N
      MH0008       Denmark  male    59   24.35  N
      MH0009       Denmark  male    64   29.04  N
      MH0010       Denmark  male    64   33.27  N
      MH0011       Denmark  female  0    22.31  N
      MH0012       Denmark  female  42   32.10  N
      MH0013       Denmark  male    54   20.46  N
      MH0014       Denmark  female  54   38.49  N
      MH0015       Denmark  male    59   25.47  N
      MH0016       Denmark  female  49   30.50  N
      MH0017       Denmark  male    64   21.81  N
      MH0018       Denmark  male    49   31.37  N
      MH0019       Denmark  female  44   20.01  N
      MH0020       Denmark  female  63   33.23  N
      MH0021       Denmark  female  49   25.42  N
      MH0022       Denmark  male    64   24.42  N
      MH0023       Denmark  male    69   31.74  N
      MH0024       Denmark  female  59   22.72  N
      MH0025       Denmark  female  49   34.20  N
      MH0026       Denmark  female  49   37.32  N
      MH0027       Denmark  female  59   23.07  N
      MH0028       Denmark  female  44   22.70  N
      MH0030       Denmark  male    59   35.21  N
      MH0031       Denmark  male    69   22.34  N
      MH0032       Denmark  male    69   35.28  N
      MH0033       Denmark  female  59   31.95  N
  • 10. METHODS & RESULTS (mixed)
  • 11. Total DNA was extracted from the faecal specimens [18] and an average of 4.5 Gb (ranging between 2 and 7.3 Gb) of sequence was generated for each sample, allowing us to capture most of the novelty (see Methods and Supplementary Table 2). In total, we obtained 576.7 Gb of sequence (Supplementary Table 3).
  • 13. Table 2 | Summary of Sanger reads. The reads were sequenced by 3730xl. Low-quality sequences at both ends with phred score less than 20 were trimmed. Very short reads with length less than 100 bp were filtered.

      Sample ID  # Sanger reads  Average length (bp)  Total length (bp)
      MH0006     237,567         660.65               156,949,306
      MH0012     230,768         670.26               154,675,458
  • 14. Table 3 | Summary of Illumina GA reads. We constructed libraries with three different insert sizes of about 135 bp, 200 bp, and 400 bp. The insert sizes of each library were estimated by re-aligning the paired-end reads on the assembled contigs.

      Sample ID  Paired-end insert size (bp)  Read length (bp)  # of reads   Data (Gb)  Human reads (%)  # of high quality reads
      MH0047     136/378                      75                35,355,400   2.65       0.18             26,932,064
      MH0021     134/354                      75                36,454,400   2.73       0.12             26,258,326
      MH0079     135/360                      75                38,011,600   2.85       0.40             27,418,899
      MH0078     146/373                      75                38,038,200   2.85       1.56             26,051,537
      MH0052     141/367                      75                39,538,000   2.97       0.08             28,575,036
      MH0049     134/343                      75                40,444,200   3.03       0.06             30,654,842
      MH0076     134/409                      75                40,697,000   3.05       0.42             30,650,106
      MH0051     143/374                      75                41,911,800   3.14       0.32             25,963,104
      MH0048     143/349                      75                42,923,600   3.22       0.26             26,972,970
      O2.UC-14   141/355                      75                43,343,000   3.25       0.06             26,942,750
      MH0015     235                          44                44,671,400   1.97       0.04             33,014,675
      MH0018     233                          44                45,081,400   1.98       2.14             36,609,695
      MH0027     238                          44                45,190,000   1.99       0.09             32,377,390
      MH0017     223                          44                45,557,200   2.00       0.04             36,154,362
      MH0022     256                          44                46,415,000   2.04       0.21             37,112,508
      MH0023     237                          44                48,598,400   2.14       0.04             37,782,998
      MH0019     249                          44                49,229,400   2.17       0.06             38,856,780
      MH0026     156/398                      75                49,812,000   3.74       0.05             37,484,066
      MH0013     238                          44                50,257,200   2.21       1.63             40,028,120
      MH0005     237                          44                50,704,800   2.23       0.23             39,407,333
      MH0007     195                          44                50,719,800   2.23       0.31             36,956,284
      MH0008     219                          44                51,411,000   2.26       0.10             38,156,496
      V1.UC-7    141/356                      75                51,911,400   3.89       14.67            36,788,540
      MH0010     220                          44                52,218,200   2.30       0.08             39,169,850
      V1.CD-12   148/361                      75                53,519,400   4.01       0.02             40,609,134
      O2.UC-20   141/362                      75                53,637,200   4.02       0.03             38,376,747
      V1.CD-15   143/351                      75                53,938,600   4.05       2.85             40,560,446
      O2.UC-19   133/352                      75                54,537,600   4.09       0.01             38,459,550
      MH0004     218                          44                55,829,800   2.46       0.95             40,288,492
      MH0062     144/357                      75                57,128,400   4.28       14.32            36,809,224
      MH0066     147/429                      75                57,234,200   4.29       0.05             36,114,997
      O2.UC-21   142/362                      75                57,856,000   4.34       0.03             34,832,308
      V1.CD-13   139/352                      75                58,145,800   4.36       0.04             42,560,831
      MH0080     140/376                      75                58,220,800   4.37       0.13             46,590,749
      V1.UC-13   131/352                      75                58,381,400   4.38       9.77             38,553,580
      MH0032     142/370                      75                58,822,400   3.93       0.39             50,110,067
      O2.UC-12   153/384                      75                58,927,800   4.42       0.12             36,908,526
      MH0001     214                          44                59,239,200   2.61       0.06             45,016,612
      V1.UC-6    142/376                      75                59,270,800   4.45       0.41             43,150,856
      MH0060     142/367                      75                60,156,000   4.51       0.07             41,112,227
      MH0053     137/416                      75                60,788,600   4.56       0.07             43,283,564
      MH0002     139/370                      75                61,077,000   4.58       0.15             46,570,095
      O2.UC-11   142/377                      75                61,253,800   4.59       0.14             38,507,042
      MH0059     136/370                      75                61,574,600   4.62       0.18             41,025,606
  • 15. Wanting to generate an extensive catalogue of microbial genes from the human gut, we first assembled the short Illumina reads into longer contigs, which could then be analysed and annotated by standard methods. Using SOAPdenovo (ref. 19), a de Bruijn graph-based tool specially designed for assembling very short reads, we performed de novo assembly for all of the Illumina GA sequence data. Because a high diversity between individuals is expected (refs 8, 16, 17), we first assembled each sample independently (Supplementary Fig. 3). As much as 42.7% of the Illumina GA reads was assembled into a total of 6.58 million contigs of a length >500 bp, giving a total contig length of 10.3 Gb, with an N50 length of 2.2 kb (Supplementary Fig. 4) and per-sample totals ranging from 12.3 to 237.6 Mb (Supplementary Table 4). Almost 35% of reads from any one sample could be mapped to contigs from other samples, indicating the existence of a common sequence core.
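The N50 quoted for the contig set can be illustrated with a small calculation. The sketch below is not the authors' code; the function name `n50`, the toy contig lengths, and the choice to treat the 500 bp cut-off as inclusive are assumptions made for illustration.

```python
def n50(lengths, min_length=500):
    """Return the N50 of the contigs that are at least `min_length` bp long.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        return 0
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp; the 300 bp contig is filtered out.
toy_contigs = [300, 600, 800, 1200, 2500, 4000]
print(n50(toy_contigs))  # 2500
```

Because N50 is weighted by bases rather than by contig count, a handful of long contigs can raise it substantially even when most contigs are short.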
  • 17. Figure 3 | Flowchart of the human gut microbiome data analysis process. We performed de novo short-read assembly for each sample independently; all unassembled reads were then pooled for another round of assembly. ORFs were predicted in each contig set and merged by removing redundancy. The non-redundant gene set was used in all further analyses.
  • 18. Figure 4 | Length distribution of assembled contigs. The number of contigs in different length bins was computed for each individual, and the data from all 124 individuals were pooled. Boxes denote the 25th and 75th percentiles, the red line corresponds to the median, and the “whiskers” indicate interquartile range from either or both ends of the box.
  • 19. Table 4 | Summary of de novo assembly results. Assembled sequences shorter than 500 bp were excluded from the contig set.
    Sample ID  # of contigs  Contig N50 (bp)  Total length (Mb)  % reads assembled  Unassembled reads (Gb)
    MH0001     14,301        1,618            19.69              46.34              1.06
    MH0002     65,392        1,680            88.77              45.31              1.91
    MH0003     68,658        2,640            119.59             54.40              1.72
    MH0004     23,793        1,681            31.92              41.54              1.05
    MH0005     14,339        1,684            19.62              40.22              1.04
    MH0006     144,440       2,025            217.77             52.39              5.24
    MH0007     28,108        1,270            32.00              29.15              1.16
    MH0008     26,506        1,768            37.24              43.53              0.95
    MH0009     70,014        2,440            112.96             44.14              2.45
    MH0010     25,674        1,815            36.52              48.77              0.88
    MH0011     86,201        2,158            134.25             46.09              2.37
    MH0012     140,991       2,478            237.58             42.77              7.99
    MH0013     20,495        2,332            32.20              41.22              1.05
    MH0014     66,724        2,957            120.54             50.90              2.08
    MH0015     25,933        1,645            34.46              35.53              0.94
    MH0016     64,124        2,915            114.03             53.89              1.88
    MH0017     24,948        1,679            34.06              39.57              0.96
    MH0018     13,247        1,619            17.73              35.23              1.07
    MH0019     28,786        1,977            41.95              46.76              0.91
    MH0020     44,930        4,708            98.78              56.81              1.49
    MH0021     54,101        1,608            70.67              46.49              1.06
    MH0022     21,872        1,773            30.00              35.04              1.06
    MH0023     16,214        2,100            25.57              35.80              1.07
    MH0024     43,145        1,512            54.45              33.21              2.41
    MH0025     76,287        1,968            111.48             42.24              2.11
    MH0026     33,408        3,769            69.38              43.97              1.58
    MH0027     20,369        985              19.72              19.96              1.14
    MH0028     61,004        2,630            104.54             49.65              1.80
    MH0030     39,267        2,828            66.60              36.73              2.15
    MH0031     53,292        1,878            75.84              36.41              2.33
    MH0032     37,287        1,921            54.46              20.72              2.60
    MH0033     61,782        2,616            102.04             49.87              1.69
    MH0034     24,508        2,107            37.63              14.53              2.40
    MH0035     68,287        2,075            102.94             48.22              1.93
    MH0036     58,690        2,330            94.35              49.44              1.81
    MH0037     48,356        2,526            80.44              48.21              1.63
    MH0038     50,381        2,921            90.35              47.75              1.79
    MH0039     66,509        2,087            104.17             47.66              1.68
    MH0040     73,068        2,225            115.15             49.53              1.68
  • 20. To assess the quality of the Illumina GA-based assembly we mapped the contigs of samples MH0006 and MH0012 to the Sanger reads from the same samples (Supplementary Table 2). A total of 98.7% of the contigs that map to at least one Sanger read were collinear over 99.6% of the mapped regions. This is comparable to the contigs that were generated by 454 sequencing for one of the two samples (MH0006) as a control, of which 97.9% were collinear over 99.5% of the mapped regions. We estimate assembly errors to be 14.2 and 20.7 per megabase (Mb) of Illumina- and 454-based contigs, respectively (see Methods and Supplementary Fig. 5), indicating that the short- and long-read-based assemblies have comparable accuracies.
  • 23. Figure 5 | Validating Illumina contigs using Sanger reads. Illumina/454 contigs from samples MH0006 and MH0012 were mapped to Sanger reads from the same samples. Aligned regions were scanned for breaks in collinearity, and each unique break was counted as an error. a, Number of errors per Mb of Illumina/454 contigs mapped to Sanger reads. b, Percentage of collinear Illumina/454 contigs and collinear base pairs in those contigs.
  • 24. To complete the contig set we pooled the unassembled reads from all 124 samples, and repeated the de novo assembly process. About 0.4 million additional contigs were thus generated, having a length of 370 Mb and an N50 length of 939 bp. The total length of our final contig set was thus 10.7 Gb. Some 80% of the 576.7 Gb of Illumina GA sequence could be aligned to the contigs at a threshold of 90% identity, allowing for accommodation of sequencing errors and strain variability in the gut (Fig. 1), almost twice the 42.7% of sequence that was assembled into contigs by SOAPdenovo, because assembly uses more stringent criteria. This indicates that a vast majority of the Illumina sequence is represented by our contigs.
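To make the 90% identity threshold concrete, the sketch below scores an ungapped, equal-length alignment. It is a deliberate simplification and not the aligner used in the paper: real read mappers handle gaps and partial alignments. The function names and toy sequences are hypothetical; the 44 bp length is borrowed from the shorter read length listed in Table 3.

```python
def percent_identity(read, ref_segment):
    """Percent of matching positions in an ungapped, equal-length alignment."""
    if len(read) != len(ref_segment) or not read:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(read, ref_segment))
    return 100.0 * matches / len(read)


def passes_threshold(read, ref_segment, threshold=90.0):
    """True if the read aligns to the segment at or above the threshold."""
    return percent_identity(read, ref_segment) >= threshold


read = "ACGT" * 11                      # a 44-bp toy read
four_mismatches = "TGCA" + read[4:]     # differs at 4 positions
five_mismatches = "TGCAC" + read[5:]    # differs at 5 positions

print(passes_threshold(read, four_mismatches))  # True  (40/44 = 90.9%)
print(passes_threshold(read, five_mismatches))  # False (39/44 = 88.6%)
```

Under this simplification a 44 bp read tolerates at most four mismatches, which gives a sense of how much sequencing error and strain variation the threshold accommodates.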
  • 26. Figure 1 | Coverage of human gut microbiome. The three human microbial sequencing read sets (Illumina GA reads generated from 124 individuals in this study: black, n = 124; Roche/454 reads from 18 human twins and their mothers: grey, n = 18; Sanger reads from 13 Japanese individuals: white, n = 13) were aligned to each of the reference sequence sets. Mean values ± s.e.m. are plotted. [Bar chart. y-axis: coverage of sequencing reads (%); x-axis: assembled contig set, known human gut bacteria, GenBank bacteria.]
  • 27. To compare the representation of the human gut microbiome in our contigs with that from previous work, we aligned them to the reads from the two largest published gut metagenome studies (1.83 Gb of Roche/454 sequencing reads from 18 US adults (ref. 8), and 0.79 Gb of Sanger reads from 13 Japanese adults and infants (ref. 17)), using the 90% identity threshold. A total of 70.1% and 85.9% of the reads from the Japanese and US samples, respectively, could be aligned to our contigs (Fig. 1), showing that the contigs include a high fraction of sequences from previous studies. In contrast, 85.7% and 69.5% of our contigs were not covered by the reads from the Japanese and US samples, respectively, highlighting the novelty we captured.
  • 30. Only 31.0–48.8% of the reads from the two previous studies and the present study could be aligned to 194 public human gut bacterial genomes (Supplementary Table 5), and 7.6–21.2% to the bacterial genomes deposited in GenBank (Fig. 1). This indicates that the reference gene set obtained by sequencing genomes of isolated bacterial strains is still of a limited scale.
  • 33. A gene catalogue of the human gut microbiome
    To establish a non-redundant human gut microbiome gene set we first used the MetaGene (ref. 20) program to predict ORFs in our contigs and found 14,048,045 ORFs longer than 100 bp (Supplementary Table 6). They occupied 86.7% of the contigs, comparable to the value found for fully sequenced genomes (~86%). Two-thirds of the ORFs appeared incomplete, possibly due to the size of our contigs (N50 of 2.2 kb). We next removed the redundant ORFs, by pair-wise comparison, using a very stringent criterion of 95% identity over 90% of the shorter ORF length, which can fuse orthologues but avoids inflation of the data set due to possible sequencing errors (see Methods). Yet, the final non-redundant gene set contained as many as 3,299,822 ORFs with an average length of 704 bp (Supplementary Table 7).
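The redundancy criterion (95% identity over at least 90% of the shorter ORF) can be sketched as a greedy filter. Everything below is illustrative: `best_ungapped_hit` is a toy stand-in for a real pairwise aligner, the greedy longest-first order is an assumption rather than the paper's documented procedure, and the sequences are invented.

```python
def is_redundant(aligned_len, matches, len_a, len_b,
                 min_identity=0.95, min_cover=0.90):
    """Apply the 95% identity / 90% of shorter ORF criterion to alignment stats."""
    shorter = min(len_a, len_b)
    identity = matches / aligned_len
    cover = aligned_len / shorter
    return identity >= min_identity and cover >= min_cover


def best_ungapped_hit(short, long):
    """Toy aligner: slide `short` along `long`, keep the best ungapped match count."""
    best = 0
    for offset in range(len(long) - len(short) + 1):
        window = long[offset:offset + len(short)]
        best = max(best, sum(a == b for a, b in zip(short, window)))
    return len(short), best  # (aligned length, matching positions)


def remove_redundant(orfs):
    """Keep the longest ORFs first and drop any ORF redundant with a kept one."""
    kept = []
    for orf in sorted(orfs, key=len, reverse=True):
        redundant = False
        for rep in kept:  # rep is never shorter than orf because of the sort
            aligned_len, matches = best_ungapped_hit(orf, rep)
            if is_redundant(aligned_len, matches, len(orf), len(rep)):
                redundant = True
                break
        if not redundant:
            kept.append(orf)
    return kept


toy_orfs = [
    "ATGAAACCCGGGTTTAAATAG",   # full-length representative
    "AAACCCGGGTTTAAA",         # fragment contained in the first ORF
    "ATGCATGCATGCATGCATGCA",   # unrelated sequence
]
print(len(remove_redundant(toy_orfs)))  # 2
```

Measuring coverage against the shorter sequence is what lets a fragmentary ORF collapse into its full-length counterpart, which matters here because two-thirds of the predicted ORFs appeared incomplete.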
  • 35. Table 7 | Non-redundant genes. Genes were compared at a 95% identity cut-off. Those that overlapped over 90% of their length were considered redundant and removed. Common and rare genes were present in >50% and <20% of individuals, respectively.
                            # of genes  Total length (bp)  Mean length (bp)
    Non-redundant gene set  3,299,822   2,323,171,095      704.03
    Common                  294,110     292,960,308        996.09
    Rare                    2,375,655   1,510,527,924      635.84
  • 36. We term the genes of the non-redundant set ‘prevalent genes’, as they are encoded on contigs assembled from the most abundant reads (see Methods). The minimal relative abundance of the prevalent genes was $\sim 6 \times 10^{-7}$, as estimated from the minimum sequence coverage of the unique genes (close to 3), and the total Illumina sequence length generated for each individual (on average, 4.5 Gb), assuming the average gene length of 0.85 kb (that is, $3 \times 0.85 \times 10^{3} / (4.5 \times 10^{9})$).
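The abundance estimate is a one-line calculation, reproduced here as a quick check. The function and parameter names are hypothetical; the three input values are the ones quoted on the slide.

```python
def min_relative_abundance(min_coverage, gene_length_bp, total_sequence_bp):
    """Fraction of all sequenced bases that a gene needs to reach `min_coverage`."""
    return min_coverage * gene_length_bp / total_sequence_bp


estimate = min_relative_abundance(
    min_coverage=3,            # minimum coverage of the unique genes
    gene_length_bp=0.85e3,     # assumed average gene length, 0.85 kb
    total_sequence_bp=4.5e9,   # average Illumina sequence per individual, 4.5 Gb
)
print(f"{estimate:.1e}")  # 5.7e-07
```

The result, about 5.7 × 10^-7, rounds to the roughly 6 × 10^-7 stated in the text.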
  • 38. We mapped the 3.3 million gut ORFs to the 319,812 genes (target genes) of the 89 frequent reference microbial genomes in the human gut. At a 90% identity threshold, 80% of the target genes had at least 80% of their length covered by a single gut ORF (Fig. 2b). This indicates that the gene set includes most of the known human gut bacterial genes.
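The statistic reported above (the share of target genes with at least 80% of their length covered by a single ORF) reduces to a simple count once per-gene coverage is known. A minimal sketch, assuming a hypothetical dictionary that maps each target gene to the best fraction of its length covered by any one gut ORF:

```python
def fraction_well_covered(covered_fraction_by_gene, min_fraction=0.8):
    """Share of target genes whose best single-ORF coverage reaches min_fraction."""
    if not covered_fraction_by_gene:
        return 0.0
    hits = sum(1 for f in covered_fraction_by_gene.values() if f >= min_fraction)
    return hits / len(covered_fraction_by_gene)


# Invented coverage values for five toy target genes.
toy_coverage = {"geneA": 1.0, "geneB": 0.85, "geneC": 0.8,
                "geneD": 0.4, "geneE": 0.0}
print(fraction_well_covered(toy_coverage))  # 0.6
```

Applied to the paper's numbers, 80% of 319,812 target genes corresponds to roughly 256,000 genes passing the criterion.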
  • 40. We examined the number of prevalent genes identified across all individuals as a function of the extent of sequencing, demanding at least two supporting reads for a gene call (Fig. 2a). The incidence-based coverage richness estimator (ICE), determined at 100 individuals (the highest number the EstimateS (ref. 21) program could accommodate), indicates that our catalogue captures 85.3% of the prevalent genes. Although this is probably an underestimate, it nevertheless indicates that the catalogue contains an overwhelming majority of the prevalent genes of the cohort.
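A gene accumulation curve of the kind plotted in Fig. 2a can be approximated by averaging over random sample orderings. The sketch below is only a permutation-based stand-in: it does not reproduce EstimateS, its analytical Mao Tau values, or the ICE estimator, and it ignores the two-supporting-reads rule. The function name and toy gene sets are hypothetical.

```python
import random


def accumulation_curve(samples, n_permutations=100, seed=0):
    """Mean number of distinct genes seen after 1, 2, ..., n samples.

    `samples` is a list of sets of gene identifiers, one set per individual.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_permutations):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, genes in enumerate(order):
            seen |= genes
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


toy_samples = [{"g1", "g2"}, {"g2", "g3"}, {"g1", "g2", "g3", "g4"}]
curve = accumulation_curve(toy_samples)
print(curve[-1])  # 4.0, every ordering ends with all four genes observed
```

A curve that is still rising at the last sample suggests that more individuals would reveal more genes, which is the situation a richness estimator such as ICE tries to quantify.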
  • 42. Figure 2 | Predicted ORFs in the human gut microbiome. a, Number of unique genes as a function of the extent of sequencing. The gene accumulation curve corresponds to the Sobs (Mao Tau) values (number of observed genes), calculated using EstimateS (ref. 21; version 8.2.0) on 100 randomly chosen samples (due to memory limitation). b, Coverage of genes from 89 frequent gut microbial species (Supplementary Table 12). c, Number of functions captured by number of samples investigated, based on known (well characterized) orthologous groups (OGs; bottom), known plus unknown orthologous groups (including, for example, putative, predicted, conserved hypothetical functions; middle) and orthologous groups plus novel gene families (>20 proteins) recovered from the metagenome (top). Boxes denote the interquartile range (IQR) between the first and third quartiles (25th and 75th percentiles, respectively) and the line inside denotes the median. Whiskers denote the lowest and highest values within 1.5 times IQR from the first and third quartiles, respectively. Circles denote outliers beyond the whiskers. [Axes. a: number of non-redundant genes (×10^6) against number of samples. b: number and fraction of target genes covered against fraction of gene length covered, with curves for 85%, 90% and 95% identity. c: number of orthologous groups/gene families (×10^3) against number of individuals sampled.] NATURE | Vol 464 | 4 March 2010
  • 43. [Figure 2, cropped view of panels a and b. a: number of non-redundant genes (×10^6) against number of samples. b: fraction of target genes covered.]
  • 44. Each individual carried 536,112 ± 12,167 (mean ± s.e.m.) prevalent genes (Supplementary Fig. 6b), indicating that most of the 3.3 million gene pool must be shared. However, most of the prevalent genes were found in only a few individuals: 2,375,655 were present in less than 20%, whereas 294,110 were found in at least 50% of individuals (we term these ‘common’ genes). These values depend on the sampling depth; sequencing of MH0006 and MH0012 revealed more of the catalogue genes, present at a low abundance (Supplementary Fig. 7). Nevertheless, even at our routine sampling depth, each individual harboured 204,056 ± 3,603 (mean ± s.e.m.) common genes, indicating that about 38% of an individual’s total gene pool is shared. Interestingly, the IBD patients harboured, on average, 25% fewer genes than the individuals not suffering from IBD (Supplementary Fig. 8), consistent with the observation that the former have lower bacterial diversity than the latter (ref. 22).
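The common/rare split and the 38% figure can be illustrated with a short sketch. The thresholds follow the text (common: found in at least 50% of individuals; rare: found in fewer than 20%); the function name and the toy presence data are hypothetical.

```python
from collections import Counter


def classify_by_prevalence(samples, common_min=0.5, rare_max=0.2):
    """Split genes into 'common' and 'rare' by the fraction of individuals carrying them.

    `samples` is a list of sets of gene identifiers, one set per individual.
    """
    n = len(samples)
    counts = Counter(gene for genes in samples for gene in genes)
    common = {g for g, c in counts.items() if c / n >= common_min}
    rare = {g for g, c in counts.items() if c / n < rare_max}
    return common, rare


# Ten toy individuals: geneA in all, geneC in three, geneB in one.
toy = [{"geneA"} for _ in range(10)]
toy[0].add("geneB")
for i in range(3):
    toy[i].add("geneC")

common, rare = classify_by_prevalence(toy)
print(sorted(common), sorted(rare))  # ['geneA'] ['geneB']

# Share of an individual's genes that are common, from the quoted means.
print(f"{204056 / 536112:.0%}")  # 38%
```

Genes such as `geneC`, present in 30% of the toy individuals, fall into neither class, which is why the common and rare counts in Table 7 do not sum to the full catalogue.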
  • 46. Figure 7 | Number of unique genes identified with increasing sequencing depth in samples MH0006 and MH0012.
  • 47. Figure 8 | Distribution of non-redundant bacterial genes in IBD patients and healthy controls. The proportion of individuals having a given number of genes (classes of 100 thousand genes were used) is shown. The average gene number for IBD patients and individuals not suffering from IBD was 425,397 ± 126,685 (s.d.; n = 25) and 564,070 ± 121,962 (s.d.; n = 99), respectively; $p < 10^{-6}$ (one-tailed Student t test).
  • 48. Common bacterial core
    Deep metagenomic sequencing provides the opportunity to explore the existence of a common set of microbial species (common core) in the cohort. For this purpose, we used a non-redundant set of 650 sequenced bacterial and archaeal genomes (see Methods). We aligned the Illumina GA reads of each human gut microbial sample onto the genome set, using a 90% identity threshold, and determined the proportion of the genomes covered by the reads that aligned onto only a single position in the set. At a 1% coverage, which for a typical gut bacterial genome corresponds to an average length of about 40 kb, some 25-fold more than that of the 16S gene generally used for species identification, we detected 18 species in all individuals, 57 in ≥90% and 75 in ≥50% of individuals (Supplementary Table 8). At 10% coverage, requiring ~10-fold higher abundance in a sample, we still found 13 of the above species in ≥90% of individuals and 35 in ≥50%.
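The coverage-based detection rule can be sketched by merging the intervals of uniquely aligned reads and comparing the covered fraction to the 1% threshold. This is an illustration under stated assumptions, not the paper's pipeline: the function names are hypothetical, intervals are half-open (start, end) positions, and the 4 Mb genome size and 1.6 kb 16S length are round values chosen to be consistent with the "about 40 kb" and "25-fold" figures in the text.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by half-open (start, end) read intervals."""
    covered = 0
    prev_end = 0
    for start, end in sorted(intervals):
        start = max(start, prev_end)   # skip the part already counted
        if end > start:
            covered += end - start
            prev_end = end
    return covered / genome_length


def is_detected(intervals, genome_length, min_fraction=0.01):
    """Call a genome present if uniquely aligned reads cover >= min_fraction of it."""
    return covered_fraction(intervals, genome_length) >= min_fraction


# Overlapping toy reads on a 1,000 bp toy genome: 250 bp covered in total.
reads = [(0, 100), (50, 150), (300, 400)]
print(covered_fraction(reads, 1000))  # 0.25

# Arithmetic behind the quoted figures (assumed sizes, see lead-in).
genome_bp = 4_000_000
one_percent_bp = genome_bp // 100
print(one_percent_bp, one_percent_bp / 1600)  # 40000 25.0
```

Counting only reads that align to a single position in the genome set guards against calling a species present on the strength of regions shared with its relatives.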
  • 50. When the cumulated sequence length increased from 3.96 Gb to 8.74 Gb and from 4.41 Gb to 11.6 Gb, for samples MH0006 and MH0012, respectively, the number of strains common to the two at the 1% coverage threshold increased by 25%, from 135 to 169. This indicates the existence of a significantly larger common core than the one we could observe at the sequence depth routinely used for each individual.
  • 52. The variability of abundance of microbial species in individuals can greatly affect identification of the common core. To visualize this variability, we compared the number of sequencing reads aligned to different genomes across the individuals of our cohort. Even for the most common 57 species present in ≥90% of individuals with genome coverage >1% (Supplementary Table 8), the inter-individual variability was between 12- and 2,187-fold (Fig. 3). As expected (refs 10, 23), Bacteroidetes and Firmicutes had the highest abundance.
  • 54. Figure 3 | Relative abundance of 57 frequent microbial genomes among individuals of the cohort. See Fig. 2c for definition of box and whisker plot. See Methods for computation. [Box plot. x-axis: relative abundance (log10); y-axis: one row per genome.]
    Genome labels recovered from the y-axis: Blautia hansenii; Clostridium scindens; Enterococcus faecalis TX0104; Clostridium asparagiforme; Bacteroides fragilis 3_1_12; Bacteroides intestinalis; Ruminococcus gnavus; Anaerotruncus colihominis; Bacteroides pectinophilus; Clostridium nexile; Clostridium sp. L2-50; Parabacteroides johnsonii; Bacteroides finegoldii; Butyrivibrio crossotus; Bacteroides eggerthii; Clostridium sp. M62/1; Coprococcus eutactus; Bacteroides stercoris; Holdemania filiformis; Clostridium leptum; Streptococcus thermophilus LMD-9; Bacteroides capillosus; Subdoligranulum variabile; Ruminococcus obeum A2-162; Bacteroides dorei; Eubacterium ventriosum; Bacteroides sp. D4; Bacteroides sp. D1; Coprococcus comes SL7/1; Bacteroides xylanisolvens XB1A; Eubacterium rectale M104/1; Bacteroides sp. 2_2_4; Bacteroides sp. 4_3_47FAA; Bacteroides ovatus; Bacteroides sp. 9_1_42FAA; Parabacteroides distasonis ATCC 8503; Eubacterium siraeum 70/3; Bacteroides sp. 2_1_7; Roseburia intestinalis M50/1; Bacteroides vulgatus ATCC 8482; Dorea formicigenerans; Collinsella aerofaciens; Ruminococcus lactaris; Faecalibacterium prausnitzii SL3/3; Ruminococcus sp. SR1/5; Unknown sp. SS3/4; Ruminococcus torques L2-14; Eubacterium hallii; Bacteroides thetaiotaomicron VPI-5482; Clostridium sp. SS2-1; Bacteroides caccae; Ruminococcus bromii L2-63; Dorea longicatena; Parabacteroides merdae; Alistipes putredinis; Bacteroides uniformis.
  • 55. A complex pattern of species relatedness, characterized by clusters at the genus and family levels, emerges from the analysis of the network based on the pair-wise Pearson correlation coefficients of 155 species present in at least one individual at ≥1% coverage (Supplementary Fig. 9). Prominent clusters include some of the most abundant gut species, such as members of the Bacteroidetes and Dorea/Eubacterium/Ruminococcus groups and also bifidobacteria, Proteobacteria and streptococci/lactobacilli groups. These observations indicate that similar constellations of bacteria may be present in different individuals of our cohort, for reasons that remain to be established.
  • 57. Supplementary Figure 9 | Relations between the most abundant bacterial species. The network was deduced from the analysis of 155 bacterial species present in at least 1 individual at a genome coverage of ≥1%. Size of the nodes (circles) indicates species abundance over the cohort; width of the edges (lines connecting the circles) indicates the value of the Pearson correlation coefficient (only the 342 values above 0.4 or below −0.4 out of a total of 11,935 were used for the network).
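The network construction described in this caption can be illustrated with a minimal sketch. It assumes a hypothetical dictionary mapping each species to its abundance in every individual; the toy species names and values are invented for illustration and are not data from the paper. It also checks that 155 species give 155 × 154 / 2 = 11,935 pairs, the total quoted in the caption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_edges(abundance, threshold=0.4):
    """Keep only species pairs whose |r| exceeds the threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        r = pearson(abundance[a], abundance[b])
        if abs(r) > threshold:
            edges.append((a, b, r))
    return edges


def n_pairs(n_species):
    """Number of unordered species pairs."""
    return n_species * (n_species - 1) // 2


# Hypothetical abundances of four species in four individuals.
toy = {
    "species_A": [1.0, 2.0, 3.0, 4.0],
    "species_B": [2.0, 4.1, 5.9, 8.0],
    "species_C": [4.0, 3.0, 2.0, 1.0],
    "species_D": [1.0, 3.0, 3.0, 1.0],
}
edges = correlation_edges(toy)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:+.2f}")
print("pairs among 155 species:", n_pairs(155))
```

In the toy data, species_D is uncorrelated with the other three, so it ends up as an isolated node, much as most of the 11,935 pairs in the paper fall below the cut-off.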
  • 58. The above result indicates that the Illumina-based bacterial profiling should reveal differences between the healthy individuals and patients. To test this hypothesis we compared the IBD patients and healthy controls (Supplementary Table 1), as it was previously reported that the two have different microbiota [22]. The principal component analysis, based on the same 155 species, clearly separates patients from healthy individuals and the ulcerative colitis from the Crohn’s disease patients (Fig. 4), confirming our hypothesis.
  • 60. [Figure 4 | Bacterial species abundance differentiates IBD patients and healthy individuals. Scatter plot of PC1 against PC2; legend: Healthy, Crohn’s disease, Ulcerative colitis; P value: 0.031.] Principal component analysis with health status as instrumental variables, based on the abundance of 155 species with ≥1% genome coverage by the Illumina reads in at least 1 individual of the cohort, was carried out with 14 healthy individuals and 25 IBD patients (21 ulcerative colitis and 4 Crohn’s disease) from Spain (Supplementary Table 1). The first two components (PC1 and PC2) were plotted and represented 7.3% of whole inertia. Individuals (represented by points) were clustered and the centre of gravity was computed for each class; the P-value of the link between health status and species abundance was assessed using a Monte-Carlo test (999 replicates).
  • 61. Functions encoded by the prevalent gene set. We classified the predicted genes by aligning them to the integrated NCBI-NR database of non-redundant protein sequences, the genes in the KEGG (Kyoto Encyclopedia of Genes and Genomes) [24] pathways, and COG (Clusters of Orthologous Groups) [25] and eggNOG [26] databases. There were 77.1% genes classified into phylotypes, 57.5% to eggNOG clusters, 47.0% to KEGG orthology and 18.7% genes assigned to KEGG pathways, respectively (Supplementary Table 9). Almost all (99.96%) of the phylogenetically assigned genes belonged to the Bacteria and Archaea, reflecting their predominance in the gut. Genes that were not mapped to orthologous groups were clustered into gene families (see Methods). To investigate the functional content of the prevalent gene set we computed the total number of orthologous groups and/or gene families present in any combination of n individuals (with n = 2–124; see Fig. 2c). This rarefaction analysis shows that the ‘known’ functions (annotated in eggNOG or KEGG) quickly saturate (a value of 5,569 groups was observed): when sampling any subset of 50 individuals, most have been detected. However, three-quarters of the prevalent gut functionalities consist of uncharacterized orthologous groups and/or completely novel gene families (Fig. 2c). When including these groups, the rarefaction curve only starts to plateau at the very end, at a much higher level (19,338 groups were detected), confirming that the extensive sampling of a large number of individuals was necessary to capture this considerable amount of novel/unknown functionality.
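A rarefaction analysis of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each individual is represented by a hypothetical set of orthologous-group identifiers, and it averages over random subsets rather than enumerating every combination of n individuals.

```python
import random


def rarefaction_curve(samples, sizes, n_draws=100, seed=0):
    """Mean number of distinct groups found in random subsets of n individuals.

    samples: list of sets, one set of group identifiers per individual.
    sizes:   subset sizes n to evaluate (each between 1 and len(samples)).
    """
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        totals = []
        for _ in range(n_draws):
            subset = rng.sample(samples, n)
            # Union of all groups seen in this subset of individuals.
            totals.append(len(set().union(*subset)))
        curve[n] = sum(totals) / len(totals)
    return curve


# Hypothetical cohort: a shared core of three groups plus one private group each.
core = {"og1", "og2", "og3"}
cohort = [core | {f"private_{i}"} for i in range(6)]
curve = rarefaction_curve(cohort, sizes=[1, 3, 6])
for n, mean_groups in curve.items():
    print(n, mean_groups)
```

A curve that flattens early corresponds to the ‘known’ functions in the text, while a curve that keeps rising as individuals are added corresponds to the novel gene families.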
  • 63. [Figure 2 | Predicted ORFs in the human gut microbiome. Panel c: number of orthologous groups/gene families (×10^3) against number of individuals sampled (1 to 124); series: known OGs, known + unknown OGs, OGs + novel gene families. The caption is truncated in the slide.]
  • 64. Bacterial functions important for life in the gut. The extensive non-redundant catalogue of the bacterial genes from the human intestinal tract provides an opportunity to identify bacterial functions important for life in this environment. There are functions necessary for a bacterium to thrive in a gut context (that is, the ‘minimal gut genome’) and those involved in the homeostasis of the whole ecosystem, encoded across many species (the ‘minimal gut metagenome’). The first set of functions is expected to be present in most or all gut bacterial species; the second set in most or all individuals’ gut samples.
  • 65. To identify the functions encoded by the minimal gut genome we use the fact that they should be present in most or all gut bacterial species and therefore appear in the gene catalogue at a frequency above that of the functions present in only some of the gut bacterial species. The relative frequency of different functions can be deduced from the number of genes recruited to different eggNOG clusters, after normalization for gene length and copy number (Supplementary Fig. 10a, b). We ranked all the clusters by gene frequencies and determined the range that included the clusters specifying well-known essential bacterial functions, such as those determined experimentally for a well-studied firmicute, Bacillus subtilis [27], hypothesizing that additional clusters in this range are equally important. As expected, the range that included most of B. subtilis essential clusters (86%) was at the very top of the ranking order (Fig. 5). Some 76% of the clusters with essential genes of Escherichia coli [28] were within this range, confirming the validity of our approach. This suggests that 1,244 metagenomic clusters found within the range (Supplementary Table 10; termed ‘range clusters’ hereafter) specify functions important for life in the gut.
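The ranking idea can be sketched in a few lines. This is only an illustration under stated assumptions: the input frequencies are taken as already normalized for gene length and copy number, the cluster names are hypothetical, and the rule used here (the shortest top-ranked prefix that holds a given fraction of the essential clusters) is an assumed stand-in, since the slide does not spell out how the authors delimited the range.

```python
from math import ceil


def rank_clusters(freq):
    """Order cluster ids by decreasing normalized gene frequency (ties by id)."""
    return sorted(freq, key=lambda c: (-freq[c], c))


def top_range(ranked, essential, fraction):
    """Shortest prefix of the ranking holding `fraction` of the essential clusters."""
    present = [c for c in ranked if c in essential]
    needed = ceil(fraction * len(present))
    if needed == 0:
        return []
    seen = 0
    for i, cluster in enumerate(ranked):
        if cluster in essential:
            seen += 1
            if seen >= needed:
                return ranked[: i + 1]
    return ranked


# Hypothetical normalized frequencies and a hypothetical essential-gene set.
freq = {"c1": 9.0, "c2": 7.5, "c3": 6.0, "c4": 2.0, "c5": 1.0, "c6": 0.5}
essential = {"c1", "c2", "c4", "c6"}

ranked = rank_clusters(freq)
range_clusters = top_range(ranked, essential, fraction=0.75)
candidates = [c for c in range_clusters if c not in essential]
print(range_clusters)  # clusters inside the range
print(candidates)      # non-essential clusters hypothesized to be important
```

The clusters in `candidates` play the role of the additional ‘range clusters’ in the text: they are not known to be essential, but they rank alongside clusters that are.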
  • 67. [Figure 5 | Clusters that contain the B. subtilis essential genes. Bar chart; x-axis: cluster rank (1 to 10,001); y-axis: cluster (%), from 0 to 40; the range is marked at the top of the ranking.] The clusters were ranked by the number of genes they contain, normalized by average length and copy number (see Supplementary Fig. 10), and the proportion of clusters with the essential B. subtilis genes was determined for successive groups of 100 clusters. Range indicates the part of the cluster distribution that contains 86% of the B. subtilis essential genes. (Nature, Vol. 464, 4 March 2010)
  • 68. We found two types of functions among the range clusters: those required in all bacteria (housekeeping) and those potentially specific for the gut. Among many examples of the first category are the functions that are part of main metabolic pathways (for example, central carbon metabolism, amino acid synthesis), and important protein complexes (RNA and DNA polymerase, ATP synthase, general secretory apparatus). Not surprisingly, projection of the range clusters on the KEGG metabolic pathways gives a highly integrated picture of the global gut cell metabolism (Fig. 6a). The putative gut-specific functions include those involved in adhesion to the host proteins (collagen, fibrinogen, fibronectin) or in harvesting sugars of the globoseries glycolipids, which are carried on blood and epithelial cells. Furthermore, 15% of range clusters encode functions that are present in <10% of the eggNOG genomes (see Supplementary Fig. 11) and are largely (74.3%) not defined (Fig. 6b). Detailed studies of these should lead to a deeper comprehension of bacterial life in the gut.
  • 70. [Figure 6 | Characterization of the minimal gut genome and metagenome. a, Projection of the minimal gut genome on the KEGG pathways using the iPath tool [38]. b, Functional composition of the minimal gut genome and metagenome. Rare and frequent refer to the presence in sequenced eggNOG genomes. c, Estimation of the minimal gut metagenome size. Known orthologous groups (red), known plus unknown orthologous groups (blue) and orthologous groups plus novel gene families (>20 proteins; grey) are shown (see Fig. 2c for definition of box and whisker plot). The inset shows composition of the gut minimal microbiome. Large circle: classification in the minimal metagenome according to orthologous group occurrence in STRING7 [39] bacterial genomes. Common (25%), uncommon (35%) and rare (45%) refer to functions that are present in >50%, <50% but >10%, and <10% of STRING bacteria genomes, respectively. Small circle: composition of the rare orthologous groups. Unknown (80%) have no annotation or are poorly characterized, whereas known bacterial (19%) and phage-related (1%) orthologous groups have functional description.]
  • 71. [Figure 6b (detail) | Functional composition of the minimal gut genome and metagenome. Bar chart; x-axis: 0.0% to 70.0%; categories: General function (R), Translation (J), Amino acids (E), DNA (L), Unknown (S), Envelope (M), Carbohydrates (G), Energy (C), Transcription (K), Coenzymes (H), Nucleotides (F), Inorganic (P), Protein turnover (O), Lipids (I), Signal transduction (T), Secretion (U), Cell cycle (D), Defence (V), Second metabolites (Q), Cell motility (N), RNA (A), Chromatin (B), Extracellular (W), Nuclear structure (Y), Cytoskeleton (Z); series: rare minimal genome, rare minimal metagenome, frequent minimal genome, frequent minimal metagenome.]
  • 72. [Figure 6c (detail) | Estimation of the minimal gut metagenome size. x-axis: number of individuals sampled (1 to 100); y-axis: minimal metagenome size (2,000 to 14,000). Inset legend: Common, Uncommon, Rare; Unknown, Known, Phage-associated.]
  • 73. To identify the functions encoded by the minimal gut metagenome, we computed the orthologous groups that are shared by individuals of our cohort. This minimal set, of 6,313 functions, is much larger than the one estimated in a previous study [8]. There are only 2,069 functionally annotated orthologous groups, showing that they gravely underestimate the true size of the common functional complement among individuals (Fig. 6c). The minimal gut metagenome includes a considerable fraction of functions (~45%) that are present in <10% of the sequenced bacterial genomes (Fig. 6c, inset). These otherwise rare functionalities that are found in each of the 124 individuals may be necessary for the gut ecosystem. Eighty per cent of these orthologous groups contain genes with at best poorly characterized function, underscoring our limited knowledge of gut functioning.
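Computing the groups shared by every individual amounts to a set intersection. The sketch below assumes a hypothetical mapping from individual to the set of orthologous groups detected in that individual; the identifiers are invented. The last lines redo the arithmetic on the two counts quoted above.

```python
def minimal_metagenome(groups_by_individual):
    """Orthologous groups detected in every individual of the cohort."""
    group_sets = list(groups_by_individual.values())
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


# Hypothetical detections for three individuals.
detected = {
    "individual_1": {"og_a", "og_b", "og_c", "og_d"},
    "individual_2": {"og_a", "og_b", "og_d", "og_e"},
    "individual_3": {"og_a", "og_b", "og_d", "og_f"},
}
shared = minimal_metagenome(detected)
print(sorted(shared))

# Counts quoted in the text: 2,069 annotated groups out of 6,313 shared functions.
annotated_fraction = 2069 / 6313
print(f"functionally annotated: {annotated_fraction:.1%}")
```

On the quoted counts, roughly a third of the minimal metagenome is functionally annotated, which is the basis for the statement that annotated groups underestimate the common functional complement.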
  • 75. Of the known fraction, about 5% codes for (pro)phage-related proteins, implying a universal presence and possible important ecological role of bacteriophages in gut homeostasis. The most striking secondary metabolism that seems crucial for the minimal metagenome relates, not unexpectedly, to biodegradation of complex sugars and glycans harvested from the host diet and/or intestinal lining. Examples include degradation and uptake pathways for pectin (and its monomer, rhamnose) and sorbitol, sugars which are omnipresent in fruits and vegetables, but which are not or poorly absorbed by humans. As some gut microorganisms were found to degrade both of them [29,30], this capacity seems to be selected for by the gut ecosystem as a non-competitive source of energy. Besides these, capacity to ferment, for example, mannose, fructose, cellulose and sucrose is also part of the minimal metagenome. Together, these emphasize the strong dependence of the gut ecosystem on complex sugar degradation for its functioning.
  • 77. Functional complementarities of the genome and metagenome. Detailed analysis of the complementarities between the gut metagenome and the human genome is beyond the scope of the present work. To provide an overview, we considered two factors: conservation of the functions in the minimal metagenome and presence/absence of functions in one or the other (Supplementary Table 11). Gut bacteria use mostly fermentation to generate energy, converting sugars, in part, to short-chain fatty acids that are used by the host as an energy source. Acetate is important for muscle, heart and brain cells [31], propionate is used in host hepatic neoglucogenic processes, whereas, in addition, butyrate is important for enterocytes [32]. Beyond short-chain fatty acids, a number of amino acids are indispensable to humans [33] and can be provided by bacteria [34]. Similarly, bacteria can contribute certain vitamins [3] (for example, biotin, phylloquinone) to the host. All of the steps of biosynthesis of these molecules are encoded by the minimal metagenome.
  • 78. Gut bacteria seem to be able to degrade numerous xenobiotics, including non-modified and halogenated aromatic compounds (Supplementary Table 11), even if the steps of most pathways are not part of the minimal metagenome and are found in a fraction of individuals only. A particularly interesting example is that of benzoate, which is a common food supplement, known as E211. Its degradation by the coenzyme-A ligation pathway, encoded in the minimal metagenome, leads to pimeloyl-coenzyme-A, which is a precursor of biotin, indicating that this food supplement can have a potentially beneficial role for human health.
  • 79. DISCUSSION
  • 80. Discussion. We have used extensive Illumina GA short-read-based sequencing of total faecal DNA from a cohort of 124 individuals of European (Nordic and Mediterranean) origin to establish a catalogue of non-redundant human intestinal microbial genes. The catalogue contains 3.3 million microbial genes, 150-fold more than the human gene complement, and includes an overwhelming majority (>86%) of prevalent genes harboured by our cohort. The catalogue probably contains a large majority of prevalent intestinal microbial genes in the human population, for the following reasons: (1) over 70% of the metagenomic reads from three previous studies, including American and Japanese individuals [8,16,17], can be mapped on our contigs; (2) about 80% of the microbial genes from 89 frequent gut reference genomes are present in our set. This result represents a proof of principle that short-read sequencing can be used to characterize complex microbiomes.
  • 81. The full bacterial gene complement of each individual was not sampled in our work. Nevertheless, we have detected some 536,000 prevalent unique genes in each, out of the total of 3.3 million carried by our cohort. Inevitably, the individuals largely share the genes of the common pool. At the present depth of sequencing, we found that almost 40% of the genes from each individual are shared with at least half of the individuals of the cohort. Future studies of world-wide span, envisaged within the International Human Microbiome Consortium, will complete, as necessary, our gene catalogue and establish boundaries to the proportion of shared genes.
  • 82. Essentially all (99.1%) of the genes of our catalogue are of bacterial origin, the remainder being mostly archaeal, with only 0.1% of eukaryotic and viral origins. The gene catalogue is therefore equivalent to that of some 1,000 bacterial species with an average-sized genome, encoding about 3,364 non-redundant genes. We estimate that no more than 15% of prevalent genes of our cohort may be missing from the catalogue, and suggest that the cohort harbours no more than ~1,150 bacterial species abundant enough to be detected by our sampling. Given the large overlap between microbial sequences in this and previous studies we suggest that the number of abundant intestinal bacterial species may be not much higher than that observed in our cohort. Each individual of our cohort harbours at least 160 such bacterial species, as estimated by the average prevalent gene number, and many must thus be shared.
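The species estimates in this passage can be approximately reproduced with simple division. The sketch below is one plausible reading of the arithmetic, not a calculation stated in the slide: it treats the catalogue as a collection of average-sized genomes and scales for the fraction of genes that may be missing. The figure of 536,000 prevalent genes per individual comes from the preceding paragraph of the Discussion.

```python
CATALOGUE_GENES = 3_300_000               # genes in the catalogue
GENES_PER_GENOME = 3_364                  # non-redundant genes per average-sized genome
MAX_MISSING_FRACTION = 0.15               # upper bound on prevalent genes missing
PREVALENT_GENES_PER_INDIVIDUAL = 536_000  # prevalent unique genes detected per person

# Catalogue expressed as a number of average-sized bacterial genomes.
species_equivalents = CATALOGUE_GENES / GENES_PER_GENOME

# Assumed correction: if at most 15% of genes are missing, the catalogue
# covers at least 85% of the species-equivalents in the cohort.
upper_bound_species = species_equivalents / (1 - MAX_MISSING_FRACTION)

# Same conversion applied to the genes found in a single individual.
species_per_individual = PREVALENT_GENES_PER_INDIVIDUAL / GENES_PER_GENOME

print(f"species equivalents in catalogue: {species_equivalents:.0f}")
print(f"upper bound for the cohort:       {upper_bound_species:.0f}")
print(f"species per individual:           {species_per_individual:.0f}")
```

The three results (about 981, 1,154 and 159) are consistent with the rounded values of some 1,000, about 1,150 and 160 given in the text.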
  • 83. We assigned about 12% of the reference set genes (404,000) to the 194 sequenced intestinal bacterial genomes, and can thus associate them with bacterial species. Sequencing of at least 1,000 human-associated bacterial genomes is foreseen within the International Human Microbiome Consortium, via the Human Microbiome Project and MetaHIT. This is commensurate with the number of dominant species in our cohort and expected more broadly in human gut, and should enable a much more extensive gene to species assignment. Nevertheless, we used the presently available sequenced genomes to explore further the concept of largely shared species among our cohort and identified 75 species common to >50% of individuals and 57 species common to >90%. These numbers are likely to increase with the number of sequenced reference strains and a deeper sampling. Indeed, a 2–3-fold increase in sequencing depth raised by 25% the number of species that we could detect as shared between two individuals. A large number of shared species supports the view that the prevalent human microbiome is of a finite and not overly large size.
  • 84. How can this view be reconciled with that of a considerable interpersonal diversity of innumerable bacterial species in the gut, arising from most previous studies using the 16S rRNA marker gene [4,8,10,11]? Possibly the depth of sampling of these studies was insufficient to reveal common species when present at low abundance, and emphasized the difference in the composition of relatively few dominant species. We found a very high variability of abundance (12- to 2,200-fold) for the 57 most common species across the individuals of our cohort. Nevertheless, a recent 16S rRNA-based study concluded that a common bacterial species ‘core’, shared among at least 50% of individuals under study, exists [35].
  • 85. Detailed comparisons of bacterial genes across the individuals of our cohort will be carried out in the future, within the context of the ongoing MetaHIT clinical studies of which they are part. Nevertheless, clustering of the genes in families allowed us to capture a virtually full functional potential of the prevalent gene set and revealed a considerable novelty, extending the functional categories by some 30% in regard to previous work [8]. Similarly, this analysis has revealed a functional core, conserved in each individual of the cohort, which reflects the full minimal human gut metagenome, encoded across many species and probably required for the proper functioning of the gut ecosystem. The size of this minimal metagenome exceeds several-fold that of the core metagenome reported previously [8]. It includes functions known to be important to the host–bacterial interaction, such as degradation of complex polysaccharides, synthesis of short-chain fatty acids, indispensable amino acids and vitamins. Finally, we also identified functions that we attribute to a minimal gut bacterial genome, likely to be required by any bacterium to thrive in this ecosystem. Besides general housekeeping functions, the minimal genome encompasses many genes of unknown function, rare in sequenced genomes and possibly specifically required in the gut.
  • 86. Beyond providing the global view of the human gut microbiome, the extensive gene catalogue we have established enables future studies of association of the microbial genes with human phenotypes and, even more broadly, human living habits, taking into account the environment, including diet, from birth to old age. We anticipate that these studies will lead to a much more complete understanding of human biology than the one we presently have.
  • 87. LETTERS. A core gut microbiome in obese and lean twins. Peter J. Turnbaugh, Micah Hamady, Tanya Yatsunenko, Brandi L. Cantarel, Alexis Duncan, Ruth E. Ley, Mitchell L. Sogin, William J. Jones, Bruce A. Roe, Jason P. Affourtit, Michael Egholm, Bernard Henrissat, Andrew C. Heath, Rob Knight & Jeffrey I. Gordon. The human distal gut harbours a vast ensemble of microbes (the microbiota) that provide important metabolic capabilities, including the ability to extract energy from otherwise indigestible dietary polysaccharides [1–6]. Studies of a few unrelated, healthy adults have revealed substantial diversity in their gut communities, as measured by sequencing 16S rRNA genes [6–8], yet how this diversity relates to function and to the rest of the genes in the collective genomes of the microbiota (the gut microbiome) remains obscure. Studies of lean and obese mice suggest that the gut microbiota affects energy balance by influencing the efficiency of calorie harvest from the diet, and how this harvested energy is used and stored [3–5]. Here we characterize the faecal microbial communities of adult female monozygotic and dizygotic twin pairs concordant for leanness or obesity, and their mothers, to address how host genotype, environmental exposure and host adiposity influence the gut microbiome. Analysis of 154 individuals yielded 9,920 near full-length and 1,937,461 partial bacterial 16S rRNA sequences, plus 2.14 gigabases from their microbiomes. The results reveal that the human gut microbiome is shared among family members, but that each person’s gut microbial community varies in the specific bacterial lineages present, with a comparable degree of co-variation between adult monozygotic and dizygotic twin pairs.
However, there was a wide array of shared microbial genes among sampled individuals, comprising an extensive, identifiable ‘core microbiome’ at the gene, rather than at the organismal lineage, level. Obesity is associated with phylum-level changes in the microbiota, reduced bacterial diversity and altered representation of bacterial genes and metabolic pathways. These results demonstrate that a diversity of organismal assemblages can nonetheless yield a core microbiome at a functional level, and that deviations from this core are associated with different physiological states (obese compared […] leanness (BMI = 18.5–24.9 kg m⁻²) (one twin pair was lean/overweight (overweight defined as BMI ≥ 25 and < 30) and six pairs were overweight/obese). They had not taken antibiotics for at least 5.49 ± 0.09 months. Each participant completed a detailed medical, lifestyle and dietary questionnaire: study enrolees were broadly representative of the overall Missouri population for BMI, parity, education and marital status (see Supplementary Results). Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live more than 800 km apart. Because faecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology [7], they were collected from each individual and frozen immediately. The collection procedure was repeated again with an average interval between sampling of 57 ± 4 days. To characterize the bacterial lineages present in the faecal microbiotas of these 154 individuals, we performed 16S rRNA sequencing, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, we performed multiplex pyrosequencing with a 454 FLX instrument to survey the gene’s V2 variable region [13] and its V6 hypervariable region [14] (Supplementary Tables 1–3).
Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among faecal communities (see Methods). No matter which region of the gene was examined, individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals (Fig. 1a and Supplementary Fig. 1a, b), and shared significantly more species-level phylotypes (16S rRNA sequences with ≥97% identity comprise each phylotype) (G = 55.2, P < 10⁻¹² (V2); G = 12.3, P < 0.001 (V6); G = 11.3, P < 0.001 (full-length)). No significant correlation was seen between the degree of physical separation of family members’ current homes […] Vol 457|22 January 2009|doi:10.1038/nature07540
  • 88. […] in coverage, all analyses were performed on an equal number of randomly selected sequences (200 full-length, 1,000 V2 and 10,000 V6). At this level of coverage, there was little overlap between the sampled faecal communities. Moreover, the number of 16S rRNA gene sequences belonging to each phylotype varied greatly between […] Analysis of 16S rRNA data sets produced by the three P[…] methods, plus shotgun sequencing of community DNA (se[…] revealed a lower proportion of Bacteroidetes and a higher p[…] of Actinobacteria in obese compared with lean individua[…] ancestries (Supplementary Table 9). Combining the ind[…] values across these independent analyses using Fisher’s me[…]closed significantly fewer Bacteroidetes (P = 0.003[…] Actinobacteria (P = 0.002) but no significant diffe[…] Firmicutes (P = 0.09). These findings agree with previ[…] showing comparable differences in both taxa in mice [2] and a […]ive increase in the representation of Bacteroidetes when 12 u[…] obese humans lost weight after being placed on one of two […] calorie diets [6]. Across all methods, obesity was associated with a s[…] decrease in the level of diversity (Fig. 1b and Suppleme[…] 1c–f). This reduced diversity suggests an analogy: the […] microbiota is not like a rainforest or reef, which are adapte[…] energy flux and are highly diverse; rather, it may be m[…] fertilizer runoff where a reduced-diversity microbial co[…] blooms with abnormal energy input [16]. We subsequently characterized the microbial lineage […] content of the faecal microbiomes of 18 individuals rep[…] six of the families (three lean and three obese European […] monozygotic twin pairs and their mothers) through shotg[…] sequencing (Supplementary Tables 4 and 5) and BLASTX […]isons against several databases (KEGG [17] (version 44) and S[…] plus a custom database of 44 reference human gut microbia[…] (Supplementary Figs 7–10 and Supplementary Results).
Ou[…] parameters were validated using control data sets compr[…]domly fragmented microbial genes with annotations in t[…] database [17] (Supplementary Fig. 11 and Supplementary M[…] We also tested how technical advances that produce lon[…] might improve these assignments by sequencing faecal co[…] samples from one twin pair using Titanium pyrosequencing […] (average read length of 341 ± 134 nucleotides (s.d.) versus […] nucleotides for the standard FLX method). Supplementa[…] shows that the frequency and quality of sequence assign[…] improved as read length increases from 200 to 350 nucleo[…] The 18 microbiomes were searched to identify sequences […] domains from experimentally validated carbohydr[…] enzymes (CAZymes). Sequences matching 156 total CAZ[…] were found within at least one human gut microbiome, inc[…] glycoside hydrolase, 21 carbohydrate-binding module, 35 […]transferase, 12 polysaccharide lyase and 11 carbohydrat[…] families (Supplementary Table 10). On average, 2.62 ± […]
[Figure 1 panels. a: UniFrac distance (from more similar to more different) for self, twin–twin (monozygotic, dizygotic), twin–mother (monozygotic, dizygotic) and unrelated comparisons. b: phylogenetic diversity versus number of sequences for lean and obese individuals.]
Figure 1 | 16S rRNA gene surveys reveal familial similarity and reduced diversity of the gut microbiota in obese individuals. a, Average unweighted UniFrac distance (a measure of differences in bacterial community structure) between individuals over time (self), twin pairs, twins and their mother, and unrelated individuals (1,000 sequences per V2 data set; Student’s t-test with Monte Carlo; *P < 10⁻⁵; **P < 10⁻¹⁴; ***P < 10⁻⁴¹; mean ± s.e.m.). b, Phylogenetic diversity curves for the microbiota of lean and obese individuals (based on 1–10,000 sequences per V6 data set; mean ± 95% confidence intervals shown).
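The equal-depth subsampling mentioned at the start of this slide (200 full-length, 1,000 V2 and 10,000 V6 sequences per sample) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `subsample`, the sample names and the read identifiers are all hypothetical, and only the V2 depth of 1,000 is taken from the slide.

```python
import random


def subsample(sequences, depth, seed=0):
    """Draw `depth` sequences at random, without replacement.

    Hypothetical helper: comparing samples at one fixed depth keeps
    differences in sequencing coverage from inflating diversity estimates.
    """
    if len(sequences) < depth:
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(sequences, depth)


# Made-up read identifiers standing in for real 16S rRNA reads.
samples = {
    "sample_A": [f"A_read{i}" for i in range(2500)],
    "sample_B": [f"B_read{i}" for i in range(1200)],
}

V2_DEPTH = 1000  # depth quoted on the slide for V2 data sets
rarefied = {name: subsample(reads, V2_DEPTH) for name, reads in samples.items()}
```

After this step every sample contributes the same number of reads, so downstream comparisons such as phylotype counts or UniFrac distances are made at equal coverage.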
  • 89. […] and taxonomic similarity (see Supplementary Methods) disclosed a significant association: individuals with similar taxonomic profiles […] of β-diversity with respect to the relative abundance of bacte[…] (Fig. 3a), analysis of the relative abundance of broad functi[…]egories of genes and metabolic pathways (KEGG) revealed a […] consistent pattern regardless of the sample surveyed (Fig[…] Supplementary Table 11): the pattern is also consistent wi[…] we obtained from a meta-analysis of previously published gu[…]biome data sets from nine adults [20,21] (Supplementary Fig. […] consistency is not simply due to the broad level of these ann[…] as a similar analysis of Bacteroidetes and Firmicutes refere[…]omes revealed substantial variation in the relative abundanc[…] category (see Supplementary Fig. 17). Furthermore, pairw[…]parisons of metabolic profiles obtained from the 18 microb[…] this study revealed an average value of R² of 0.97 ± 0.002 […] indicating a high level of functional similarity. Overall functional diversity was compared using the […] index [22], a measurement that combines diversity (the numb[…]ferent metabolic pathways) and evenness (the relative abun[…] each pathway). The human gut microbiomes surveyed had […] and high Shannon index value (4.63 ± 0.01), close to the m[…] possible level of functional diversity (5.54; see Suppl[…] Methods). Despite the presence of a small number of abunda[…]bolic pathways (listed in Supplementary Table 11), the ove[…]tional profile of each gut microbiome is quite even (Shannon […] of 0.84 ± 0.001 on a scale of 0–1), demonstrating that most m[…] pathways are found at a similar level of abundance.
Interest[…] level of functional diversity in each microbiome was sig[…] linked to the relative abundance of the Bacteroidetes (R[…] P < 10⁻⁶); microbiomes enriched for Firmicutes/Actinobac[…] a lower level of functional diversity. This observation is consis[…] an analysis of simulated metagenomic reads generated from e[…] Bacteroidetes and Firmicutes genomes (Supplementary Fig[…] average, the Bacteroidetes genomes have a significantly high[…] both functional diversity and evenness (Mann–Whitne[…] P < 0.01). At a finer level, 26–53% of ‘enzyme’-level functiona[…] (KEGG/CAZy/STRING) were shared across all 18 micr[…] whereas 8–22% of the groups were unique to a single mic[…] (Supplementary Fig. 19a–c). The ‘core’ functional groups p[…] all microbiomes were also highly abundant, representing 93[…] the total sequences. Given the higher relative abundance of th[…] groups, more than 95% were found after 26.11 ± 2.02 meg[…] sequence were collected from a given microbiome, whereas ‘variable’ groups continued to increase substantially with each additional megabase of sequence.
Of course, any estimate of the total size of the core microbiome will depend on sequencing effort, especially for […]
[Figure 2 panels. a: matrix of pairwise R² values between the functional profiles of the 18 samples (F1T1Le to F6MOb), with a colour scale running from <0.97 to 0.99. b: Bacteroidetes (%) versus PC1 (20%), from high Firmicutes/Actinobacteria to high Bacteroidetes. c: functional similarity (R²) for twin versus twin, twin versus mother and unrelated pairs.]
Figure 2 | Metabolic-pathway-based clustering and analysis of the human gut microbiome of monozygotic twins. a, Clustering of functional profiles based on the relative abundance of KEGG metabolic pathways. All pairwise comparisons were made of the profiles by calculating each R² value.
Sample identifier nomenclature: family number, twin number or mother, and BMI category (Le, lean; Ov, overweight; Ob, obese; for example, F1T1Le stands for family 1, twin 1, lean). b, The relative abundance of Bacteroidetes as a function of the first principal component derived from an analysis of KEGG metabolic profiles. c, Comparisons of functional similarity between twin pairs, between twins and their mother, and between unrelated individuals. Asterisk indicates significant differences (Student’s t-test with Monte Carlo; P < 0.01; mean ± s.e.m.).
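The Shannon index and evenness figures quoted on this slide can be made concrete with a short Python sketch. It assumes the standard definitions (natural-log Shannon index, and evenness as the index divided by its maximum, the log of the number of observed pathways); the slide does not state the log base or the exact formula the authors used, so treat both as assumptions. The function names and the toy abundance vectors are hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p * ln p) over non-zero relative abundances."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def evenness(abundances):
    """Evenness on a 0-1 scale: H divided by ln(number of observed categories)."""
    observed = sum(1 for a in abundances if a > 0)
    if observed < 2:
        return 0.0  # evenness is not meaningful for a single category
    return shannon_index(abundances) / math.log(observed)


# Toy pathway counts (made up): one perfectly even profile, one skewed profile.
even_profile = [25, 25, 25, 25]
skewed_profile = [97, 1, 1, 1]

even_score = evenness(even_profile)      # equal abundances give maximal evenness
skewed_score = evenness(skewed_profile)  # one dominant pathway lowers evenness
```

Under the same assumption, dividing the reported index (4.63) by the reported maximum (5.54) gives roughly 0.84, which agrees with the evenness value quoted on the slide.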
  • 90. [Figure 3 panels. a: relative abundance (%) of bacterial phyla (Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria, other) in each of the 18 samples (F1T1Le to F6MOb). b: relative abundance (%) of COG categories in the same samples.] Figure 3 | Comparison of taxonomic and functional variations in the human gut microbiome. a, Relative abundance of major phyla across 18 faecal microbiomes from monozygotic twins and their mothers, based on BLASTX comparisons of microbiomes and the National Center for Biotechnology Information non-redundant database. b, Relative abundance of categories of genes across each sampled gut microbiome (letters correspond to categories in the COG database).
  • 91. […] including those involved in transcription and translation (Fig. 4). Metabolic profile-based clustering indicated that the representation of ‘core’ functional groups was highly consistent across samples (Supplementary Fig. 20), and included several pathways that are […] To identify metabolic pathways associated with obesity, o[…] core associated (variable) functional groups were included i[…]parison of the gut microbiomes of lean versus obese twin[…] bootstrap analysis [23] was used to identify metabolic pathw[…] were enriched or depleted in the variable obese gut micr[…] For example, similar to a mouse model of diet-induced […] the obese human gut microbiome was enriched for phospho[…]ase systems involved in microbial processing of carbo[…] (Supplementary Table 12). All gut microbiome sequences w[…]pared with the custom database of 44 human gut genomes: […] ratio analysis revealed 383 genes that were significantly […] between the obese and lean gut microbiome (q value < 0[…] enriched and 110 depleted in the obese micr[…] Supplementary Tables 13 and 14). By contrast, only 49 ge[…] consistently enriched or depleted between all twin p[…] Supplementary Methods). These obesity-associated genes were representative of t[…]nomic differences described above: 75% of the obesity-[…] genes were from Actinobacteria (compared with 0% of lean-[…] genes; the other 25% are from Firmicutes) whereas 42% of […] enriched genes were from Bacteroidetes (compared with 0[…] obesity-enriched genes). Their functional annotation indic[…] many are involved in carbohydrate, lipid and amino-acid […]ism (Supplementary Tables 13 and 14).
Together, they com[…] initial set of microbial biomarkers of the obese gut microbi[…] Our finding that the gut microbial community structures […] monozygotic twin pairs had a degree of similarity that was […]able to that of dizygotic twin pairs, and only slightly mor[…] than that of their mothers, is consistent with an earlier finger[…] study of adult twins [24], and with a recent microarray-based […] which revealed that gut community assembly during the fir[…] life followed a more similar pattern in a pair of dizygotic tw[…] 12 unrelated infants [25]. Intriguingly, another fingerprinting […] monozygotic and dizygotic twins in childhood showed a […] reduced similarity profile in dizygotic twins [26]. Thus, compr[…] time-course studies, comparing monozygotic and dizygo[…] pairs from birth through adulthood, as well as intergen[…] analyses of their families’ microbiotas, will be key to dete[…] the relative contributions of host genotype and environmen[…]sures to (gut) microbial ecology. The hypothesis that there is a core human gut microbiom[…]able by a set of abundant microbial organismal lineages th[…] share, may be incorrect: by adulthood, no single bacterial p[…] was detectable at an abundant frequency in the guts o[…] sampled humans. Instead, it appears that a core gut mic[…] exists at the level of shared genes, including an important com[…] involved in various metabolic functions.
This conservation s[…] high degree of redundancy in the gut microbiome and sup[…] ecological view of each individual as an ‘island’ inhabited b[…]
[Figure 4 panel: relative abundance (percentage of KEGG assignments) of core versus variable sequences in each KEGG category: Transcription; Translation; Nucleotide metabolism; Amino-acid metabolism; Biosynthesis of secondary metabolites; Replication and repair; Metabolism of other amino acids; Glycan biosynthesis and metabolism; Carbohydrate metabolism; Lipid metabolism; Biosynthesis of polyketides; Cell growth and death; Metabolism of cofactors and vitamins; Energy metabolism; Xenobiotics biodegradation and metabolism; Genetic information processing protein families; Metabolism protein families; Metabolism unclassified; Membrane transport; Folding, sorting, and degradation; Cellular processes and signalling protein families; Cellular processes and signalling unclassified; Signal transduction; Poorly characterized unclassified; Genetic information processing unclassified; Cell motility; Signalling molecules and interaction.]
Figure 4 | KEGG categories enriched or depleted in the core versus variable components of the gut microbiome. Sequences from each of the 18 faecal microbiomes were binned into the ‘core’ or ‘variable’ microbiome based on the co-occurrence of KEGG orthologous groups (core groups were found in all 18 microbiomes whereas variable groups were present in fewer (<18) microbiomes; see Supplementary Fig. 19a). Asterisks indicate significant differences (Student’s t-test, *P < 0.05, **P < 0.001, ***P < 10⁻⁵; mean ± s.e.m.). ©2009 Macmillan Publishers Limited. All rights reserved
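The core/variable binning described in the Figure 4 caption (a functional group is ‘core’ if it occurs in every microbiome and ‘variable’ if it occurs in fewer) can be sketched in Python with set operations. This is only an illustration of the definition, not the authors' code: the function name and the group labels are hypothetical, and three samples stand in for the 18 used in the study.

```python
def split_core_variable(groups_by_sample):
    """Split functional groups into those seen in every sample and the rest.

    `groups_by_sample` maps a sample name to the set of functional groups
    detected in it (a hypothetical input format).
    """
    sample_sets = [set(groups) for groups in groups_by_sample.values()]
    if not sample_sets:
        return set(), set()
    all_groups = set().union(*sample_sets)   # groups seen in at least one sample
    core = set.intersection(*sample_sets)    # groups seen in every sample
    variable = all_groups - core             # groups missing from some sample
    return core, variable


# Sample names follow the slide's nomenclature; group labels are placeholders.
profiles = {
    "F1T1Le": {"group_1", "group_2", "group_3"},
    "F1T2Le": {"group_1", "group_2"},
    "F1MOv": {"group_1", "group_3", "group_4"},
}

core, variable = split_core_variable(profiles)
```

Because membership depends on presence in every sample, the core set can only shrink or stay the same as more samples are added, while deeper sequencing of each sample can move groups from variable to core. That is consistent with the slide's caveat that estimates of core size depend on sequencing effort.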