Examples of metagenomics use cases for the Phylotastic! web tools. Presented at the Phylotastic hackathon, June 4-8, 2012: http://www.evoio.org/wiki/Phylotastic
2. -Omic Dictionary
• Marker gene studies – amplification of a
conserved homologous gene (18S, 16S rRNA)
from environmental samples
• Metagenomics – shotgun sequencing of
random genomic fragments from
environmental DNA
9. Pruning Subtrees from Megatrees
• User inputs a list of reference sequences with
NCBI Taxon IDs; the tool pulls down the tree topology
• Unclassified sequences in a reference
phylogeny could be “named” with the most
appropriate higher-level taxon
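A minimal sketch of the pruning step, assuming the megatree has already been downloaded and parsed into a simple in-memory tree whose leaves are labeled with NCBI taxon IDs. The `Node` class and the `prune` and `to_newick` helpers are hypothetical illustrations, not part of any Phylotastic service. The sketch keeps only the requested leaves, splices out internal nodes left with a single child, and writes the resulting topology as Newick without branch lengths.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; leaves carry an NCBI taxon ID (as a string) in `label`."""
    label: str | None = None
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return the subtree induced by the leaves whose labels are in `keep`.

    Leaves not in `keep` are dropped; internal nodes left with a single
    child are spliced out so no unary nodes remain.
    """
    if not node.children:  # leaf
        return Node(node.label) if node.label in keep else None
    kept = [c for c in (prune(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:  # unary node: replace it by its only surviving child
        return kept[0]
    return Node(node.label, kept)


def to_newick(node: Node) -> str:
    """Serialize a topology (no branch lengths) as a Newick string."""
    def rec(n: Node) -> str:
        if not n.children:
            return n.label or ""
        inner = ",".join(rec(c) for c in n.children)
        return f"({inner}){n.label or ''}"
    return rec(node) + ";"
```

For a toy megatree `((562,620),(1280,1351));`, pruning to the IDs {"562", "620", "1280"} collapses the second cherry and yields `((562,620),1280);`.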
10. Name Matching and TNRS
• Different taxonomic synonyms have different
NCBI taxon IDs
– Shigella: 620 and E. coli: 562
– Species/genus boundaries still debated
• TNRS (Taxonomic Name Resolution Service) would provide a “matrix” for
standardizing IDs
– E.g. E. coli/Shigella supergroup: 12345 (placeholder ID)
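One way to picture such a “matrix” is a lookup table from each NCBI taxon ID to a standardized group ID, applied before abundances are tallied. The `SYNONYM_GROUPS` table and the helper names below are hypothetical, and 12345 is the slide's placeholder rather than a real NCBI taxon ID; in practice a TNRS service would supply the mapping.

```python
from collections import Counter

# Hypothetical synonym "matrix": NCBI taxon ID -> standardized group ID.
# 12345 is a placeholder group ID, not a real NCBI taxon ID.
SYNONYM_GROUPS: dict[int, int] = {
    562: 12345,  # Escherichia coli
    620: 12345,  # Shigella
}


def standardize(taxid: int) -> int:
    """Return the group ID for `taxid`, or `taxid` itself if it has no synonyms."""
    return SYNONYM_GROUPS.get(taxid, taxid)


def merge_counts(counts: dict[int, int]) -> dict[int, int]:
    """Sum per-taxon read counts after mapping synonymous IDs to one group."""
    merged = Counter()
    for taxid, n in counts.items():
        merged[standardize(taxid)] += n
    return dict(merged)
```

With this table, reads assigned to 562 and 620 are pooled under one ID, while IDs without synonyms pass through unchanged.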
11. Integrating Comparative Data
• Metadata is a standard part of any well-
constructed metagenomics study
– Depth (marine samples)
– Aquatic/Terrestrial
– Temperature
– pH
– Dissolved Oxygen
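A sketch of how this per-sample metadata might be stored alongside the sequence data, assuming one record per sample. The field names, units (degrees Celsius, metres, mg/L) and validation rules are illustrative assumptions, not a community standard such as MIxS.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Environmental metadata for one sample (field names and units are assumptions)."""
    sample_id: str
    habitat: str                                   # "aquatic" or "terrestrial"
    temperature_c: float                           # degrees Celsius
    ph: float
    depth_m: Optional[float] = None                # e.g. for marine samples
    dissolved_oxygen_mg_l: Optional[float] = None  # milligrams per litre

    def __post_init__(self) -> None:
        # Reject obviously invalid values when the record is created.
        if self.habitat not in {"aquatic", "terrestrial"}:
            raise ValueError(f"unknown habitat: {self.habitat!r}")
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError(f"pH out of range: {self.ph}")


def deeper_than(samples: list[SampleMetadata], min_depth_m: float) -> list[str]:
    """IDs of samples with a recorded depth of at least `min_depth_m` metres."""
    return [s.sample_id for s in samples
            if s.depth_m is not None and s.depth_m >= min_depth_m]
```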
12. Integrating Comparative Data
• Metadata also includes information about the
sequences themselves
– Abundance information
– Distribution across sample sites
Branch thickness can be incorporated into XML
tree files (e.g., phyloXML) and visualized in Archaeopteryx
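A minimal sketch of encoding abundance as branch thickness, assuming per-taxon read counts are already available. phyloXML defines an optional per-clade `<width>` element for branch width, which phyloXML-aware viewers such as Archaeopteryx can use when drawing the tree. The linear scaling to a width between 1 and 10, the star-tree layout, and the function names are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


def scale_width(abundance: float, max_abundance: float,
                min_w: float = 1.0, max_w: float = 10.0) -> float:
    """Linearly map an abundance onto a branch width in [min_w, max_w]."""
    if max_abundance <= 0:
        return min_w
    return min_w + (max_w - min_w) * abundance / max_abundance


def star_tree_phyloxml(abundances: dict[str, float]) -> str:
    """Return a phyloXML star tree whose leaf branch widths encode abundance.

    Every leaf hangs off the root to keep the sketch short; a real
    reference tree would keep its own topology and branch lengths.
    """
    root = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
    phylogeny = ET.SubElement(root, "phylogeny", rooted="true")
    top = ET.SubElement(phylogeny, "clade")
    max_ab = max(abundances.values(), default=0.0)
    for name, ab in abundances.items():
        clade = ET.SubElement(top, "clade")
        ET.SubElement(clade, "name").text = name
        # In the phyloXML schema, <width> follows <name> (and <branch_length>).
        ET.SubElement(clade, "width").text = f"{scale_width(ab, max_ab):.2f}"
    return ET.tostring(root, encoding="unicode")
```

The resulting string can be written to a `.xml` file and opened in a phyloXML viewer.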
13. Mashup with Online Data
• Pull down NCBI metadata for a given reference
sequence accession
– Habitat metadata
– Ecological associations, e.g. symbionts
– Genome availability
– Related publications
– Pictures, etc. would be awesome
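A sketch of the first step of such a mashup using NCBI's E-utilities: build an `efetch` URL that returns the GenBank flat file for a nucleotide accession, then extract a few habitat-related source qualifiers and the PubMed IDs of related publications from the returned text. The helper names and the qualifier selection are assumptions, and the regular expressions are a shortcut rather than a full GenBank parser (Biopython's `SeqIO` would be more robust).

```python
from __future__ import annotations

import re
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
# Source-feature qualifiers that often carry habitat/ecology information (a chosen subset).
HABITAT_QUALIFIERS = ("isolation_source", "host", "lat_lon")


def efetch_url(accession: str) -> str:
    """E-utilities URL returning the GenBank flat file for a nucleotide accession."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def habitat_metadata(genbank_text: str) -> dict[str, list[str]]:
    """Collect habitat-related /qualifier="value" pairs from a GenBank flat file."""
    found: dict[str, list[str]] = {}
    for key, value in re.findall(r'/(\w+)="([^"]*)"', genbank_text):
        if key in HABITAT_QUALIFIERS:
            # Qualifier values can wrap across lines; collapse the whitespace.
            found.setdefault(key, []).append(" ".join(value.split()))
    return found


def pubmed_ids(genbank_text: str) -> list[str]:
    """PubMed IDs listed in the record's REFERENCE blocks."""
    return re.findall(r"^\s+PUBMED\s+(\d+)", genbank_text, flags=re.MULTILINE)
```

Fetching could then be done with e.g. `urllib.request.urlopen(efetch_url(acc)).read().decode()`; NCBI asks clients to stay under 3 requests per second without an API key.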
14. Exploring Trees
Ecologically, what are these reference taxa doing??
You can ignore all the other bad -omic words you hear (conservome?!)
Regardless of methodology, focus on:
– Species assemblages and taxonomic diversity
– Community patterns over space and time: cosmopolitan or regionally restricted?
– Community changes as a result of natural/human disturbance
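The first two focus areas can be quantified from a taxon-by-sample count table. The sketch below assumes such a table stored as nested dictionaries and uses the Shannon index, one common (not the only) measure of taxonomic diversity, plus site occupancy as a crude cosmopolitan-versus-restricted label. The 0.8 threshold and the function names are arbitrary illustrations.

```python
from __future__ import annotations

import math


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def occupancy(samples: dict[str, dict[str, int]]) -> dict[str, float]:
    """Fraction of samples (sites) in which each observed taxon occurs."""
    taxa = {t for counts in samples.values() for t, n in counts.items() if n > 0}
    k = len(samples)
    return {t: sum(1 for c in samples.values() if c.get(t, 0) > 0) / k for t in taxa}


def classify_distribution(samples: dict[str, dict[str, int]],
                          threshold: float = 0.8) -> dict[str, str]:
    """Label taxa 'cosmopolitan' if found in >= threshold of sites, else 'restricted'."""
    return {t: ("cosmopolitan" if f >= threshold else "restricted")
            for t, f in occupancy(samples).items()}
```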
Marker genes across all domains (bacteria, archaea, eukaryotes & viruses):
– rRNA genes
– Protein-coding orthologs
– Lineage-specific gene families
----- Meeting Notes (5/22/12 10:42) -----
– Marker genes to make higher-level taxon assignments
– Lineage-specific gene families to narrow down assignments to lower taxonomic levels
Head-tail patterns may help us delimit species and separate rare taxa (which will show head-tail patterns) from errors (no apparent pattern).
----- Meeting Notes (5/22/12 10:42) -----
pplacer and EPA (the Evolutionary Placement Algorithm in RAxML) are great phylogenetic placement tools developed in the last few years.
I see name matching as not just species names, but matching between NCBI taxon ID synonyms
rRNA data especially needs to be interpreted in a phylogenetic context. Phylogenetic placement allows:
1) More robust taxon assignments
2) Identification of divergent/undersampled lineages (that aren't apparent via BLAST searches)
What's the ecology/function of these divergent lineages?
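One common way to turn a placement into a taxon assignment is to take the lowest common ancestor (LCA) of the reference taxa below the placement edge; the same idea could “name” an unclassified or divergent lineage with a higher-level taxon. This sketch assumes the placement (e.g. from pplacer or EPA) has already been reduced to the root-to-tip taxonomic lineages of those reference taxa; the function names are hypothetical.

```python
from __future__ import annotations


def lca_lineage(lineages: list[list[str]]) -> list[str]:
    """Longest shared prefix of root-to-tip lineages, i.e. the LCA's lineage."""
    if not lineages:
        return []
    shared: list[str] = []
    # zip stops at the shortest lineage, so a lineage that is a prefix of another is handled.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign(lineages: list[list[str]]) -> str:
    """Name a placed query with the deepest taxon shared by its reference neighbors."""
    shared = lca_lineage(lineages)
    return shared[-1] if shared else "unclassified"
```

A query placed on an edge above one Enterobacterales and one Pseudomonadales reference would be assigned to Gammaproteobacteria, the deepest rank their lineages share.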