Sources: https://github.com/bebatut-slides/galaxy-day_2015
Massive amounts of sequencing data from intestinal microbiota are available in public data repositories (GenBank, ENA, …). These datasets are hard to identify from their associated metadata, to query, and to compare, because they are dispersed across repositories and may have undergone different analyses. To extract useful information from them, they need to be collected and processed with a common workflow based on databases specific to the gut microbiota. To process and analyze gut microbiota data, we developed a framework based on a customized Galaxy instance. This instance integrates several tools (PRINSEQ, FastQ-Join, SortMeRNA, Reago, usearch, framebot, cd-hit, MetaPhlAn, HUMAnN, QIIME) together with databases such as COG (Clusters of Orthologous Groups of proteins) and the catalog of reference genes in the human gut microbiome (Li et al., Nature Biotechnology, 2014). We defined standard workflows using these tools and databases.
[Figure: workflow diagram. Labels recoverable from the diagram, grouped by step:]
Inputs: R1 sequences, R2 sequences
Paired-end assembly (FastQ Joiner): paired-end assembled sequences
Quality control (PRINSEQ): quality controlled sequences
Sequence sorting (SortMeRNA, Reago): rRNA sequences, long rRNA sequences, non rRNA sequences
Taxonomic assignation (MetaPhlAn): taxonomic assignation report of non rRNA sequences
De novo OTU picking (QIIME): OTU of long rRNA sequences
Taxonomic assignation of OTU (QIIME): OTU table of long rRNA sequences
Community summary by taxonomic composition (QIIME): taxonomy table of long rRNA sequences
Alpha diversity and alpha rarefaction computation (QIIME): alpha diversity of long rRNA sequences, alpha rarefaction of long rRNA sequences
Functional a… (Diamond, COG database): similarity search report
HUMAnN: KEGG module ab…, KEGG module c…, KEGG pathway abundance, KEGG pathway coverage, COG family abundance, COG family coverage
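The workflow can be read as a dependency graph between steps. The following is a minimal sketch of that idea in plain Python, not the Galaxy workflow itself. The step names are hypothetical identifiers derived from the diagram labels, and the dependency edges (for example, that quality control follows paired-end assembly, or that HUMAnN consumes the similarity search report) are assumptions, since the diagram's arrows are not recoverable here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on.
# Step names are hypothetical; edges are assumed, not taken from the source.
WORKFLOW = {
    "paired_end_assembly": set(),                    # FastQ Joiner
    "quality_control": {"paired_end_assembly"},      # PRINSEQ
    "sequence_sorting": {"quality_control"},         # SortMeRNA, Reago
    "taxonomic_assignation": {"sequence_sorting"},   # MetaPhlAn, non rRNA sequences
    "otu_picking": {"sequence_sorting"},             # QIIME, long rRNA sequences
    "otu_taxonomy": {"otu_picking"},                 # QIIME
    "community_summary": {"otu_taxonomy"},           # QIIME
    "alpha_diversity": {"otu_taxonomy"},             # QIIME
    "similarity_search": {"sequence_sorting"},       # Diamond against COG
    "functional_profiling": {"similarity_search"},   # HUMAnN
}


def execution_order(workflow):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(workflow).static_order())


def upstream_steps(workflow, step):
    """Return every step that must finish before `step` can start."""
    seen = set()
    stack = list(workflow[step])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(workflow[current])
    return seen


if __name__ == "__main__":
    for name in execution_order(WORKFLOW):
        print(name)
```

Calling `upstream_steps(WORKFLOW, "alpha_diversity")` lists the steps whose outputs the alpha diversity computation relies on under these assumed edges, which is the kind of question a shared workflow definition makes easy to answer.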