This document discusses developing a data visualization framework for high-throughput sequencing data. It proposes a $250k, 1-year grant to create an open-source, scalable web framework to visualize standard bioinformatics outputs and biodiversity data in more informative ways than pie charts. The framework would allow exploration of taxonomic patterns, comparisons across samples, and phylogenetic placement of sequences on guide trees. Developing intuitive visualizations for new data types and growing a user and developer community are also goals.
5. “Pitch Interactive dissects large data sets in search of meaningful and often hidden patterns that serve to determine the shape and form that best tells a story.”
16. Challenge 2: Taxonomic, phylogenetic, and ecological knowledge is imperative for making meaningful interpretations of high-throughput sequence datasets
37. [Workflow diagram] Input sequences; each input sequence is scanned against both workflows.
• Search input against references: LAST (fast candidate search)
• rRNA workflow: length branches of <600 bp and >600 bp; multiple alignment with hmmalign or Infernal
• Protein workflow: profile HMMs used to align candidates to the reference alignment; multiple alignment with hmmalign
• Parallel option
• pplacer: phylogenetic placement
• Taxonomic summaries; sample analysis and comparison
• Outputs named: Krona plots, number of reads placed for each marker gene, Edge PCA, tree visualization, Bayes factor tests
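The taxonomic-summary step named on this slide can be illustrated with a minimal Python sketch. It assumes placement results have already been reduced to (read, marker gene, lineage) records. That record layout, the sample values, and the function names `reads_per_marker` and `taxonomy_tree` are hypothetical and are not taken from any tool on the slide. The sketch counts reads placed per marker gene and rolls read counts up a nested taxonomy, the kind of hierarchy a Krona-style plot displays.

```python
from collections import Counter

# Hypothetical placement records: (read_id, marker_gene, lineage).
# The layout and values are illustrative assumptions, not real tool output.
placements = [
    ("read1", "16S_rRNA", ["Bacteria", "Proteobacteria", "Gammaproteobacteria"]),
    ("read2", "16S_rRNA", ["Bacteria", "Firmicutes", "Bacilli"]),
    ("read3", "rpoB", ["Bacteria", "Proteobacteria", "Alphaproteobacteria"]),
    ("read4", "18S_rRNA", ["Eukaryota", "Nematoda"]),
]


def reads_per_marker(records):
    """Count how many reads were placed for each marker gene."""
    return Counter(marker for _, marker, _ in records)


def taxonomy_tree(records):
    """Build a nested taxonomy in which each node counts the reads at or below it."""
    root = {"name": "root", "count": 0, "children": {}}
    for _, _, lineage in records:
        node = root
        node["count"] += 1
        for taxon in lineage:
            # Create the child node on first sight, then descend into it.
            node = node["children"].setdefault(
                taxon, {"name": taxon, "count": 0, "children": {}}
            )
            node["count"] += 1
    return root


marker_counts = reads_per_marker(placements)
tree = taxonomy_tree(placements)
```

A nested dictionary like `tree` serializes directly to JSON, which is one plausible input shape for a web-based hierarchical chart. The framework's actual data format is not specified in the slides.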
45. Acknowledgements
Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil
Further Information
• hbik@ucdavis.edu
• @Dr_Bik – updates posted to Twitter
• Grant proposal now posted on Figshare!