Slides from an oral presentation by Shigehiro Kuraku given at an internal event, 'Sequence Informatics Afternoon', organized by the Genome Resource and Analysis Unit of RIKEN CDB in April 2014.
Brief introduction of aLeaves
1. Shigehiro Kuraku
Unit Leader
Genome Resource & Analysis Unit, RIKEN CDB
http://www.cdb.riken.jp/gra/skuraku.html
The extended version of this presentation, as well as its Japanese version,
is available on SlideShare ( http://www.slideshare.net/cdb_gras/ )
aLeaves: web server (http://aleaves.cdb.riken.jp/aleaves/)
for handy phylogenetic analysis
2. Tutorial movies available
“Collecting amino acid sequences and
building a phylogenetic tree on the aLeaves
and MAFFT servers”
https://www.youtube.com/watch?v=0hpp-IqhpyQ
“Estimating a phylogenetic tree from a single amino acid
sequence using aLeaves and MAFFT” (in Japanese)
https://www.youtube.com/watch?v=N9qPLRhHfIQ
3. Motivation of aLeaves development
Background: While we have access to various methods for molecular phylogenetic tree
inference and to enriched sequence data from large-scale sequencing projects,
building a phylogenetic tree remains cumbersome rather than handy for
biologists working in labs.
Launch an online tool that performs comprehensive sequence
searches covering scattered large-scale resources and systematic
data slimming using biologist-friendly cues.
4. What is hidden paralogy? e.g., zebrafish Emx3
Derobert et al., 2002 etc.
Morita et al., 1995
Reviewed in Kuraku, 2010. Integ. Comp. Biol.
8. Scattered information prevents our smooth work
Ensembl
NCBI Protein
(annotated)
Individual web sites
of genome projects
Your sequences
NCBI RefSeq
(annotated)
Ensembl Metazoa
Dataset
9. Collaborators
Osamu Nishimura
GRAS, RIKEN CDB
Kazutaka Katoh
CBRC, AIST & iFReC, Osaka Univ.
Christian M. Zmasek
Sanford-Burnham Medical Research Institute, USA
13. Downstream analysis on MAFFT server
Systematic selection/deletion of sequences based on various criteria
・Sequence length filter
・Delete identical/similar sequences (CD-HIT)
・Delete sequences with large gaps (MaxAlign)
・Select only particular species
・Select/delete particular subgroups in a guide-tree
Managed by K. Katoh
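As a rough illustration of what such filters do, the sketch below applies three of the listed criteria to a FASTA data set on a local machine. It is a minimal sketch and not the MAFFT server's implementation: the function names, the toy sequences, and the assumption that the species name appears in the FASTA header are all hypothetical. Note also that it removes only exactly identical sequences, whereas CD-HIT clusters sequences by similarity.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records: Sequence[Record], min_len: int, max_len: int) -> List[Record]:
    """Sequence length filter: keep sequences with min_len <= length <= max_len."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


def drop_identical(records: Sequence[Record]) -> List[Record]:
    """Keep the first occurrence of each exactly identical sequence.

    Simplification: CD-HIT also removes similar (not only identical) sequences.
    """
    seen = set()
    kept: List[Record] = []
    for h, s in records:
        key = s.upper()
        if key not in seen:
            seen.add(key)
            kept.append((h, s))
    return kept


def select_species(records: Sequence[Record], species: Sequence[str]) -> List[Record]:
    """Keep records whose header mentions one of the given species names.

    Assumption: the species name is written in the header line.
    """
    return [(h, s) for h, s in records if any(name in h for name in species)]


# Toy data for illustration only; these are not real protein sequences.
EXAMPLE = """>seqA Danio rerio
MKRCFTIESL
>seqA_copy Danio rerio
MKRCFTIESL
>seqC Homo sapiens
MKRCFTIESLVAKD
>fragment Mus musculus
MKR
"""

refined = drop_identical(filter_by_length(parse_fasta(EXAMPLE), 5, 20))
zebrafish_only = select_species(refined, ["Danio rerio"])
```

Here `refined` holds the data set after the length filter and duplicate removal, and `zebrafish_only` restricts it further to a single species.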
14. Workflow using aLeaves-MAFFT
Heuristic identification of homologs (in publications, etc.)
Retrieval of limited number of sequences
Exhaustive collection of homologs (on aLeaves server at CDB, RIKEN)
Careful refinement of data set by deleting unnecessary sequences (on MAFFT server at CBRC, AIST)
Phylogenetic tree inference
15. Warning
・aLeaves is based on sequence resources already made public in other
online databases and does not release original sequence information.
・The aLeaves project does not predict or validate the protein-coding sequences
available at other web sites; it simply adopts them for integrative searches.
・The aLeaves-MAFFT link allows you to refine a sequence data set
and perform preliminary molecular phylogenetic analysis, but
please download the data set and perform more sophisticated analyses
on your local system.