Bio.Phylo
A unified phylogenetics toolkit for Biopython


                Eric Talevich

            Institute of Bioinformatics
              University of Georgia


               June 29, 2010
Abstract


       Bio.Phylo is a new phylogenetics library for:

• Exploring, modifying and annotating trees
• Reading & writing standard file formats
• Quick visualization
• Gluing together computational pipelines



                 Availability: Biopython 1.54
A quick survey of file formats

Newick (a.k.a. New Hampshire) is a simple nested-parens format:
    (A, (B, C), (D, E))
  • Extended & tweaked, led to NHX (and parsing problems)

Nexus is a collection of formats, including Newick trees
  • More than just tree data... still tough to parse

PhyloXML is an XML-based replacement for NHX
  • Annotations formalized as XML elements;
    extensible with user-defined element types

NeXML is an XML-based successor to Nexus
  • Ontology-based: key-value assignments have
    semantic meaning
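
To make the nested-parens idea concrete, here is a minimal sketch of a parser for names-only Newick strings like the one above. It uses only the Python standard library and is not how Bio.Phylo parses trees. The function name `parse_newick` is hypothetical, and branch lengths, internal node labels, quoting and comments are deliberately left out.

```python
def parse_newick(text):
    """Parse a names-only Newick string into nested lists of leaf names.

    Illustrative sketch only: no branch lengths, internal labels or quoting.
    """
    # Drop all whitespace and the optional trailing semicolon.
    s = "".join(text.split()).rstrip(";")
    pos = 0

    def peek():
        # Current character, or "" once the input is exhausted.
        return s[pos] if pos < len(s) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return children
        # A leaf: read characters up to the next delimiter.
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


print(parse_newick("(A, (B, C), (D, E))"))
```

Each pair of parentheses becomes one Python list, so the nesting of the text and the nesting of the result match one to one.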
Demo: What’s in a tree?




1. Read a simple Newick file
2. Inspect through IPython
3. Draw with PyLab/matplotlib
4. Promote to a PhyloXML tree
5. Set branch colors
6. Write a PhyloXML file
# In a terminal, make a simple Newick file
# Then launch the IPython interpreter and read the file


% cat > simple.dnd <<EOF
> (((A,B),(C,D)),(E,F,G))
> EOF

% ipython --pylab
>>> from Bio import Phylo
>>> tree = Phylo.read('simple.dnd', 'newick')
# String representation shows the object structure

>>> print(tree)

Tree(weight=1.0, rooted=False, name='')
    Clade(branch_length=1.0)
        Clade(branch_length=1.0)
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='A')
                Clade(branch_length=1.0, name='B')
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='C')
                Clade(branch_length=1.0, name='D')
        Clade(branch_length=1.0)
            Clade(branch_length=1.0, name='E')
            Clade(branch_length=1.0, name='F')
            Clade(branch_length=1.0, name='G')
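
The indentation in this printout follows the nesting of clades. As a rough illustration in plain Python (not Bio.Phylo's own string-formatting code), a recursive walk over nested lists produces a similar outline. The `outline` function and the nested-list stand-in for the tree are assumptions made for this sketch.

```python
# Nested-list stand-in for (((A,B),(C,D)),(E,F,G)); not a Bio.Phylo object.
tree = [[["A", "B"], ["C", "D"]], ["E", "F", "G"]]


def outline(node, depth=0):
    """Return one line per clade, indented four spaces per nesting level."""
    indent = "    " * depth
    if isinstance(node, str):
        # Terminal clade: show its name.
        return [f"{indent}Clade(name='{node}')"]
    # Internal clade: emit it, then recurse into each child one level deeper.
    lines = [f"{indent}Clade()"]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


for line in outline(tree):
    print(line)
```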
# Draw an ASCII-art dendrogram

>>> Phylo.draw_ascii(tree, column_width=52)

                                  ______________   A
                  ______________|
                 |               |______________   B
   ______________|
 |               |                ______________   C
 |               |______________|
_|                               |______________   D
 |
 |                 ______________ E
 |               |
 |______________|______________ F
                 |
                 |______________ G
>>> tree.rooted = True
>>> Phylo.draw_graphviz(tree)

Figure: Graphviz drawing of the tree, with tips labeled A to G
# Promote a basic tree to PhyloXML
>>> from Bio.Phylo.PhyloXML import Phylogeny
>>> phy = Phylogeny.from_tree(tree)
>>> print(phy)

Phylogeny(rooted=True, name='')
    Clade(branch_length=1.0)
        Clade(branch_length=1.0)
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='A')
                Clade(branch_length=1.0, name='B')
            Clade(branch_length=1.0)
                Clade(branch_length=1.0, name='C')
                Clade(branch_length=1.0, name='D')
        Clade(branch_length=1.0)
            Clade(branch_length=1.0, name='E')
            Clade(branch_length=1.0, name='F')
            Clade(branch_length=1.0, name='G')
Branch color
>>> phy.root.color = (128, 128, 128)
Or:
>>> phy.root.color = '#808080'
Or:
>>> phy.root.color = 'gray'

Find clades by attribute values:
>>> mrca = phy.common_ancestor({'name': 'E'},
...                            {'name': 'F'})
>>> mrca.color = 'salmon'

Directly index a clade:
>>> phy.clade[0, 1].color = 'blue'

>>> Phylo.draw_graphviz(phy, prog='neato')
Figure: Graphviz (neato) drawing of the colored tree, with tips labeled A to G
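
The branch-color slide assigns the same gray in three spellings: an RGB triple, a hex string and a color name. The sketch below shows one way such inputs could be normalized to a single (red, green, blue) form. It is a standalone illustration, not Bio.Phylo's code: `to_rgb` and `NAMED_COLORS` are hypothetical names, and the table holds only the three names used in the demo, with their standard web color values.

```python
# Hypothetical lookup table: just the color names used in the demo.
NAMED_COLORS = {
    "gray": (128, 128, 128),
    "blue": (0, 0, 255),
    "salmon": (250, 128, 114),
}


def to_rgb(spec):
    """Normalize a color given as a triple, '#rrggbb' or a name to (r, g, b)."""
    if isinstance(spec, str):
        if spec.startswith("#") and len(spec) == 7:
            # Two hex digits per channel: '#808080' -> (128, 128, 128).
            return tuple(int(spec[i:i + 2], 16) for i in (1, 3, 5))
        if spec in NAMED_COLORS:
            return NAMED_COLORS[spec]
        raise ValueError(f"unrecognized color: {spec!r}")
    red, green, blue = spec  # any 3-item sequence of integers
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return (red, green, blue)


for spec in [(128, 128, 128), "#808080", "gray"]:
    print(spec, "->", to_rgb(spec))
```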
# Save the color annotations in phyloXML

>>> Phylo.write(phy, 'simple-color.xml', 'phyloxml')

<phy:phyloxml xmlns:phy="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <branch_length>1.0</branch_length>
      <color>
        <red>128</red>
        <green>128</green>
        <blue>128</blue>
      </color>
      <clade>
        <branch_length>1.0</branch_length>
        <clade>
          <branch_length>1.0</branch_length>
          <clade>
            <name>A</name>
            ...
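
The element layout in this excerpt (clade, branch_length, color with red/green/blue children, name) can be reproduced with the standard library's ElementTree. The sketch below is an assumption-laden illustration, not Bio.Phylo's writer: `build_clade` and `build_phyloxml` are hypothetical helpers, the tree is a nested list, every branch length is fixed at 1.0 as in the demo, and the namespace is written as a plain default `xmlns` attribute rather than the `phy:` prefix shown above.

```python
import xml.etree.ElementTree as ET


def build_clade(node, branch_length=1.0, color=None):
    """Build a <clade> element from a leaf name (str) or a list of children."""
    elem = ET.Element("clade")
    if isinstance(node, str):
        ET.SubElement(elem, "name").text = node
    ET.SubElement(elem, "branch_length").text = str(branch_length)
    if color is not None:
        color_elem = ET.SubElement(elem, "color")
        for channel, value in zip(("red", "green", "blue"), color):
            ET.SubElement(color_elem, channel).text = str(value)
    if not isinstance(node, str):
        # Child clades come last; only the clade passed `color` is colored.
        for child in node:
            elem.append(build_clade(child, branch_length))
    return elem


def build_phyloxml(nested, root_color=None):
    """Wrap the clade tree in <phyloxml><phylogeny rooted="true">."""
    root = ET.Element("phyloxml", {"xmlns": "http://www.phyloxml.org"})
    phylogeny = ET.SubElement(root, "phylogeny", {"rooted": "true"})
    phylogeny.append(build_clade(nested, color=root_color))
    return root


nested = [[["A", "B"], ["C", "D"]], ["E", "F", "G"]]
doc = build_phyloxml(nested, root_color=(128, 128, 128))
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

The output is a single unindented line. For real files, `Phylo.write` as shown in the demo is the better route, since this sketch covers only a handful of elements.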
Thanks


Holla:
  • Brad Chapman and Christian Zmasek, GSoC 2009 mentors
  • The Biopython developers, feat. Peter J. A. Cock,
    Frank Kauff & Cymon J. Cox
  • Hilmar Lapp & the NESCent Phyloinformatics program
  • Google’s Open Source Programs Office
  • My professor, Dr. Natarajan Kannan
  • Developers like you
Q&A



• Which 3rd-party applications should we wrap in
  Bio.Phylo.Applications? (e.g. RAxML, MrBayes)
• Which other libraries should we support interoperability with?
  (PyCogent, ape)
• What other algorithms are simple, stable and relevant?
  (Consensus, rooting)
• Features for systematics? (Geography, PopGen integration?)
Extra: Tree methods
>>> dir(tree)

collapse                      get_terminals
collapse_all                  is_bifurcating
common_ancestor               is_monophyletic
count_terminals               is_parent_of
depths                        is_preterminal
distance                      ladderize
find_any                      prune
find_clades                   split
find_elements                 total_branch_length
get_nonterminals              trace
get_path

   See: http://biopython.org/DIST/docs/api/Bio.Phylo.BaseTree.TreeMixin-class.html
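
To give a feel for what a few of these methods compute, here is a standalone sketch on a nested-list tree. It is not the TreeMixin implementation: the function names merely echo the list above (`path_to` and `clade_at` are hypothetical helpers), and the functions work on plain lists and leaf names rather than Clade objects.

```python
# Nested-list stand-in for (((A,B),(C,D)),(E,F,G)); not a Bio.Phylo object.
tree = [[["A", "B"], ["C", "D"]], ["E", "F", "G"]]


def get_terminals(node):
    """Leaf names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in get_terminals(child)]


def path_to(node, name):
    """Child indices leading from node to the named leaf, or None if absent."""
    if isinstance(node, str):
        return [] if node == name else None
    for i, child in enumerate(node):
        sub_path = path_to(child, name)
        if sub_path is not None:
            return [i] + sub_path
    return None


def clade_at(node, path):
    """Follow a sequence of child indices, like phy.clade[0, 1] in the demo."""
    for i in path:
        node = node[i]
    return node


def common_ancestor(root, first, second):
    """Deepest subtree containing both named leaves."""
    paths = [path_to(root, name) for name in (first, second)]
    if None in paths:
        raise ValueError("leaf not found")
    shared = []
    for i, j in zip(*paths):
        if i != j:
            break  # the two paths diverge here
        shared.append(i)
    return clade_at(root, shared)


print(len(get_terminals(tree)))         # number of tips
print(common_ancestor(tree, "E", "F"))  # subtree holding E and F
print(clade_at(tree, (0, 1)))           # second child of the first child
```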
Extra: The Bio.Phylo class hierarchy




Figure: Inheritance relationship among the core classes
Extra: PhyloXML classes

 $ pydoc Bio.Phylo.PhyloXML

Accession              Date                 Point
Alphabet               Distribution         Polygon
Annotation             DomainArchitecture   Property
BaseTree               Events               ProteinDomain
BinaryCharacters       Id                   Reference
BranchColor            MolSeq               Sequence
Clade                  Other                SequenceRelation
CladeRelation          Phylogeny            Taxonomy
Confidence             Phyloxml             Uri


            See: http://biopython.org/wiki/PhyloXML

Bio.Phylo: Phylogenetics in Biopython (BOSC 2010)
