SlideShare a Scribd company logo
1 of 44
Download to read offline
09/02/2011 - Rencontres Alphy, Lyon



 Practical use of combinatorial methods
for phylogenetic network reconstruction
               Philippe Gambette
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Phylogenetic trees

 Phylogenetic tree of a species set




             species                              “tokogeny”
              tree S




         A     B       C




                                      A   B   C
Genetic material transfer

    Genetic material transfers between coexisting species:
    • horizontal gene transfer
    • hybridization




S




A      B     C




                            A        B           C
Genetic material transfer

    Genetic material transfers between coexisting species:
    • horizontal gene transfer
    • hybridization


              species
              tree S


 A      B     C                                              gene G1


                                                                 A     B   C


A       B     C
network N
                            A        B           C
Genetic material transfer

    Genetic material transfers between coexisting species:
    • horizontal gene transfer
    • hybridization

                                                             gene G1
              species
              tree S
                                                                 A     B     C

 A      B     C
                                                              incompatible
                                                              gene trees


                                                             gene G2
A       B     C
network N
                                                                 A     B     C
                            A        B           C
Phylogenetic networks

  Phylogenetic network: network representing evolution data
  • explicit phylogenetic networks
  to model evolution                                                              Simplistic
                                                           synthesis
                                                            diagram
                         galled
                       network
                                                                                      level-2
                                                                                      network
                      Dendroscope

                                              HorizStory

  • abstract phylogenetic network
  to classify, visualize data
                                                                        minimum covering network
split network                          median
                                       network


SplitsTree
                                    Network
                                                                       TCS
Phylogenetic network software
                                                     Who is Who in
                                                     Phylogenetic Networks,
                                                     Articles, Authors &
                                                     Programs




Based on BibAdmin by Sergiu Chelcea
+ tag clouds, date histogram, journal lists,
co-author graphs, keyword definitions.
                                               http://www.atgc-montpellier.fr/phylnet
Explicit phylogenetic networks

 Phylogenetic network: network representing evolution data
 • explicit phylogenetic networks
 evolution model
                                                     Simplistic


                        galled
                      network
                                                         level-2
                                                         network
                    Dendroscope




                                synthesis
                                 diagram




                   HorizStory               T-Rex
                                              reticulogram
Explicit phylogenetic networks

 Rooted explicit phylogenetic network: tree-like parts + blobs.




                                        vertices with more than one parent:
                                        reticulations
               h1
                             h3
                    h2


   a b c d e f g h i              j k
Plan

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Phylogenetic network reconstruction



espèce   1   :   AATTGCAG TAGCCCAAAAT
espèce   2   :   ACCTGCAG TAGACCAAT
espèce
espèce
         3
         4
             :
             :
                 GCTTGCCG
                 ATTTGCAG
                          TAGACAAGAAT
                          AAGACCAAAT
                                        {gene sequences}
espèce   5   :            TAGACAAGAAT
espèce   6   :   ACTTGCAG TAGCACAAAAT
espèce   7   :   ACCTGGTG TAAAAT                 distance methods
                  G1        G2                         Bandelt & Dress 1992 - Legendre & Makarenkov
                                                                        2000 - Bryant & Moulton 2002

                                                 parsimony methods
                                                           Hein 1990 - Kececioglu & Gusfield 1994 - Jin,
                                                                             Nakhleh, Snir, Tuller 2009

                                                 likelihood methods
                                                     Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 -
                                                                                   Velasco & Sober 2009

                                         network N
Phylogenetic network reconstruction

                                           Problem: usually slow,
                                        lots of sequences available.

espèce   1   :   AATTGCAG TAGCCCAAAAT
espèce   2   :   ACCTGCAG TAGACCAAT
espèce
espèce
         3
         4
             :
             :
                 GCTTGCCG
                 ATTTGCAG
                          TAGACAAGAAT
                          AAGACCAAAT
                                         {gene sequences}
espèce   5   :            TAGACAAGAAT
espèce   6   :   ACTTGCAG TAGCACAAAAT
espèce   7   :   ACCTGGTG TAAAAT                    distance methods
                  G1        G2                           Bandelt & Dress 1992 - Legendre & Makarenkov
                                                                          2000 - Bryant & Moulton 2002

                                                    parsimony methods
                                                             Hein 1990 - Kececioglu & Gusfield 1994 - Jin,
                                                                               Nakhleh, Snir, Tuller 2009

                                                    likelihood methods
                                                       Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 -
                                                                                     Velasco & Sober 2009

                                           network N
Phylogenetic network reconstruction
espèce   1   :   AATTGCAG TAGCCCAAAAT
espèce   2   :   ACCTGCAG TAGACCAAT
espèce
espèce
         3
         4
             :
             :
                 GCTTGCCG
                 ATTTGCAG
                          TAGACAAGAAT
                          AAGACCAAAT    {gene sequences}
espèce   5   :            TAGACAAGAAT
espèce   6   :   ACTTGCAG TAGCACAAAAT
espèce   7   :   ACCTGGTG TAAAAT

                  G1        G2                    Reconstruction of one tree for each gene
                                                  present in several species
                                                                        Guindon & Gascuel, SB, 2003

             T1
                                             {trees}        HOGENOM database
                                                            Dufayard, Duret, Penel, Gouy,
                                                            Rechenmann & Perrière, BioInf, 2005
                            T2
                                                  Tree consensus or reconciliation

      rooted
     explicit
     network                            optimal network N
Phylogenetic network reconstruction
espèce   1   :   AATTGCAG TAGCCCAAAAT
espèce   2   :   ACCTGCAG TAGACCAAT
espèce
espèce
         3
         4
             :
             :
                 GCTTGCCG
                 ATTTGCAG
                          TAGACAAGAAT
                          AAGACCAAAT        {gene sequences}
espèce   5   :            TAGACAAGAAT
espèce   6   :   ACTTGCAG TAGCACAAAAT
espèce   7   :   ACCTGGTG TAAAAT

                  G1         G2                         Reconstruction of one tree for each gene
                                                        present in several species

                                                                             Guindon & Gascuel, SB, 2003
             T1
                                                  {trees}        HOGENOM database
                                                                 Dufayard, Duret, Penel, Gouy,
                                                                 Rechenmann & Perrière, BioInf, 2005
                              T2                                 > 500 species, >70 000 trees
                                                        Tree consensus or reconciliation

      rooted
     explicit
     network                               optimal network N

                    Problem: tree reconciliation is difficult even for 2 trees
                            (NP-complete for 2 trees with minimum reticulation number)
                                                                            Bordewich & Semple, DAM, 2007
Triplets and clusters

Problem:
Reconstructing the supernetwork of a set of trees is
                                    hard.
Idea:
reconstruct a network containing all:



                       triplets
                        a|ce

          a b c d e


                                            of the input trees ?
Triplets and clusters

Problem:
Reconstructing the supernetwork of a set of trees is
                                    hard.
Idea:
reconstruct a network containing all:



                       triplets                               clusters
                        a|ce                                  {c,d,e}


          a b c d e                             a b c d e


                                            of the input trees ?
Softwired clusters

 “softwired” cluster : cluster of a tree contained in the network

 Tree-like model of gene transmission:
 each gene comes from a single parent


                                   abc
                                   ab         cd
                                         bc



                               a    b     c   d
Softwired clusters

 “softwired” cluster : cluster of a tree contained in the network

 Tree-like model of gene transmission:
 each gene comes from a single parent


                                   abc
                                   ab         cd
                                         bc



                               a    b     c   d
Combinatorial phylogenetic network reconstruction

Idea:
change the type of data to process

                                     {trees}




                                {clusters}        {triplets}




          optimal               optimal            optimal
      supernetwork N        supernetwork N'    supernetwork N''

                       N = N' = N''?
Combinatorial phylogenetic network reconstruction

Idea:
change the type of data to process

                                     {trees}




                                {clusters}        {triplets}




          optimal               optimal            optimal
      supernetwork N        supernetwork N'    supernetwork N''

           { N } ⊆ { N' } ⊆ { N'' }
Plan

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Reconstruction from softwired clusters

{trees}      Fast exact galled network reconstruction method from softwired
             clusters
                                         Huson, Rupp, Berry, Gambette & Paul, ISMB 2009



              Step 1- Solve cluster conflicts by deleting taxa,
              reconstruct tree on remaining taxa
{clusters}    MAXIMUM COMPATIBLE SUBSET                                  a   b    c   d
                                                                             x    y

              Step 2- Attach taxa involved in conflicts to the tree
              with the minimum number of arcs:
              MINIMUM ATTACHMENT
   N'
   galled network

                                             a b      x    y    c   d
                                                               http://www.dendroscope.org
Reconstruction from softwired clusters

{arbres}      Exact method for level k network reconstruction from softwired
              clusters
                                                  Iersel, Kelk, Rupp & Huson, ISMB 2010

                         less reticulations, but slower for level > 2.


{clusters}                                     level =
                                               maximum number of reticulations
                                               per blob.


             a b c d e f g h i           j k
              level-2 network
   N'
   level-k
   network                level-1 network
                          (“galled tree”)       a b c d e f g h i                  j k
                                                          http://www.dendroscope.org
Reconstruction from triplets

 {trees}     Exact methods to reconstruct level-1 and 2 networks (if there exist
             any) from a dense triplet set
                                             Jansson, Nguyen & Sung, SODA'05 : O(n3) pour niveau 1,
                                               van Iersel, Kelk & al, RECOMB'08 : O(n8) pour niveau 2,
                                                           To & Habib, CPM'09 : O(n5k+4) pour niveau k


             T dense triplet set =
{triplets}   On any subset of 3 leaves, T contains at least one triplet

             Program Simplistic




   N'                              Yeast phylogenetic network -
   level-k                         Van Iersel et al. :
                                   Constructing level-2 phylogenetic
   network                         networks from triplets.
                                   RECOMB 2008

                                                   http://homepages.cwi.nl/~kelk/simplistic.html
Reconstruction from triplets

 {trees}     Fast heuristic method to reconstruct a level-1 network containing
             most of the input triplets
                                                           Huber, van Iersel, Kelk & Suchecki, TCBB, 2011



             Program Lev1athan
{triplets}


                          Phylogenetic network built from triplets
                          extracted from 2 trees of HIV-1 strains
                          Huber, van Iersel, Kelk & Suchecki
                          A practical algorithm for reconstructing
                          level-1 phylogenetic networks
                          TCBB, 2011
   N'
   level-1
   network


                                                          http://homepages.cwi.nl/~kelk/lev1athan/
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Solution ambiguity

      Ambiguity of the results even with complete and correct data



              Many distinct minimal networks have exactly
          the same set of contained trees, triplets, and clusters.



                               a                c
              a         c
                                   b                b

                  b                        c                a
                 Characterization for level-1 networks :
        The only ambiguous cases have such blobs (< 5 vertices)


                                                           Gambette & Huber, 2011
Solution ambiguity

      Ambiguity of the results even with complete and correct data



              Many distinct minimal networks have exactly
          the same set of contained trees, triplets, and clusters.

       a|bc

                               a                c
              a         c
                                   b                b

                  b                        c                a
                 Characterization for level-1 networks :
        The only ambiguous cases have such blobs (< 5 vertices)


                                                           Gambette & Huber, 2011
Solution ambiguity

      Ambiguity of the results even with complete and correct data



              Many distinct minimal networks have exactly
          the same set of contained trees, triplets, and clusters.

       c|ab

                               a                c
              a         c
                                   b                b

                  b                        c                a
                 Characterization for level-1 networks :
        The only ambiguous cases have such blobs (< 5 vertices)


                                                           Gambette & Huber, 2011
Solution ambiguity

      Ambiguity of the results even with complete and correct data



              Many distinct minimal networks have exactly
          the same set of contained trees, triplets, and clusters.


                                                x1
                                x1                   x2
                           x2

                    b
                                a           b         a
           2 level-2 networks with exactly the same triplet set




                                                           Gambette & Huber, 2011
Solution ambiguity

      Ambiguity of the results even with complete and correct data



              Many distinct minimal networks have exactly
          the same set of contained trees, triplets, and clusters.


                                                x1
                                x1                   x2
                           x2

                    b
                                a           b         a
           2 level-2 networks with exactly the same triplet set
                  Even with complete and correct data,
       impossible to choose among the ambiguous configurations!

                                                           Gambette & Huber, 2011
Practical use
                                                                                          existing methods
                                                                                          still work to do!
Conditions for use                 Available data                Possible processings

rooted trees                       unrooted trees                Rooting with a reference species
                                                                 tree or topological constraints

Single-copy gene trees             MUL-trees (with duplicated    MUL-tree processing
                                   genes)                         Scornavacca, Berry & Ranwez, 2009


Correct clusters and triplets      noisy data                    Tree cleaning
                                                                                          PhySIC_IST, 2008
                                                                 Data filtering (clusters with high
                                                                 bootstrap value, present in >x% of the
                                                                 trees)
                                                                 Data editing : solution containing
                                                                 most of the input data

Complete data (density for         partial data, deleted genes   Selection of a large number of trees
triplet sets, complete clusters)                                 with a large number of common
                                                                 species
                                                                 Selection of the maximal number of
                                                                 taxa with triplet density
                                                                                   NP-complete problems
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Illustrations

 16 trees on 47 taxa from the HOGENOM database            Lev1athan
 (proteobacteria)                                          (heuristic
 24 Enterobacteriales                                        triplets,
 2 Pasteurellales                                            level-1)
 1 Aeromonadales                                              24 sec.
 9 Alteromonadales
 1 Oceanospirillales
 6 Rhodobacterales
 4 Rhizobiales

 Networks containing triplets, softwired clusters,
 present in at least 20% of the trees


 Simplistic                                             Dendroscope
 (triplets, level-7                                  (clusters, galled
 network)                                                   network)
 63 sec.                                                      <1 sec.
Illustrations

 16 trees on 47 taxa from the HOGENOM database        Dendroscope
 (proteobacteria)                                        (clusters,
 24 Enterobacteriales                                       cluster
 2 Pasteurellales                                        network,
 1 Aeromonadales                                           level 1)
 9 Alteromonadales                                          <1 sec.
 1 Oceanospirillales
 6 Rhodobacterales
 4 Rhizobiales

 Networks containing triplets, softwired clusters,
 present in at least 20% of the trees


                                                       Dendroscope
 Dendroscope                                         (clusters,galled
 (clusters,                                                network)
 level-7                                                      <1 sec.
 network)
 2 sec.
Illustrations

 9 trees on 279 procaryote species
 Clusters in at least 2 trees
                                     Auch, Steigele, Huson & Henz, 2009




                                                               Dendroscope
                                                  (clusters, galled network)
                                                                       2 sec.
Illustrations
 23 trees, 45 species from the 3 domains of life                                                 Dendroscope
 clusters with 99% bootstrap confidence                                                               (galled
                                                                                                    network)
                                                                                                       4 sec.




                                          Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ:
             Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
Illustrations
 23 trees, 45 species from the 3 domains of life                                                 Dendroscope
 clusters with 80% bootstrap confidence                                                               (galled
 present in at least 2 trees                                                                        network)
                                                                                                      <1 sec.




                                          Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ:
             Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
Illustrations
 23 trees, 45 species from the 3 domains of life                                                 Dendroscope
 clusters with 80% bootstrap confidence                                                               (level-3
 present in at least 2 trees                                                                        network)
                                                                                                       <1 sec.




                                          Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ:
             Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Perspectives

 Combinatorics :
 - Better knowledge of small level networks
 - Update of a network with new data
 - Unrooted explicit phylogenetic network reconstruction

 Bioinformatics :
 - Function of transferred genes (“transfer highways”)
 - Integration of combinatorial methods in a statistical framework

        Sequence data                                Combinatorial reconstruction
                                                        of a set of candidates

                            Construction of
                          combinatorial data
                                                     Choice among the candidates
                                                        By statistical methods


                                      Phylogenetic network
Thank you !
 Coauthors:
 - Vincent Berry, Christophe Paul (LIRMM)
 - Daniel Huson, Regula Rupp (Tübingen)
 - Katharina Huber (East Anglia)

 Brown et al.'s data provided by Sophie Abby




                                               Réticulogramme des 25 mots les plus fréquents
                                               de ma thèse, construit par
                                                 TreeCloud, SplitsTree et T-Rex
                                               Coloration : rouge au début, bleu à la fin

                                                              http://www.treecloud.org

More Related Content

Similar to Practical use of combinatorial methods for phylogenetic network reconstruction

Open Tree of Life @Evolution 2012
Open Tree of Life @Evolution 2012Open Tree of Life @Evolution 2012
Open Tree of Life @Evolution 2012Karen Cranston
 
Next-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesNext-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesJan Aerts
 
Aug2013 bioinformatics working group
Aug2013 bioinformatics working groupAug2013 bioinformatics working group
Aug2013 bioinformatics working groupGenomeInABottle
 
Software for SBML Today
Software for SBML TodaySoftware for SBML Today
Software for SBML TodayMike Hucka
 
Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Carla
 
Creating a Kinship Matrix Using MSA
Creating a Kinship Matrix Using MSACreating a Kinship Matrix Using MSA
Creating a Kinship Matrix Using MSAheathermerk
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3GenomeInABottle
 
Hadoop for Bioinformatics
Hadoop for BioinformaticsHadoop for Bioinformatics
Hadoop for BioinformaticsDeepak Singh
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aiaYi-Fan Liou
 
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...Eastern Pennsylvania Branch ASM
 
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptx
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptxSolis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptx
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptxRyanLong78
 
Hw09 Hadoop For Bioinfomatics
Hw09   Hadoop For BioinfomaticsHw09   Hadoop For Bioinfomatics
Hw09 Hadoop For BioinfomaticsCloudera, Inc.
 
OBC | Synthetic biology announcing the coming technological revolution
OBC | Synthetic biology announcing the coming technological revolutionOBC | Synthetic biology announcing the coming technological revolution
OBC | Synthetic biology announcing the coming technological revolutionOut of The Box Seminar
 

Similar to Practical use of combinatorial methods for phylogenetic network reconstruction (20)

Open Tree of Life @Evolution 2012
Open Tree of Life @Evolution 2012Open Tree of Life @Evolution 2012
Open Tree of Life @Evolution 2012
 
Pathway analysis 2012
Pathway analysis 2012Pathway analysis 2012
Pathway analysis 2012
 
Next-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesNext-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologies
 
Basics of Genome Assembly
Basics of Genome Assembly Basics of Genome Assembly
Basics of Genome Assembly
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
 
Aug2013 bioinformatics working group
Aug2013 bioinformatics working groupAug2013 bioinformatics working group
Aug2013 bioinformatics working group
 
Software for SBML Today
Software for SBML TodaySoftware for SBML Today
Software for SBML Today
 
Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)Genetics (PPT from Mrs. Brenda Lee)
Genetics (PPT from Mrs. Brenda Lee)
 
Biological networks
Biological networksBiological networks
Biological networks
 
Creating a Kinship Matrix Using MSA
Creating a Kinship Matrix Using MSACreating a Kinship Matrix Using MSA
Creating a Kinship Matrix Using MSA
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
Aug2015 Ali Bashir and Jason Chin Pac bio giab_assembly_summary_ali3
 
Hadoop for Bioinformatics
Hadoop for BioinformaticsHadoop for Bioinformatics
Hadoop for Bioinformatics
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aia
 
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
 
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptx
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptxSolis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptx
Solis-Lemus & Ane (2016) Inferring Phylogenetic Networks.pptx
 
Hw09 Hadoop For Bioinfomatics
Hw09   Hadoop For BioinfomaticsHw09   Hadoop For Bioinfomatics
Hw09 Hadoop For Bioinfomatics
 
OBC | Synthetic biology announcing the coming technological revolution
OBC | Synthetic biology announcing the coming technological revolutionOBC | Synthetic biology announcing the coming technological revolution
OBC | Synthetic biology announcing the coming technological revolution
 
Biological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical ModelsBiological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical Models
 
Pathway and network analysis
Pathway and network analysisPathway and network analysis
Pathway and network analysis
 

More from Philippe Gambette

Nuages arborés et analyse textuelle de corpus politiques avec TreeCloud
Nuages arborés et analyse textuelle de corpus politiques avec TreeCloudNuages arborés et analyse textuelle de corpus politiques avec TreeCloud
Nuages arborés et analyse textuelle de corpus politiques avec TreeCloudPhilippe Gambette
 
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloud
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloudNuages arborés et analyse textuelle - Présentation de l’outil TreeCloud
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloudPhilippe Gambette
 
Longueur de branches et arbres de mots
Longueur de branches et arbres de motsLongueur de branches et arbres de mots
Longueur de branches et arbres de motsPhilippe Gambette
 
Méthodes combinatoires de reconstruction de réseaux phylogénétiques
Méthodes combinatoires de reconstruction de réseaux phylogénétiquesMéthodes combinatoires de reconstruction de réseaux phylogénétiques
Méthodes combinatoires de reconstruction de réseaux phylogénétiquesPhilippe Gambette
 
Quadruplets et réseaux non enracinés de niveau k
Quadruplets et réseaux non enracinés de niveau kQuadruplets et réseaux non enracinés de niveau k
Quadruplets et réseaux non enracinés de niveau kPhilippe Gambette
 
Utilisation de la visualisation en nuage arboré pour l'analyse littéraire
Utilisation de la visualisation en nuage arboré pour l'analyse littéraireUtilisation de la visualisation en nuage arboré pour l'analyse littéraire
Utilisation de la visualisation en nuage arboré pour l'analyse littérairePhilippe Gambette
 
Analyse de textes avec TreeCloud et Lexico3
Analyse de textes avec TreeCloud et Lexico3Analyse de textes avec TreeCloud et Lexico3
Analyse de textes avec TreeCloud et Lexico3Philippe Gambette
 
Géolocalisation de données et conception de cartes interactives
Géolocalisation de données et conception de cartes interactivesGéolocalisation de données et conception de cartes interactives
Géolocalisation de données et conception de cartes interactivesPhilippe Gambette
 
Reconstruction combinatoire de réseaux phylogénétiques
Reconstruction combinatoire de réseaux phylogénétiquesReconstruction combinatoire de réseaux phylogénétiques
Reconstruction combinatoire de réseaux phylogénétiquesPhilippe Gambette
 
The Structure of Level-k Phylogenetic Networks
The Structure of Level-k Phylogenetic NetworksThe Structure of Level-k Phylogenetic Networks
The Structure of Level-k Phylogenetic NetworksPhilippe Gambette
 
Visualiser un texte par un nuage arboré
Visualiser un texte par un nuage arboréVisualiser un texte par un nuage arboré
Visualiser un texte par un nuage arboréPhilippe Gambette
 
Estimation du nombre de citations de papillotes et de blagues Carambar
Estimation du nombre de citations de papillotes et de blagues CarambarEstimation du nombre de citations de papillotes et de blagues Carambar
Estimation du nombre de citations de papillotes et de blagues CarambarPhilippe Gambette
 
On restrictions of balanced 2-interval graphs
On restrictions of balanced 2-interval graphsOn restrictions of balanced 2-interval graphs
On restrictions of balanced 2-interval graphsPhilippe Gambette
 
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...Philippe Gambette
 
Visualising a text with a tree cloud
Visualising a text with a tree cloudVisualising a text with a tree cloud
Visualising a text with a tree cloudPhilippe Gambette
 

More from Philippe Gambette (15)

Nuages arborés et analyse textuelle de corpus politiques avec TreeCloud
Nuages arborés et analyse textuelle de corpus politiques avec TreeCloudNuages arborés et analyse textuelle de corpus politiques avec TreeCloud
Nuages arborés et analyse textuelle de corpus politiques avec TreeCloud
 
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloud
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloudNuages arborés et analyse textuelle - Présentation de l’outil TreeCloud
Nuages arborés et analyse textuelle - Présentation de l’outil TreeCloud
 
Longueur de branches et arbres de mots
Longueur de branches et arbres de motsLongueur de branches et arbres de mots
Longueur de branches et arbres de mots
 
Méthodes combinatoires de reconstruction de réseaux phylogénétiques
Méthodes combinatoires de reconstruction de réseaux phylogénétiquesMéthodes combinatoires de reconstruction de réseaux phylogénétiques
Méthodes combinatoires de reconstruction de réseaux phylogénétiques
 
Quadruplets et réseaux non enracinés de niveau k
Quadruplets et réseaux non enracinés de niveau kQuadruplets et réseaux non enracinés de niveau k
Quadruplets et réseaux non enracinés de niveau k
 
Utilisation de la visualisation en nuage arboré pour l'analyse littéraire
Utilisation de la visualisation en nuage arboré pour l'analyse littéraireUtilisation de la visualisation en nuage arboré pour l'analyse littéraire
Utilisation de la visualisation en nuage arboré pour l'analyse littéraire
 
Analyse de textes avec TreeCloud et Lexico3
Analyse de textes avec TreeCloud et Lexico3Analyse de textes avec TreeCloud et Lexico3
Analyse de textes avec TreeCloud et Lexico3
 
Géolocalisation de données et conception de cartes interactives
Géolocalisation de données et conception de cartes interactivesGéolocalisation de données et conception de cartes interactives
Géolocalisation de données et conception de cartes interactives
 
Reconstruction combinatoire de réseaux phylogénétiques
Reconstruction combinatoire de réseaux phylogénétiquesReconstruction combinatoire de réseaux phylogénétiques
Reconstruction combinatoire de réseaux phylogénétiques
 
The Structure of Level-k Phylogenetic Networks
The Structure of Level-k Phylogenetic NetworksThe Structure of Level-k Phylogenetic Networks
The Structure of Level-k Phylogenetic Networks
 
Visualiser un texte par un nuage arboré
Visualiser un texte par un nuage arboréVisualiser un texte par un nuage arboré
Visualiser un texte par un nuage arboré
 
Estimation du nombre de citations de papillotes et de blagues Carambar
Estimation du nombre de citations de papillotes et de blagues CarambarEstimation du nombre de citations de papillotes et de blagues Carambar
Estimation du nombre de citations de papillotes et de blagues Carambar
 
On restrictions of balanced 2-interval graphs
On restrictions of balanced 2-interval graphsOn restrictions of balanced 2-interval graphs
On restrictions of balanced 2-interval graphs
 
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...
Reconstruction de reseaux phylogenetiques a structure arboree depuis un ensem...
 
Visualising a text with a tree cloud
Visualising a text with a tree cloudVisualising a text with a tree cloud
Visualising a text with a tree cloud
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Practical use of combinatorial methods for phylogenetic network reconstruction

  • 1. 09/02/2011 - Rencontres Alphy, Lyon Practical use of combinatorial methods for phylogenetic network reconstruction Philippe Gambette
  • 2. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 3. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 4. Phylogenetic trees Phylogenetic tree of a species set species “tokogeny” tree S A B C A B C
  • 5. Genetic material transfer Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization S A B C A B C
  • 6. Genetic material transfer Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization species tree S A B C gene G1 A B C A B C network N A B C
  • 7. Genetic material transfer Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization gene G1 species tree S A B C A B C incompatible gene trees gene G2 A B C network N A B C A B C
  • 8. Phylogenetic networks Phylogenetic network: network representing evolution data • explicit phylogenetic networks to model evolution Simplistic synthesis diagram galled network level-2 network Dendroscope HorizStory • abstract phylogenetic network to classify, visualize data minimum covering network split network median network SplitsTree Network TCS
  • 9. Phylogenetic network software Who is Who in Phylogenetic Networks, Articles, Authors & Programs Based on BibAdmin by Sergiu Chelcea + tag clouds, date histogram, journal lists, co-author graphs, keyword definitions. http://www.atgc-montpellier.fr/phylnet
  • 10. Explicit phylogenetic networks Phylogenetic network: network representing evolution data • explicit phylogenetic networks evolution model Simplistic galled network level-2 network Dendroscope synthesis diagram HorizStory T-Rex reticulogram
  • 11. Explicit phylogenetic networks Rooted explicit phylogenetic network: tree-like parts + blobs. vertices with more than one parent: reticulations h1 h3 h2 a b c d e f g h i j k
  • 12. Plan • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 13. Phylogenetic network reconstruction espèce 1 : AATTGCAG TAGCCCAAAAT espèce 2 : ACCTGCAG TAGACCAAT espèce espèce 3 4 : : GCTTGCCG ATTTGCAG TAGACAAGAAT AAGACCAAAT {gene sequences} espèce 5 : TAGACAAGAAT espèce 6 : ACTTGCAG TAGCACAAAAT espèce 7 : ACCTGGTG TAAAAT distance methods G1 G2 Bandelt & Dress 1992 - Legendre & Makarenkov 2000 - Bryant & Moulton 2002 parsimony methods Hein 1990 - Kececioglu & Gusfield 1994 - Jin, Nakhleh, Snir, Tuller 2009 likelihood methods Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 - Velasco & Sober 2009 network N
  • 14. Phylogenetic network reconstruction Problem: usually slow, lots of sequences available. espèce 1 : AATTGCAG TAGCCCAAAAT espèce 2 : ACCTGCAG TAGACCAAT espèce espèce 3 4 : : GCTTGCCG ATTTGCAG TAGACAAGAAT AAGACCAAAT {gene sequences} espèce 5 : TAGACAAGAAT espèce 6 : ACTTGCAG TAGCACAAAAT espèce 7 : ACCTGGTG TAAAAT distance methods G1 G2 Bandelt & Dress 1992 - Legendre & Makarenkov 2000 - Bryant & Moulton 2002 parsimony methods Hein 1990 - Kececioglu & Gusfield 1994 - Jin, Nakhleh, Snir, Tuller 2009 likelihood methods Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 - Velasco & Sober 2009 network N
  • 15. Phylogenetic network reconstruction espèce 1 : AATTGCAG TAGCCCAAAAT espèce 2 : ACCTGCAG TAGACCAAT espèce espèce 3 4 : : GCTTGCCG ATTTGCAG TAGACAAGAAT AAGACCAAAT {gene sequences} espèce 5 : TAGACAAGAAT espèce 6 : ACTTGCAG TAGCACAAAAT espèce 7 : ACCTGGTG TAAAAT G1 G2 Reconstruction of one tree for each gene present in several species Guindon & Gascuel, SB, 2003 T1 {trees} HOGENOM database Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005 T2 Tree consensus or reconciliation rooted explicit network optimal network N
  • 16. Phylogenetic network reconstruction espèce 1 : AATTGCAG TAGCCCAAAAT espèce 2 : ACCTGCAG TAGACCAAT espèce espèce 3 4 : : GCTTGCCG ATTTGCAG TAGACAAGAAT AAGACCAAAT {gene sequences} espèce 5 : TAGACAAGAAT espèce 6 : ACTTGCAG TAGCACAAAAT espèce 7 : ACCTGGTG TAAAAT G1 G2 Reconstruction of one tree for each gene present in several species Guindon & Gascuel, SB, 2003 T1 {trees} HOGENOM database Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005 T2 > 500 species, >70 000 trees Tree consensus or reconciliation rooted explicit network optimal network N Problem: tree reconciliation is difficult even for 2 trees (NP-complete for 2 trees with minimum reticulation number) Bordewich & Semple, DAM, 2007
  • 17. Triplets and clusters Problem: Reconstructing the supernetwork of a set of trees is hard. Idea: reconstruct a network containing all: triplets a|ce a b c d e of the input trees ?
  • 18. Triplets and clusters Problem: Reconstructing the supernetwork of a set of trees is hard. Idea: reconstruct a network containing all: triplets clusters a|ce {c,d,e} a b c d e a b c d e of the input trees ?
  • 19. Softwired clusters “softwired” cluster : cluster of a tree contained in the network Tree-like model of gene transmission: each gene comes from a single parent abc ab cd bc a b c d
  • 20. Softwired clusters “softwired” cluster : cluster of a tree contained in the network Tree-like model of gene transmission: each gene comes from a single parent abc ab cd bc a b c d
  • 21. Combinatorial phylogenetic network reconstruction Idea: change the type of data to process {trees} {clusters} {triplets} optimal optimal optimal supernetwork N supernetwork N' supernetwork N'' N = N' = N''?
  • 22. Combinatorial phylogenetic network reconstruction Idea: change the type of data to process {trees} {clusters} {triplets} optimal optimal optimal supernetwork N supernetwork N' supernetwork N'' { N } ⊆ { N' } ⊆ { N'' }
  • 23. Plan • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 24. Reconstruction from softwired clusters {trees} Fast exact galled network reconstruction method from softwired clusters Huson, Rupp, Berry, Gambette & Paul, ISMB 2009 Step 1- Solve cluster conflicts by deleting taxa, reconstruct tree on remaining taxa {clusters} MAXIMUM COMPATIBLE SUBSET a b c d x y Step 2- Attach taxa involved in conflicts to the tree with the minimum number of arcs: MINIMUM ATTACHMENT N' galled network a b x y c d http://www.dendroscope.org
  • 25. Reconstruction from softwired clusters {arbres} Exact method for level k network reconstruction from softwired clusters Iersel, Kelk, Rupp & Huson, ISMB 2010 less reticulations, but slower for level > 2. {clusters} level = maximum number of reticulations per blob. a b c d e f g h i j k level-2 network N' level-k network level-1 network (“galled tree”) a b c d e f g h i j k http://www.dendroscope.org
  • 26. Reconstruction from triplets {trees} Exact methods to reconstruct level-1 and 2 networks (if there exist any) from a dense triplet set Jansson, Nguyen & Sung, SODA'05 : O(n3) pour niveau 1, van Iersel, Kelk & al, RECOMB'08 : O(n8) pour niveau 2, To & Habib, CPM'09 : O(n5k+4) pour niveau k T dense triplet set = {triplets} On any subset of 3 leaves, T contains at least one triplet Program Simplistic N' Yeast phylogenetic network - level-k Van Iersel et al. : Constructing level-2 phylogenetic network networks from triplets. RECOMB 2008 http://homepages.cwi.nl/~kelk/simplistic.html
  • 27. Reconstruction from triplets {trees} Fast heuristic method to reconstruct a level-1 network containing most of the input triplets Huber, van Iersel, Kelk & Suchecki, TCBB, 2011 Program Lev1athan {triplets} Phylogenetic network built from triplets extracted from 2 trees of HIV-1 strains Huber, van Iersel, Kelk & Suchecki A practical algorithm for reconstructing level-1 phylogenetic networks TCBB, 2011 N' level-1 network http://homepages.cwi.nl/~kelk/lev1athan/
  • 28. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 29. Solution ambiguity Ambiguity of the results even with complete and correct data Many distinct minimal networks have exactly the same set of contained trees, triplets, and clusters. a c a c b b b c a Characterization for level-1 networks : The only ambiguous cases have such blobs (< 5 vertices) Gambette & Huber, 2011
  • 30. Solution ambiguity Ambiguity of the results even with complete and correct data Many distinct minimal networks have exactly the same set of contained trees, triplets, and clusters. a|bc a c a c b b b c a Characterization for level-1 networks : The only ambiguous cases have such blobs (< 5 vertices) Gambette & Huber, 2011
  • 31. Solution ambiguity Ambiguity of the results even with complete and correct data Many distinct minimal networks have exactly the same set of contained trees, triplets, and clusters. c|ab a c a c b b b c a Characterization for level-1 networks : The only ambiguous cases have such blobs (< 5 vertices) Gambette & Huber, 2011
  • 32. Solution ambiguity Ambiguity of the results even with complete and correct data Many distinct minimal networks have exactly the same set of contained trees, triplets, and clusters. x1 x1 x2 x2 b a b a 2 level-2 networks with exactly the same triplet set Gambette & Huber, 2011
  • 33. Solution ambiguity Ambiguity of the results even with complete and correct data Many distinct minimal networks have exactly the same set of contained trees, triplets, and clusters. x1 x1 x2 x2 b a b a 2 level-2 networks with exactly the same triplet set Even with complete and correct data, impossible to choose among the ambiguous configurations! Gambette & Huber, 2011
  • 34. Practical use existing methods still work to do! Conditions for use Available data Possible processings rooted trees unrooted trees Rooting with a reference species tree or topological constraints Single-copy gene trees MUL-trees (with duplicated MUL-tree processing genes) Scornavacca, Berry & Ranwez, 2009 Correct clusters and triplets noisy data Tree cleaning PhySIC_IST, 2008 Data filtering (clusters with high bootstrap value, present in >x% of the trees) Data editing : solution containing most of the input data Complete data (density for partial data, deleted genes Selection of a large number of trees triplet sets, complete clusters) with a large number of common species Selection of the maximal number of taxa with triplet density NP-complete problems
  • 35. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 36. Illustrations 16 trees on 47 taxa from the HOGENOM database Lev1athan (proteobacteria) (heuristic 24 Enterobacteriales triplets, 2 Pasteurellales level-1) 1 Aeromonadales 24 sec. 9 Alteromonadales 1 Oceanospirillales 6 Rhodobacterales 4 Rhizobiales Networks containing triplets, softwired clusters, present in at least 20% of the trees Simplistic Dendroscope (triplets, level-7 (clusters, galled network) network) 63 sec. <1 sec.
  • 37. Illustrations 16 trees on 47 taxa from the HOGENOM database Dendroscope (proteobacteria) (clusters, 24 Enterobacteriales cluster 2 Pasteurellales network, 1 Aeromonadales level 1) 9 Alteromonadales <1 sec. 1 Oceanospirillales 6 Rhodobacterales 4 Rhizobiales Networks containing triplets, softwired clusters, present in at least 20% of the trees Dendroscope Dendroscope (clusters,galled (clusters, network) level-7 <1 sec. network) 2 sec.
  • 38. Illustrations 9 trees on 279 procaryote species Clusters in at least 2 trees Auch, Steigele, Huson & Henz, 2009 Dendroscope (clusters, galled network) 2 sec.
  • 39. Illustrations 23 trees, 45 species from the 3 domains of life Dendroscope clusters with 99% bootstrap confidence (galled network) 4 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
  • 40. Illustrations 23 trees, 45 species from the 3 domains of life Dendroscope clusters with 80% bootstrap confidence (galled present in at least 2 trees network) <1 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
  • 41. Illustrations 23 trees, 45 species from the 3 domains of life Dendroscope clusters with 80% bootstrap confidence (level-3 present in at least 2 trees network) <1 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281--285
  • 42. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 43. Perspectives Combinatorics : - Better knowledge of small level networks - Update of a network with new data - Unrooted explicit phylogenetic network reconstruction Bioinformatics : - Function of transferred genes (“transfer highways”) - Integration of combinatorial methods in a statistical framework Sequence data Combinatorial reconstruction of a set of candidates Construction of combinatorial data Choice among the candidates By statistical methods Phylogenetic network
  • 44. Thank you ! Coauthors: - Vincent Berry, Christophe Paul (LIRMM) - Daniel Huson, Regula Rupp (Tübingen) - Katharina Huber (East Anglia) Brown et al.'s data provided by Sophie Abby Réticulogramme des 25 mots les plus fréquents de ma thèse, construit par TreeCloud, SplitsTree et T-Rex Coloration : rouge au début, bleu à la fin http://www.treecloud.org