
Elizabeth Munch SOED 2016

There is a great deal of recent excitement around the idea of finding shape in data. The relatively young field of topological data analysis (TDA) provides tools which can quantify, investigate, and utilize shape in data to understand something about the domain from which the data was obtained. These methods have been successfully used in many fields, including atmospheric science, time series analysis, and genetics to provide deep insights. However, what does it really mean for data to have shape? In this talk, we will look at some common tools used in TDA such as persistence diagrams, Reeb graphs, and mapper, and ideas for how different kinds of data can fit into the TDA pipeline.

Published in: Education

  1. What does it mean for data to have shape? Elizabeth Munch, University at Albany – SUNY, Dept. of Mathematics & Statistics, Apr 7, 2016. Liz Munch (UAlbany) TDA Apr 7, 2016 1 / 24
  2. What does it mean for data to have shape? (Title slide repeated with the annotation "Data Point".)
  3. Data points: (-0.02,-1.62) (-1.38,-0.93) (1.22,1.55) (-0.71,-1.48) (-0.17,-0.99) (0.25,-1.19) (-0.48,-1.71) (1.21,1.06) (-0.4,-1.73) (0.21,-1.87) (-0.09,1.23) (-0.95,0.33) (1.07,0.22) (1.87,-0.17) (-1.69,0.06) (-0.76,-0.9) (0.38,1.49) (-0.22,-1.31) (0.67,-1.58) (1.39,1.13) (-1.07,1.2) (1.26,1.02) (0.63,-1.01) (-1.13,0.37) (0.82,1.26) (0.92,0.46) (0.27,-1.22) (1.24,-1.56) (-1.38,1.0) (1.43,0.98) (-0.96,0.98) (1.77,-0.08) (-0.27,1.64) (1.48,1.2) (1.08,1.3) (-1.16,-0.3) (-1.29,1.5) (-0.14,-1.93) (0.32,1.78) (-1.5,0.72) (-1.28,-0.63) (0.03,1.1) (1.57,-1.05) (-1.5,-0.34) (-0.22,-1.53) (0.39,-1.59) (-1.81,0.59) (-0.38,-1.63) (-0.69,1.62) (-0.5,1.25) (-1.71,-1.03) (1.1,-0.11) (-0.02,-1.48) (-1.3,-0.25) (-1.37,0.84) (-0.88,-1.39) (-0.38,-1.77) (0.0,1.72) (-0.61,1.75) (0.15,1.74) (-0.11,-1.55) (-1.53,0.2) (-0.96,0.43) (-0.87,0.79) (-0.36,1.03) (1.59,0.15) (-0.13,1.18) (1.21,-0.35) (1.18,-0.85) (-1.2,1.27) (-1.43,-0.91) (-1.44,-0.06) (-1.86,-0.55) (0.5,-1.24) (-1.78,-0.07) (0.48,-1.22) (-0.43,1.02) (1.37,-0.91) (-1.59,0.98) (1.15,-0.1) (-1.59,-0.6) (0.09,1.25) (0.32,1.53) (0.89,-1.43) (1.15,-1.22) (0.29,1.84) (-0.4,1.61) (-1.57,-1.07) (-0.29,-1.55) (1.42,-0.99) (0.86,-1.81) (1.43,-1.15) (-0.53,1.65) (-1.18,-0.72) (-0.59,1.22) (-1.22,-0.61) (0.19,-1.26) (1.82,-0.84) (-0.06,1.36) (-1.27,0.59)
  7. Large Data Sets. Main goal of Topological Data Analysis (TDA): find and quantify structure in big data. Goals of this talk: What tools are available? How do we fit educational data into this pipeline? Spoiler alert: I don’t know how to do this...
  9. (1) Persistent Homology; (2) Reeb graphs and Mapper
  10. What does it mean for data to have shape? Topology = Topography. Topology: the mathematical study of properties of spaces that are preserved under continuous deformations (stretching and bending, not tearing or gluing). Topography: the study of the shape and features of the surface of the Earth.
  11. History. Leonhard Euler (1707-1783). Euler characteristic. Images: Wikipedia.
  14. History Pt 2. Esoteric field of study, 1700-2000: algebraic topology; applications/intersections with dynamical systems; would never be considered “applied” in the traditional sense. “Topology, the pinnacle of human thought. In four centuries it may be useful.” - Alexander Solzhenitsyn, “The First Circle”, 1968. Things change ca. 2000: introduction of persistent homology.
  15. Main questions. How do we quantify the structure we see? Can we calculate something to represent the structure?
  21. Very small radius is just dots. Very large radius is just a blob. Some range of radii lets us see the big circle. Some small circles appear and disappear quickly... maybe we get to just call these noise! How do we quantify this?
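One rough way to make the radius picture concrete is to count the connected pieces of the union of balls at a few radii. The sketch below is an illustration, not code from the talk: the function name `count_components` and the sample points are hypothetical, and two balls of radius r are assumed to touch when their centers are at most 2r apart.

```python
import math
from itertools import combinations


def count_components(points, radius):
    """Count connected pieces of the union of closed balls of the given radius."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Assumption: two balls overlap when centers are within 2 * radius.
        if math.dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical sample: 12 points evenly spaced on a unit circle.
circle = [
    (math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]

for r in (0.05, 0.3, 2.0):
    print(r, count_components(circle, r))
```

Counting pieces only separates "just dots" from "one piece". Telling the ring at a middle radius apart from the blob at a large radius needs one-dimensional homology, which this sketch does not compute.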
  24. Homology & Persistent Homology. What is Homology? A topological invariant which assigns a sequence of vector spaces, H_k(X), to a given topological space X. What is Persistent Homology? A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space.
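For the simplest case, dimension 0 (connected components), persistence along a growing-radius filtration can be sketched with a union-find pass over sorted pairwise distances. This is a minimal illustration under stated assumptions, not the talk's code or any library's API: `h0_persistence` is a hypothetical name, every point is taken to be born at radius 0, and a component is taken to die at half the length of the edge that merges it into another.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components as balls grow."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two balls of radius r meet when r is half the distance between centers.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for radius, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this radius.
            parent[root_i] = root_j
            pairs.append((0.0, radius))

    if n > 0:
        # The last surviving component never dies.
        pairs.append((0.0, math.inf))
    return pairs


# Hypothetical sample: two tight pairs of points, far from each other.
two_clusters = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0)]
diagram = h0_persistence(two_clusters)
print(diagram)
```

In this sample, the two short-lived pairs come from points merging inside each cluster, and the one long-lived finite pair records the two clusters finally joining. Loops (dimension 1) need a simplicial complex and matrix reduction, which this sketch leaves out.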
  25. Understanding a persistence diagram
  27. Circles are useful when you least expect it. Caveat: Persistence does more than circles...
  28. Machining Dynamics. [Figure labels: Workpiece, Stable, feed, Unstable.] Images: Firas Khasawneh, SUNY Polytechnic Institute; and Boeing.
  29. Chatter
  30. Delay embedding. Definition: Given a time series $X(t)$, the delay embedding is $\psi^m_\eta : t \longrightarrow (X(t), X(t+\eta), \cdots, X(t+(m-1)\eta))$.
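The definition above can be sketched for a sampled signal as follows. This is an illustration under assumptions not made on the slide: the series is a list of evenly spaced samples, `dimension` plays the role of m, and `delay` plays the role of η but is restricted to a whole number of samples. The name `delay_embedding` and the sine signal are hypothetical.

```python
import math


def delay_embedding(series, dimension, delay):
    """Map each index t to (x[t], x[t + delay], ..., x[t + (dimension - 1) * delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay
    # Only indices whose last delayed coordinate still lies inside the series.
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]


# Hypothetical periodic signal; its 2-dimensional embedding traces a closed loop.
signal = [math.sin(0.1 * t) for t in range(200)]
cloud = delay_embedding(signal, dimension=2, delay=16)
print(len(cloud), cloud[0])
```

A periodic signal gives a point cloud that winds around a loop, which is the kind of circular structure persistence can pick up.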
  31. Differentiation by Max Persistence. [Figure: three rows of panels, one for each parameter pair [0.9, 0.07], [1.42, 0.05], and [1.48, 0.25]. Each row shows the Signal; its Takens Embedding, with axes Y(t) and Y(t+2.13), Y(t+1.62), and Y(t+1.56) respectively; and the Persistence Diagram, with axes Birth Radius and Death Radius.]
  33. Turning Model. [Figure: plot with axis ticks from 0.5 to 3 and from 0 to 0.25.] Results: warm colors ⇒ high max persistence ⇒ chatter; cool colors ⇒ low max persistence ⇒ no chatter. Combination with machine learning methods ⇒ 97% classification accuracy.
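Max persistence is the largest lifetime (death minus birth) among the points of a persistence diagram. The sketch below assumes a diagram is given as a list of (birth, death) pairs. The function names, the sample diagrams, and the fixed threshold are all hypothetical: the slide reports combining max persistence with machine learning methods, which a simple cutoff only stands in for here.

```python
import math


def max_persistence(diagram):
    """Largest finite lifetime (death - birth) in a list of (birth, death) pairs."""
    lifetimes = [death - birth for birth, death in diagram if math.isfinite(death)]
    return max(lifetimes, default=0.0)


def looks_like_chatter(diagram, threshold):
    """Hypothetical rule of thumb: flag chatter when max persistence is high."""
    return max_persistence(diagram) > threshold


# Made-up diagrams for illustration only, not data from the talk.
loop_diagram = [(0.1, 0.2), (0.3, 1.5)]
noise_diagram = [(0.1, 0.15), (0.2, 0.3)]

print(max_persistence(loop_diagram), looks_like_chatter(loop_diagram, threshold=0.5))
print(max_persistence(noise_diagram), looks_like_chatter(noise_diagram, threshold=0.5))
```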
  34. (1) Persistent Homology; (2) Reeb graphs and Mapper
  35. Clustering
  37. 1-Dimensional Structure
  39. Original Reeb Graph construction
  41. Mapper. Breast cancer gene expression data: determine a good filter function; run Mapper; found a new type of breast cancer (c-MYB+) with high survival rate. Image: Nicolau, Levine, Carlsson, PNAS 2011.
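The two steps named on the slide, choosing a filter function and running Mapper, can be sketched in a few lines. This is a toy version under loud assumptions, not the Python Mapper package and not the pipeline from the PNAS paper: the names `mapper` and `single_linkage`, the parameters `n_intervals`, `overlap`, and `eps`, the x-coordinate filter, and the ring of sample points are all hypothetical, and a fixed-threshold single-linkage grouping stands in for the clustering step.

```python
import math
from itertools import combinations


def single_linkage(indices, points, eps):
    """Group indices so that points within eps of each other share a cluster."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {
                j for j in unvisited
                if math.dist(points[current], points[j]) <= eps
            }
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals, overlap, eps):
    """Toy Mapper: cover the filter range, cluster each slice, link shared points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = overlap * length  # each interval is widened by this much on both sides

    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - pad
        b = lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))

    # Two nodes are joined when their clusters share at least one data point.
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# Hypothetical sample: 24 points on a unit circle, filtered by x-coordinate.
ring = [
    (math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
    for k in range(24)
]
nodes, edges = mapper(ring, lambda p: p[0], n_intervals=3, overlap=0.3, eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this ring the middle slice splits into a top arc and a bottom arc, so the output graph is itself a loop. In practice the result depends heavily on the filter function, the cover, and the clustering threshold.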
  45. Conclusions. Topology can help find structure in data that is not obvious by other means. Lots of tools available, lots of open-source code for computation! Mapper, Reeb graph, Contour Tree, Merge tree: Python Mapper - danifold.net/mapper/. Persistence: Perseus - sas.upenn.edu/~vnanda/perseus/; Dionysus - mrzv.org/software/dionysus/; R TDA - cran.r-project.org/web/packages/TDA/; PHAT - bitbucket.org/phat-code/phat. Input from domain scientists is imperative! What is the right question? What is the right tool? How do we interpret the output?
  46. Thank you! Collaborators: José Perea (MSU), Firas Khasawneh (SUNY Poly). emunch@albany.edu, www.elizabethmunch.com
