TOPOLOGICAL DATA ANALYSIS
HJ van Veen · Data Science · Nubank Brasil
TOPOLOGY I
• "When a truth is necessary, the reason for it can be
found by analysis, that is, by resolving it into simpler
ideas and truths until the primary ones are reached."
- Leibniz
TOPOLOGY II
• Topology is the mathematical study of topological
spaces.
• Topology is interested in shapes.
• More specifically: the concept of 'connectedness'.
TOPOLOGY III
• A topologist is someone who does not see the
difference between a coffee mug and a donut.
HISTORY I
• “Nothing at all takes place in the universe in which
some rule of maximum or minimum does not
appear.” - Euler
• Seven Bridges of Königsberg: devise a walk
through the city that crosses each bridge
once and only once.
HISTORY II
HISTORY III
• Euler's big insights:
• It doesn't matter where you start walking; it only matters which bridges
you cross.
• A similar solution should be found, regardless of where you start your
walk.
• Only the connectedness of the bridges matters.
• A solution should also apply to all other bridges that are connected
in a similar fashion, no matter the distances between them.
HISTORY IV
• We now call these graph walks 'Eulerian walks' in
Euler's honor.
• The first theorem of graph theory, proven by Euler:
• An Eulerian walk is possible in a connected graph if and only if
exactly zero or two nodes have an odd number of edges.
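The degree rule on this slide can be checked mechanically. The following is a minimal sketch, assuming the usual four-land-mass model of Königsberg; the node labels A to D and the function names are illustrative and do not come from the slides.

```python
from collections import Counter

# Seven bridges between four land masses (labels are illustrative).
# A is the central island; B and C are the river banks; D is the eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes that touch an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def is_connected(edges):
    """Check that every node can be reached from an arbitrary start node."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(adjacency)


def has_eulerian_walk(edges):
    """Apply the rule: connected, and zero or two odd-degree nodes."""
    return is_connected(edges) and len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))   # all four land masses have odd degree
print(has_eulerian_walk(BRIDGES))  # so the walk Euler was asked for cannot exist
```

Note that no distances appear anywhere in the check: only which land masses each bridge connects.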
TDA I
• TDA marries 300-year-old mathematics with
modern data analysis.
• Captures the shape of data
• Is invariant to size, position, and pose
• Compresses large datasets
• Functions well in the presence of noise or missing variables
TDA II
• Capturing the shape of data
• Traditional techniques like clustering or dimensionality reduction have
trouble capturing this shape.
TDA III
• Invariance.
• Euler showed that only connectedness matters. The size, position, or
pose of an object doesn't change that object.
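As a small illustration of invariance to position and pose, the sketch below counts the connected components of a point cloud before and after a rotation plus a shift. The point set, the linking threshold `eps`, and the function names are assumptions made for this example. Invariance to size is not shown, because a fixed `eps` would have to be rescaled along with the data.

```python
import math


def connected_components(points, eps):
    """Count components of the graph that links points closer than eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def rigid_motion(points, angle, shift):
    """Rotate 2-D points by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]


# Two well-separated groups of points on a line (illustrative data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
moved = rigid_motion(points, angle=0.7, shift=(5.0, -3.0))

print(connected_components(points, eps=1.5))  # 2
print(connected_components(moved, eps=1.5))   # 2
```

Moving and turning the cloud changes every coordinate but leaves the pairwise distances, and therefore the connectedness, unchanged.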
TDA IV
• Compression.
• Compressed representations use the order in data.
• Only order can be compressed.
• Random noise or slight variations are ignored.
• Lossy compression retains the most important features.
• "Now where there are no parts, there neither extension, nor shape, nor divisibility is possible.
And these monads are the true atoms of nature and, in a word, the elements of things." - Leibniz
MAPPER I
• Mapper was created by Ayasdi co-founder
Gurjeet Singh during his PhD under Gunnar
Carlsson.
• It is based on the idea of partial clustering of the data,
guided by a set of functions defined on the data.
MAPPER II
• Mapper was inspired by the Reeb graph.
MAPPER III
• Map the data through a function and cover the function's range with overlapping intervals.
• Cluster the points inside each interval.
• When two clusters share data points, draw an edge between them.
• Color the nodes by a function.
MAPPER IV
MAPPER V
Distance_to_median(row)      x      y      z
1.5                        1.5    1.5    1.5
1.5                       -0.5   -0.5   -0.5
0                          1      1      1
0                          1      0.9    1.1
3                          2      2      2
3                          2.1    1.9    2
Y
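The Mapper steps (cover, cluster, connect, color) can be sketched on the six rows of this slide. This is a rough illustration under stated assumptions, not the reference Mapper implementation: the lens values are copied from the Distance_to_median column as given, the cover uses two intervals with 50% overlap, and a simple threshold-based single-linkage clustering with `eps=1.0` stands in for whatever clusterer the talk used. All function names are hypothetical.

```python
import math
from itertools import combinations

# Rows from the slide: lens value (Distance_to_median) and the x, y, z coordinates.
LENS = [1.5, 1.5, 0.0, 0.0, 3.0, 3.0]
POINTS = [(1.5, 1.5, 1.5), (-0.5, -0.5, -0.5), (1.0, 1.0, 1.0),
          (1.0, 0.9, 1.1), (2.0, 2.0, 2.0), (2.1, 1.9, 2.0)]


def cover(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equally long intervals that overlap by a fraction."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + k * step, lo + k * step + length) for k in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Group row indices whose points are chained together by gaps <= eps."""
    remaining = list(indices)
    clusters = []
    while remaining:
        cluster = [remaining.pop(0)]
        frontier = list(cluster)
        while frontier:
            i = frontier.pop()
            near = [j for j in remaining
                    if math.dist(points[i], points[j]) <= eps]
            remaining = [j for j in remaining if j not in near]
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=2, overlap=0.5, eps=1.0):
    """Return Mapper nodes (sets of row indices) and edges (node index pairs)."""
    nodes = []
    for a, b in cover(min(lens), max(lens), n_intervals, overlap):
        members = [i for i, value in enumerate(lens) if a <= value <= b]
        nodes.extend(single_linkage(members, points, eps))
    # Two nodes are linked when their clusters share at least one row.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


nodes, edges = mapper(POINTS, LENS)
# Color each node by the mean lens value of its rows.
colors = [sum(LENS[i] for i in node) / len(node) for node in nodes]

for k, node in enumerate(nodes):
    print(k, sorted(node), colors[k])
print(edges)
```

With these settings the two intervals are [0, 2] and [1, 3], so the rows with lens value 1.5 fall into both and become the shared points that create edges.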
MAPPER VI
• In conclusion:
FUNCTIONS
• Raw features or point-cloud axis / coordinates
• Statistics: mean, max, skewness, etc.
• Mathematics: L2-norm, Fourier transform, etc.
• Machine Learning: t-SNE, PCA, out-of-fold predictions
• Deep Learning: layer activations, embeddings
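A function here is simply a rule that turns each row of the data into a number (or a few numbers). The sketch below shows three of the simpler choices from the list, applied row by row; the sample rows and function names are made up for illustration.

```python
import math
import statistics


def row_mean(row):
    """Statistics lens: the average of a row's values."""
    return statistics.fmean(row)


def row_max(row):
    """Statistics lens: the largest value in a row."""
    return max(row)


def l2_norm(row):
    """Mathematics lens: Euclidean length of the row seen as a vector."""
    return math.sqrt(sum(value * value for value in row))


# Illustrative rows, not taken from the slides.
rows = [(3.0, 4.0), (1.0, 1.0), (0.0, 2.0)]

for name, lens in [("mean", row_mean), ("max", row_max), ("l2", l2_norm)]:
    print(name, [lens(row) for row in rows])
```

Each choice emphasizes a different aspect of the data, so the same dataset can yield quite different graphs depending on the function used.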
CLUSTERING ALGORITHMS
• DBSCAN / HDBSCAN:
• Handles noise well.
• No need to set number of clusters.
• K-Means:
• Creates visually nice simplicial complexes/graphs
SOME GENERAL USE CASES
• Computer Vision
• Model and feature inspection
• Computational Biology / Healthcare
• Persistent Homology
COMPUTER VISION
• Demo
MODEL AND FEATURE INSPECTION
• Demo
COMPUTATIONAL BIOLOGY
• Example
PERSISTENT HOMOLOGY
• Example
SOME FINANCE USE CASES
• Customer Segmentation
• Transactional Fraud
• Accurate Interpretable Models
• Exploration / Analysis
CUSTOMER SEGMENTATION
• Demo
TRANSACTIONAL FRAUD
• Example of spousal fraud
ACCURATE INTERPRETABLE MODELS
• Create: a global linear model
• Function: L2-norm
• Color: heatmap by ground truth, then animate to out-of-fold model predictions
• Identify: low-accuracy subgraphs
• Select: the features that are most important for those subgraphs
• Create: local linear models on the subgraphs
• Stack: decision tree
• Compare: Divide-and-Conquer and LIME
• Demo
EXPLORATION / ANALYSIS
• Demo
QUESTIONS?
FURTHER READING
• Google terms:
• Ayasdi, Topological Data Analysis, Robert Ghrist, Gurjeet Singh, Gunnar Carlsson,
Anthony Bak, Allison Gilmore, Simplicial Complex, Python Mapper.
• Videos:
• https://www.youtube.com/watch?v=4RNpuZydlKY
• https://www.youtube.com/watch?v=x3Hl85OBuc0
• https://www.youtube.com/watch?v=cJ8W0ASsnp0
• https://www.youtube.com/watch?v=kctyag2Xi8o
