WiDS Alexandria, Egypt workshop in topological data analysis (Python and R code available on request), covering persistent homology, the Mapper algorithm, and discrete Ricci curvature. Examples include text data and social network data.
1. TOPOLOGICAL DATA ANALYSIS
COLLEEN M. FARRELLY, DATASEMBLY
2. WHY TOPOLOGICAL DATA ANALYSIS?
• Autocorrelated data and dynamic systems (time series, spatiotemporal data)
• Wide data (-omics data)
• Small data (pilot studies, rare diseases…)
• Visualization-heavy needs for comparing groups (especially with high-dimensional data)
• Data that violates the assumptions of machine learning algorithms or statistical models
3. EXAMPLES OF TDA TOOLS
Persistent homology
Mapper algorithm
Homotopy continuation
Morse functions/clustering/regression
Euler calculus
Discrete exterior calculus
Ricci curvature
Mappings to Teichmüller space
4. PERSISTENT HOMOLOGY
COMPARING GROUPS AND EXTENDING HIERARCHICAL CLUSTERING
5. POINT CLOUDS AND DISTANCE METRICS
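Most TDA pipelines start from one of two inputs. The first is a point cloud, where each row is an observation. The second is a precomputed distance matrix. The NumPy sketch below builds a distance matrix from a point cloud using either Euclidean or cosine distance. Cosine distance is a frequent choice for text embeddings, but that is an assumption here, not something the slides specify. The helper name `point_cloud_distances` is hypothetical. As far as the packages go, ripser can consume a precomputed matrix through its `distance_matrix=True` option.

```python
import numpy as np

def point_cloud_distances(X, metric="euclidean"):
    """Hypothetical helper: n x n matrix of pairwise distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]                  # shape (n, n, d)
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        unit = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalise rows (no zero rows allowed)
        return 1.0 - unit @ unit.T                            # 1 - cosine similarity
    raise ValueError(f"unsupported metric: {metric}")

D = point_cloud_distances([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
print(D[0, 1])  # 5.0

C = point_cloud_distances([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], metric="cosine")
```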
7. HOMOLOGY OVERVIEW: BETTI NUMBERS
Example Betti number vectors (β0, β1, β2, …): (1,0,0,…), (1,1,0,…), (1,0,1,…)
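Betti numbers count independent features in each dimension. β0 counts connected components, β1 counts loops, and β2 counts enclosed voids. For example, a single point has (1,0,0,…), a circle has (1,1,0,…), and a hollow sphere has (1,0,1,…).

The sketch below computes these numbers for a small simplicial complex from the ranks of its boundary matrices, using β_k = n_k − rank ∂_k − rank ∂_{k+1}, where n_k is the number of k-simplices. For simplicity it works over the real numbers via `numpy.linalg.matrix_rank`. That is an assumption: ripser defaults to coefficients mod 2, but the two agree for these examples. The helper names are hypothetical.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(k_simplices, faces):
    """Boundary map from k-simplices to (k-1)-simplices; simplices are sorted vertex tuples."""
    row = {f: i for i, f in enumerate(faces)}
    D = np.zeros((len(faces), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for i in range(len(s)):
            face = s[:i] + s[i + 1:]          # drop the i-th vertex
            D[row[face], j] = (-1) ** i       # alternating orientation sign
    return D

def betti_numbers(complex_by_dim):
    """complex_by_dim[k] lists the k-simplices; returns (beta_0, ..., beta_top)."""
    top = len(complex_by_dim) - 1
    rank = [0] * (top + 2)                    # rank of boundary map out of dimension k; 0 at both ends
    for k in range(1, top + 1):
        D = boundary_matrix(complex_by_dim[k], complex_by_dim[k - 1])
        rank[k] = int(np.linalg.matrix_rank(D))
    return tuple(len(complex_by_dim[k]) - rank[k] - rank[k + 1] for k in range(top + 1))

point = [[(0,)]]
circle = [[(0,), (1,), (2,)], list(combinations(range(3), 2))]   # hollow triangle
sphere = [[(v,) for v in range(4)],
          list(combinations(range(4), 2)),
          list(combinations(range(4), 3))]                        # hollow tetrahedron
print(betti_numbers(point), betti_numbers(circle), betti_numbers(sphere))
# (1,) (1, 1) (1, 0, 1)
```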
8. FILTRATIONS AND PERSISTENCE
• Filter distances or objects to obtain a series of topological objects (graphs, simplicial complexes…)
• Compute a series of metrics or summary statistics over the filtration
• Track how metrics/statistics change across the filtration
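A concrete instance of computing a statistic over a filtration is to track β0, the number of connected components of the ε-neighbourhood graph, as ε grows. Each graph contains every edge of the previous one, and that nesting is what makes the sequence a filtration. The sketch below counts components with a union-find structure. The helper name `betti0_curve` and the toy points are illustrative assumptions.

```python
import math
from itertools import combinations

def betti0_curve(points, thresholds):
    """Number of connected components of the eps-graph at each threshold (ascending order)."""
    n = len(points)
    # All pairwise edges sorted by length: increasing eps only ever adds edges.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    components, e, curve = n, 0, []
    for eps in sorted(thresholds):
        # Add every edge no longer than eps, merging components as we go.
        while e < len(edges) and edges[e][0] <= eps:
            _, i, j = edges[e]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
            e += 1
        curve.append(components)
    return curve

points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
print(betti0_curve(points, [0, 1, 4, 14]))  # [5, 3, 2, 1]
```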
9. ALGORITHM DETAILS
Rips filtration
• Pairwise intersections of ε-balls centered at each point of the point cloud (or derived from a distance matrix)
Dimension parameter
• Highest dimension of Betti numbers to compute (usually set to 0 or 1)
Diagram parameters/distance computation parameters
• Optional visualization or statistical testing functions after running ripser()
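In dimension 0 the Rips persistence diagram can be computed by hand, and doing so shows why persistent homology extends hierarchical clustering. Every point is born at ε = 0. A component dies at the length of the edge that merges it into another component, which is exactly a merge height of single-linkage clustering (an edge of a minimum spanning tree).

The sketch below uses Kruskal's algorithm, and the helper name is hypothetical. Up to ordering and the filtration convention (edge length versus ball radius), its output should correspond to the H0 part of `ripser(X, maxdim=1)['dgms']`. The ripser output additionally includes H1 (loops).

```python
import math
from itertools import combinations

def h0_diagram(points):
    """(birth, death) pairs for H0 of the Rips filtration, via Kruskal's algorithm."""
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                          # two components merge: one H0 class dies at d
            parent[ri] = rj
            pairs.append((0.0, d))
    pairs.append((0.0, math.inf))             # the final component never dies
    return pairs

points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
dgm = h0_diagram(points)
print(dgm)  # [(0.0, 1.0), (0.0, 1.0), (0.0, 4.0), (0.0, 14.0), (0.0, inf)]
```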
10. IMPLEMENTATION IN PYTHON OR R
R packages
• TDAstats
• TDAverse
Python packages
• Scikit-TDA
• Ripser/persim
• Giotto-TDA
11. EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that are either humorous or serious in tone
Goal: determine whether BERT features differ significantly between the two sets of poems
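One way to test for such a group difference is a permutation test on a topological summary. The sketch below is a stand-in built on assumptions, not the workshop's exact procedure:

• It summarises each group by the total H0 persistence of its Rips filtration, which equals the total length of a minimum spanning tree.
• The test statistic is the absolute difference between the two groups' summaries.
• Group labels are shuffled to build the null distribution.

Total persistence grows with the number of points, so the sketch assumes groups of equal size. The random arrays are synthetic placeholders for BERT embeddings, not the workshop data. Packages such as TDAstats provide their own group-comparison tools.

```python
import numpy as np

def total_h0_persistence(X):
    """Sum of finite H0 death times of the Rips filtration = total MST edge length (Prim)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                        # cheapest link from each point to the tree so far
    total = 0.0
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        total += candidates[j]
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return float(total)

def permutation_test(A, B, n_perm=199, seed=0):
    """Permutation p-value for |total_h0(A) - total_h0(B)|; assumes len(A) == len(B)."""
    rng = np.random.default_rng(seed)
    observed = abs(total_h0_persistence(A) - total_h0_persistence(B))
    pooled = np.vstack([A, B])
    n_a = len(A)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))    # shuffle the group labels
        stat = abs(total_h0_persistence(pooled[idx[:n_a]])
                   - total_h0_persistence(pooled[idx[n_a:]]))
        hits += int(stat >= observed)
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.1, size=(15, 5))  # synthetic placeholder data
group_b = rng.normal(0.0, 1.0, size=(15, 5))
p = permutation_test(group_a, group_b)
print(p)
```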
12. MAPPER
CLUSTERING AND DATA MINING
13. MORSE FUNCTIONS: HEIGHT FUNCTIONS AND CRITICAL POINTS
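Here is a one-dimensional toy version of the height-function idea. The sketch samples a circle, uses each point's height (y-coordinate) as the Morse function, and classifies each vertex of the sampled loop by comparing it with its two neighbours. The single minimum and single maximum it finds are the critical points. Their alternating count (minima minus maxima) equals the circle's Euler characteristic, 0. The helper name and sampling density are illustrative assumptions, and ties between neighbouring values are not handled.

```python
import math

def loop_critical_points(values):
    """Indices of local minima and maxima of a function sampled around a closed loop."""
    n = len(values)
    minima, maxima = [], []
    for i in range(n):
        prev, nxt = values[i - 1], values[(i + 1) % n]   # neighbours wrap around the loop
        if values[i] < prev and values[i] < nxt:
            minima.append(i)
        elif values[i] > prev and values[i] > nxt:
            maxima.append(i)
    return minima, maxima

n = 36
heights = [math.sin(2 * math.pi * k / n) for k in range(n)]  # height of sample k on the unit circle
minima, maxima = loop_critical_points(heights)
print(minima, maxima, len(minima) - len(maxima))  # [27] [9] 0
```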
15. ALGORITHM DETAILS
Project Data
• Takes the input data and projects it into a custom embedding (3-dimensional space, k-NN distances…)
Create Cover
• Number of cover elements and their percent overlap (different parameters give different results)
Cluster
• DBSCAN or other clusterers available in scikit-learn
Save Model
• Save the output and details to a webpage (path_html)
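The project, cover, and cluster steps can be sketched end to end in plain Python. This is a simplified Mapper under stated assumptions, not Kepler-Mapper itself:

• The lens is a single coordinate.
• The cover is `n_intervals` equal intervals, each widened by `overlap` times the interval length on both sides.
• Clustering inside each preimage uses connected components of an ε-graph, essentially DBSCAN with `min_samples=1`.
• Nodes are clusters, and two nodes are joined when they share a data point.

All names are hypothetical. In Kepler-Mapper the corresponding knobs are the `cover` (number of cubes, percent overlap) and `clusterer` arguments of `map`, and `visualize(..., path_html=...)` writes the webpage.

```python
import math
from itertools import combinations

def eps_clusters(points, members, eps):
    """Connected components of the eps-graph restricted to `members`."""
    parent = {i: i for i in members}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(members, 2):
        if math.dist(points[a], points[b]) <= eps:
            parent[find(a)] = find(b)
    groups = {}
    for i in members:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())

def simple_mapper(points, lens, n_intervals=4, overlap=0.25, eps=0.2):
    """Hypothetical minimal Mapper: 1-D lens, overlapping interval cover, eps-graph clustering."""
    lo, hi = min(lens), max(lens)
    step = (hi - lo) / n_intervals
    pad = overlap * step                      # widen each interval by `pad` on both sides
    nodes = []
    for i in range(n_intervals):
        a, b = lo + i * step - pad, lo + (i + 1) * step + pad
        members = [k for k, v in enumerate(lens) if a <= v <= b]   # preimage of the interval
        nodes.extend(eps_clusters(points, members, eps))
    # Connect two clusters whenever they share at least one data point.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges

n = 100
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
lens = [x for x, _ in circle]                 # project onto the x-coordinate
nodes, edges = simple_mapper(circle, lens)
print(len(nodes), len(edges))  # 6 6
```

The six nodes and six edges form a single cycle, so the Mapper graph recovers the loop in the data. Changing `n_intervals`, `overlap`, or `eps` changes the graph, in line with the note on cover parameters.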
16. IMPLEMENTATION IN PYTHON OR R
R packages
• TDAmapper
Python packages
• Kepler-Mapper (part of scikit-tda)
• Giotto-TDA
• tmap
17. EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that are either humorous or serious in tone
Goal: cluster the poems to find out whether subgroups exist
18. RICCI CURVATURE
FINDING KEY PIECES OF A SOCIAL NETWORK
19. RICCI CURVATURE
Figure: negative, zero, and positive curvature
20. POWER/DISEASE NETWORK BACKBONES
21. ALGORITHM DETAILS
Calculate Curvature on Edges
• Examine vertices and their adjacent edges to see how much “pull” there is on an edge
Calculate Curvature on Vertices
• Sum the edge curvatures (weights) around a vertex to find out how much “stuff” is weighing it down
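The slides do not say which discrete Ricci curvature is used. A common, easy-to-compute choice for unweighted graphs is the augmented Forman–Ricci curvature:

F(u,v) = 4 − deg(u) − deg(v) + 3·t(u,v)

Here t(u,v) is the number of triangles containing the edge (u,v). The sketch below assumes that formula and treats vertex curvature as the sum of incident edge curvatures. It runs on a toy graph of two triangles joined by a single bridge edge. Strongly negative edges, like the bridge, tend to be the ones holding otherwise separate parts of a network together, which is why they are candidates for vulnerabilities. The helper name is hypothetical. For real networks, the same arithmetic can be written against igraph or networkx adjacency.

```python
from collections import defaultdict

def forman_curvature(edges):
    """Augmented Forman-Ricci curvature for an unweighted, undirected simple graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    edge_curv = {}
    for u, v in edges:
        triangles = len(nbrs[u] & nbrs[v])    # each common neighbour closes a triangle
        edge_curv[(u, v)] = 4 - len(nbrs[u]) - len(nbrs[v]) + 3 * triangles
    vertex_curv = defaultdict(int)
    for (u, v), c in edge_curv.items():       # vertex curvature: sum over incident edges
        vertex_curv[u] += c
        vertex_curv[v] += c
    return edge_curv, dict(vertex_curv)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
edge_curv, vertex_curv = forman_curvature(edges)
print(edge_curv[(2, 3)], edge_curv[(0, 1)], edge_curv[(0, 2)])  # -2 3 2
print(min(edge_curv, key=edge_curv.get))                         # (2, 3)
```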
22. IMPLEMENTATION IN PYTHON OR R
R packages
• Custom code in igraph
Python packages
• Custom code in igraph
• Custom code in networkx
23. EXAMPLE ANALYSIS: PROBLEM/DATA
Town network representing a supply chain (medical, food, electricity…)
Goal: identify vulnerabilities within the network