Sandhya Prabhakaran - A Bayesian Approach To Model Overlapping Objects Available As Distance Data
1. A Bayesian Approach to Model Overlapping Objects Available as Distance Data
Sandhya Prabhakaran¹ and Julia E. Vogt²,³
¹ Memorial Sloan Kettering Cancer Centre, NYC
² University of Basel
³ Swiss Institute of Bioinformatics
MLconf, NYC
29th March 2019
3. Two religions in Machine Learning
Frequentists Bayesians
(https://medium.com/datadriveninvestor/bayesian-vs-frequentist-for-dummies-58ce230c3796)
6. Two religions in Machine Learning
● A coin-toss example: the observed data are 10 heads in 10 tosses
● Frequentists:
○ Probability is a point estimate (a long-run relative frequency)
○ Probability of tails? The point estimate is 0/10 = 0, so the data alone say tails never happens
● Bayesians:
○ Probability is a distribution
○ Probability of tails? A prior belief (e.g. 0.5 for a fair coin) is updated by the data, so tails keeps a non-zero probability
○ A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.
○ More flexible for downstream analyses: inference, planning and reasoning
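The coin example can be made concrete with the standard Beta-Binomial model. The sketch below assumes a Beta(a, b) prior on p = P(tails); the uniform Beta(1, 1) and the strong "fair coin" Beta(50, 50) priors are illustrative choices, not taken from the talk. It contrasts the frequentist point estimate with Bayesian posterior means.

```python
from fractions import Fraction

def mle_tails(heads: int, tails: int) -> Fraction:
    # Frequentist point estimate: relative frequency of tails
    return Fraction(tails, heads + tails)

def posterior_mean_tails(heads: int, tails: int, a: int, b: int) -> Fraction:
    # Beta(a, b) prior on P(tails) + Binomial likelihood -> Beta(a + tails, b + heads) posterior
    return Fraction(a + tails, a + b + heads + tails)

heads, tails = 10, 0
print(mle_tails(heads, tails))                         # 0
print(posterior_mean_tails(heads, tails, a=1, b=1))    # 1/12  (uniform prior)
print(posterior_mean_tails(heads, tails, a=50, b=50))  # 5/11  (strong fair-coin prior)
```

The stronger the prior belief in a fair coin, the closer the posterior stays to 0.5 after ten heads; with a flat prior the data pull it down to 1/12, but never to exactly 0.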
20. POCD: Overlap Clustering for distance data
● Bayesian clustering model
● Given the pairwise distance matrix D, we infer Z (the cluster assignment matrix)
21. POCD: Overlap Clustering for distance data
Z
● Binary cluster assignment matrix: z_ik = 1 if object i belongs to cluster k
● A row may contain several 1s, so clusters can overlap
● Needs to be inferred
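A small hypothetical assignment matrix makes the overlap concrete: with 4 objects and 3 clusters, object 2 belongs to clusters 0 and 1 at once. The values are illustrative only.

```python
import numpy as np

# Hypothetical 4 x 3 binary assignment matrix (rows = objects, columns = clusters)
Z = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])

# Clusters each object belongs to, and the objects that sit in more than one cluster
memberships = {i: np.flatnonzero(row).tolist() for i, row in enumerate(Z)}
overlapping = [i for i, ks in memberships.items() if len(ks) > 1]
print(memberships)  # {0: [0], 1: [0], 2: [0, 1], 3: [2]}
print(overlapping)  # [2]
```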
22. POCD: Overlap Clustering for distance data
● Bayesian clustering model
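● Given the pairwise distance matrix D, we infer Z:
$$\underbrace{p(Z \mid D, \cdot)}_{\text{posterior}} \;\propto\; \underbrace{p(D \mid Z)}_{\text{likelihood}}\;\underbrace{p(Z)}_{\text{prior}}$$
(· denotes the remaining model parameters)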
24. POCD: Overlap Clustering for distance data
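Prior over Z: Indian Buffet Process (IBP)
● The IBP is the limit of a finite beta-Bernoulli model with K clusters as K → ∞
● The number of clusters need not be fixed in advance; it is inferred along with Z
$$\underbrace{p(Z \mid D, \cdot)}_{\text{posterior}} \;\propto\; \underbrace{p(D \mid Z)}_{\text{likelihood}}\;\underbrace{p(Z)}_{\text{prior}}$$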
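To see why the IBP needs no fixed number of clusters, here is a minimal sketch of its standard generative ("buffet") process, assuming a concentration parameter alpha; `sample_ibp` is a hypothetical helper, not code from the talk. Object i joins each existing cluster k with probability m_k / i (m_k = objects already in cluster k), then opens Poisson(alpha / i) new clusters of its own.

```python
import numpy as np

def sample_ibp(n_objects: int, alpha: float, seed: int = 0) -> np.ndarray:
    """Draw a binary assignment matrix Z from the Indian Buffet Process."""
    rng = np.random.default_rng(seed)
    rows = []    # per-object lists of 0/1 entries
    counts = []  # m_k: number of objects assigned to cluster k so far
    for i in range(1, n_objects + 1):
        # Existing clusters: join cluster k with probability m_k / i
        row = [int(rng.random() < m / i) for m in counts]
        for k, z in enumerate(row):
            counts[k] += z
        # New clusters: Poisson(alpha / i) of them, all containing object i
        n_new = int(rng.poisson(alpha / i))
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros for clusters created later
    Z = np.zeros((n_objects, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=2.0, seed=1)
print(Z.shape)
print(Z.sum(axis=0))  # cluster sizes
```

The number of columns is random: under the IBP its expectation is alpha · (1 + 1/2 + … + 1/n), so it grows roughly logarithmically with the number of objects instead of being set by hand.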
25. POCD: Overlap Clustering for distance data
Invariant Likelihood: generalised Wishart
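● Invariant to translations and rotations of the underlying (unobserved) object coordinates, which pairwise distances cannot identify
$$\underbrace{p(Z \mid D, \cdot)}_{\text{posterior}} \;\propto\; \underbrace{p(D \mid Z)}_{\text{likelihood}}\;\underbrace{p(Z)}_{\text{prior}}$$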
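The invariance requirement can be checked numerically. This sketch does not implement the generalised Wishart likelihood; it only illustrates, on hypothetical random points, that squared Euclidean distances are unchanged by a rotation plus translation, and that double centering recovers only the centred Gram matrix (positions up to translation and rotation).

```python
import numpy as np

def sq_dist(X: np.ndarray) -> np.ndarray:
    # Squared Euclidean distance matrix: D_ij = ||x_i - x_j||^2
    G = X @ X.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2 * G

def double_center(D: np.ndarray) -> np.ndarray:
    # -1/2 * J D J with centering matrix J = I - 11^T / n
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                   # hypothetical object coordinates
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
t = rng.normal(size=3)                        # random translation
D, D_moved = sq_dist(X), sq_dist(X @ Q + t)

Xc = X - X.mean(axis=0)
print(np.allclose(D, D_moved))                    # True
print(np.allclose(double_center(D), Xc @ Xc.T))  # True
```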
26. POCD: Overlap Clustering for distance data
Inference using Metropolis Hastings
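● A Markov chain Monte Carlo (MCMC) algorithm
● Commonly used for inference in IBP-based models
● Asymptotically exact: the samples converge to the posterior as the number of iterations grows
● We need to infer both Z and the number of clusters
$$\underbrace{p(Z \mid D, \cdot)}_{\text{posterior}} \;\propto\; \underbrace{p(D \mid Z)}_{\text{likelihood}}\;\underbrace{p(Z)}_{\text{prior}}$$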
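The accept/reject mechanics of Metropolis-Hastings on a binary Z can be sketched as follows. This is a generic single-bit-flip sampler with the number of clusters K held fixed and a hypothetical toy target (each entry independently 1 with probability 0.3); it is not the POCD sampler, which would use the IBP prior and generalised Wishart likelihood and must also propose adding or removing clusters.

```python
import math
import random

def mh_binary(log_target, n, K, n_iter, seed=0):
    """Single-bit-flip Metropolis-Hastings over an n x K binary matrix (K fixed)."""
    rng = random.Random(seed)
    Z = [[0] * K for _ in range(n)]
    cur = log_target(Z)
    samples = []
    for _ in range(n_iter):
        i, k = rng.randrange(n), rng.randrange(K)
        Z[i][k] ^= 1  # symmetric proposal: flip one entry
        new = log_target(Z)
        # Accept with probability min(1, p(new) / p(current))
        if rng.random() < math.exp(min(0.0, new - cur)):
            cur = new
        else:
            Z[i][k] ^= 1  # reject: undo the flip
        samples.append([row[:] for row in Z])
    return samples

# Hypothetical toy target: every entry is independently 1 with probability 0.3
LOG_P1, LOG_P0 = math.log(0.3), math.log(0.7)

def toy_log_target(Z):
    return sum(LOG_P1 if z else LOG_P0 for row in Z for z in row)

samples = mh_binary(toy_log_target, n=4, K=3, n_iter=20000)
kept = samples[1000:]  # discard burn-in
frac_ones = sum(sum(map(sum, S)) for S in kept) / (len(kept) * 4 * 3)
print(f"fraction of ones: {frac_ones:.3f}")  # should be near 0.3
```

Because the flip proposal is symmetric, the Hastings correction cancels and only the posterior ratio enters the acceptance step; only differences of log-posteriors are needed, so the normalising constant of p(Z | D, ·) never has to be computed.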
28. POCD: Overlap Clustering for distance data
Clustering protein contact maps from HIV Protease inhibitors (PIs)
● Of the 26 FDA approved anti-HIV drugs:
○ 10 are PIs
● The PIs exhibit similar behaviour
○ Similar chemical structure
● Not readily available
https://www.sciencedirect.com/science/article/pii/S0165614711001398
30. POCD: Overlap Clustering for distance data
Clustering protein contact maps from HIV Protease inhibitors (PIs)
● Necessary to identify alternative PIs for therapy
○ What are the structural dissimilarities amongst PIs?
● Use the protein contact map of each PI
○ Distances between all amino-acid (AA) residue pairs of the protein
○ Vectorise each contact map row by row
○ Compute the Normalised Information Distance between the vectorised maps, giving the distance matrix D
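The preprocessing steps above can be sketched as follows, under several assumptions that the talk does not confirm: residue contacts are binarised at a hypothetical 8 Å cutoff, and the Normalised Information Distance (which is not computable exactly) is approximated by the Normalised Compression Distance using zlib, a common practical stand-in. Random coordinates replace real PI structures.

```python
import zlib
import numpy as np

def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Binary residue-residue contact map; the 8 Å cutoff is an assumption."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.uint8)

def vectorise(cmap: np.ndarray) -> bytes:
    # Row-wise (C-order) flattening of the contact map into a byte string
    return cmap.ravel(order="C").tobytes()

def ncd(x: bytes, y: bytes) -> float:
    # Normalised Compression Distance, used here as a stand-in for the NID
    cx, cy, cxy = (len(zlib.compress(s, 9)) for s in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def pairwise_ncd(cmaps) -> np.ndarray:
    vecs = [vectorise(c) for c in cmaps]
    n = len(vecs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ncd(vecs[i], vecs[j])  # symmetrised by construction
    return D

# Hypothetical data: 3 random "structures" of 30 residues each (coordinates in Å)
rng = np.random.default_rng(0)
cmaps = [contact_map(rng.normal(scale=10.0, size=(30, 3))) for _ in range(3)]
D = pairwise_ncd(cmaps)
print(np.round(D, 3))
```

Compression-based NCD is not exactly symmetric (compressing x + y and y + x can differ slightly), so this sketch computes only the upper triangle and mirrors it to obtain a symmetric D for the clustering model.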
33. Reading material
● A tutorial on Bayesian nonparametric models:
http://gershmanlab.webfactional.com/pubs/GershmanBlei12.pdf
● Leo Breiman: ‘Statistical Modeling: The Two Cultures’:
https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
● An abstract of this work, presented as a spotlight at the Bayesian Nonparametrics Workshop at NeurIPS 2018:
https://drive.google.com/file/d/1ExVpeUomv8Z4mPMu5as_CbmrHjVY0IDV/view
● Tutorials on recent deep learning papers: https://www.depthfirstlearning.com/ ( @DepthFirstLearn)