Despite widespread adoption and success, most machine learning models remain black boxes, and users and practitioners are often asked to trust their results implicitly. However, understanding the reasons behind predictions is critical for assessing trust, which is fundamental if one is asked to take action based on such models, or even to compare two similar models. In this talk I will (1) formulate the notion of interpretability of models, (2) review various attempts and research initiatives to solve this important problem, and (3) demonstrate real industry use cases and results, focusing primarily on deep neural networks.
14. Recent research on understanding DNNs
Why does deep and cheap learning work so well?
(Henry W. Lin (Harvard), Max Tegmark (MIT), David Rolnick (MIT))
• Physics-centric theory
Understanding deep learning requires rethinking generalization
(Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (Google Brain and DeepMind))
• Revisits learning theory, especially generalization bounds in empirical risk minimization
Opening the Black Box of Deep Neural Networks via Information
(Ravid Shwartz-Ziv, Naftali Tishby)
• Information bottleneck theory
20. Visualizing representations – t-SNE
• Embeds a high-dimensional probability distribution into a 2-D plane
• Uses SGD to minimize the KL divergence
2-D embedding of the final conv-layer representations of AlexNet trained on ImageNet images
• Visually inspect clusters for feature coherence
• Can be a tool for global visualization of feature separation
• Getting good results is not trivial
Credit: Karpathy, t-SNE visualization of CNN codes
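For concreteness, a minimal sketch of the idea using scikit-learn's TSNE; the random `features` array and labels are placeholders standing in for real CNN codes (e.g. AlexNet final-conv features), not data from the talk:

```python
# Minimal t-SNE sketch: project high-dimensional CNN feature vectors to 2-D.
# Assumes scikit-learn and matplotlib; `features` is a placeholder for real CNN codes.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))   # placeholder for 500 CNN feature vectors
labels = rng.integers(0, 10, size=500)    # placeholder class labels

# perplexity and learning rate usually need tuning; good results are not automatic
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=5)
plt.title("t-SNE of CNN feature vectors (placeholder data)")
plt.show()
```

Clusters in the scatter plot can then be inspected visually for feature coherence, as described above.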
23. Saliency maps – Grad-CAM
- Backprops target-class activations from the final conv layer
- Does not need any retraining or architecture change
- Quite fast; a single operation in most frameworks
- Uses guided backprop to propagate only positive activations
- Negative gradients get zeroed out
• Misses negatively correlated inputs
Credit: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
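A minimal sketch of plain Grad-CAM (without the guided-backprop step) in PyTorch; the pretrained VGG16, the choice of `features[28]` as the target layer, and the random input tensor are illustrative assumptions, not the talk's exact setup:

```python
# Grad-CAM sketch: weight final-conv feature maps by their class gradients.
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG16 (the `weights=` API needs torchvision >= 0.13)
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
target_layer = model.features[28]  # last conv layer of VGG16 (illustrative choice)

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, inp, out: activations.update(a=out))
target_layer.register_full_backward_hook(lambda m, gin, gout: gradients.update(g=gout[0]))

x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed input image
scores = model(x)                         # forward pass caches the conv activations
class_idx = scores.argmax(dim=1).item()
scores[0, class_idx].backward()           # backprop the target-class score only

# Weight each feature map by its spatially averaged gradient, ReLU the weighted
# sum, then upsample and normalize to get a heatmap over the input.
weights = gradients["g"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```

No retraining or architecture change is needed: only a forward pass, one backward pass, and a weighted sum of activations.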
24. Attribution maps
DeepLIFT (Deep Learning Important FeaTures)
- Explains the difference of the output from its reference value in terms of the differences of the inputs from their reference values:
• Δt → Δx₁, Δx₂, …, Δxₙ
- Assigns contributions C(Δxᵢ, Δt) such that:
• Σᵢ₌₁ⁿ C(Δxᵢ, Δt) = Δt
- Can account for negative contributions
- Relatively new; has mostly been demonstrated on MNIST-scale datasets. The reference value is also chosen empirically (see the sketch below)
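A toy sketch of the summation-to-delta property above, under the assumption of a single linear unit (where DeepLIFT's linear rule reduces to contributions of the form wᵢ·Δxᵢ); the weights and reference values are made up for illustration:

```python
# Summation-to-delta for one linear unit t = w.x: contributions sum exactly to Δt.
import numpy as np

w = np.array([0.5, -2.0, 1.0])       # weights of a single linear unit
x_ref = np.array([0.0, 0.0, 0.0])    # reference input (chosen empirically in practice)
x = np.array([1.0, 0.5, -1.0])       # actual input

delta_x = x - x_ref                  # Δx_i: differences of inputs from reference
delta_t = w @ x - w @ x_ref          # Δt: difference of output from its reference

contributions = w * delta_x          # C(Δx_i, Δt) under the linear rule
assert np.isclose(contributions.sum(), delta_t)   # Σ_i C(Δx_i, Δt) = Δt
print(contributions, delta_t)        # negative contributions are preserved
```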
Integrated Gradients
- Pick a reference value, e.g. an image with all-zero pixel values
- Scale the input linearly from the reference value to the actual value; compute gradient × Δinput at each step
↪ attribution = Δinput × Σᵢ gradᵢ
- Very fine-grained: attributions at the pixel level
Learning Important Features Through Propagating Activation Differences
Axiomatic Attribution for Deep Networks
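A minimal sketch of the integrated-gradients recipe above in PyTorch; the `toy_model`, the all-zero baseline, and the step count are illustrative assumptions, and a simple mean of the step gradients stands in for the path integral:

```python
# Integrated gradients: interpolate from baseline to input, average gradients,
# multiply by Δinput to get per-feature (per-pixel) attributions.
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """Approximate attributions: (x - baseline) * mean of gradients along the path."""
    if baseline is None:
        baseline = torch.zeros_like(x)          # e.g. an image with 0 pixel values
    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]
        grad, = torch.autograd.grad(score, point)
        grads.append(grad)
    avg_grad = torch.stack(grads).mean(dim=0)   # (Σᵢ gradᵢ) / steps
    return (x - baseline) * avg_grad            # Δinput × averaged gradient

# Usage with a throwaway linear "model" just to show the call shape:
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
attributions = integrated_gradients(toy_model, torch.rand(1, 3, 8, 8), target_class=0)
```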
32. Learning theory
Given inputs {x₁, x₂, …, xₙ} ∈ 𝒳 (e.g. images); outputs {y₁, y₂, …, yₙ} ∈ 𝒴 (e.g. labels);
Hypothesis space ℋ: a set of candidate functions
The goal of supervised learning is to learn a function f̂ such that y_pred = f̂(x_new)
Define a loss function ℓ(f̂(x), y)
Define the empirical loss: ℓ̂(f̂) = (1/n) Σᵢ ℓ(f̂, zᵢ), where zᵢ = (xᵢ, yᵢ)
We want lim_{n→∞} |ℓ̂(f̂) − ℓ(f̂)| = 0, i.e. the gap between training-set error and true error goes to 0 as n tends to infinity
The number of trainable parameters is indicative of model complexity
Regularization is used to penalize complexity and reduce variance
Generalization error = |training error – validation error|
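A small sketch of the training-error vs validation-error gap using scikit-learn; the synthetic dataset and the random-forest model are illustrative assumptions, not from the talk:

```python
# Measure the generalization gap |training error - validation error| on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A high-capacity model can drive training error to ~0 while validation error
# stays higher, i.e. a large generalization gap; regularization shrinks it.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
train_error = 1 - model.score(X_train, y_train)
val_error = 1 - model.score(X_val, y_val)
print(f"train error={train_error:.3f}  val error={val_error:.3f}  "
      f"gap={abs(train_error - val_error):.3f}")
```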
34. Under-specification Bias
Scientific understanding:
• We have no complete way to state what knowledge is
• The best we can do is ask for an explanation
Safety:
• Complex tasks are almost never end-to-end testable
• Query the model for an explanation
Ethics:
• Encoding all protections a priori is not possible
• Guard against discrimination
Mismatched objectives:
• Optimizing an incomplete objective
All of these may address depression, but which side effects are you willing to accept?
Debugging:
• We may not know the internals
• Domain mismatches
• Mislabeled training set
Model lifecycle management:
• Compare different models
• Training set evolution
Your own:
• …