Despite widespread adoption and success, most machine learning models remain black boxes, and users and practitioners are often asked to trust their results implicitly. However, understanding the reasons behind predictions is critical for assessing trust, which is fundamental if one is asked to take action based on such models, or even to compare two similar models. In this talk I will (1) formulate the notion of interpretability of models, (2) review various attempts and research initiatives to solve this important problem, and (3) demonstrate real industry use cases and results, focusing primarily on deep neural networks.
14. Recent research on understanding DNNs
Why does deep and cheap learning work so well?
(Henry W. Lin (Harvard), Max Tegmark (MIT), David Rolnick (MIT))
• Physics-centric theory
Understanding deep learning requires rethinking generalization
(Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (Google Brain and DeepMind))
• Revisits learning theory, especially generalization bounds in empirical risk minimization
Opening the Black Box of Deep Neural Networks via Information
(Ravid Shwartz-Ziv, Naftali Tishby)
• Information bottleneck theory
20. Visualizing representations – t-SNE
• Embeds a high-dimensional probability distribution into a 2-D plane
• Uses SGD to minimize the KL divergence
2-D embedding of the final conv-layer representations of AlexNet trained on ImageNet images
• Visually inspect clusters for feature coherence
• Can be a tool for global visualization of feature separation
• Getting good results is not trivial
Credit: Karpathy, t-SNE visualization of CNN codes
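For concreteness, a minimal sketch of the idea using scikit-learn's TSNE; the random `features` array and labels are placeholders standing in for real CNN codes (e.g. AlexNet final-conv features), not data from the talk:

```python
# Minimal t-SNE sketch: project high-dimensional CNN feature vectors to 2-D.
# Assumes scikit-learn and matplotlib; `features` is a placeholder for real CNN codes.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 4096))   # placeholder for 500 CNN feature vectors
labels = rng.integers(0, 10, size=500)    # placeholder class labels

# perplexity and learning rate usually need tuning; good results are not automatic
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=5)
plt.title("t-SNE of CNN feature vectors (placeholder data)")
plt.show()
```

Clusters in the scatter plot can then be inspected visually for feature coherence, as described above.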
23. Saliency maps – Grad-CAM
- Backprops target-class activations from the final conv layer
- Does not need any retraining or architecture change
- Quite fast; a single operation in most frameworks
- Uses guided backprop to propagate only positive activations
- Negative gradients get zeroed out
• Misses negatively correlated inputs
Credit: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
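A minimal sketch of plain Grad-CAM (without the guided-backprop step) in PyTorch; the pretrained VGG16, the choice of `features[28]` as the target layer, and the random input tensor are illustrative assumptions, not the talk's exact setup:

```python
# Grad-CAM sketch: weight final-conv feature maps by their class gradients.
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG16 (the `weights=` API needs torchvision >= 0.13)
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
target_layer = model.features[28]  # last conv layer of VGG16 (illustrative choice)

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, inp, out: activations.update(a=out))
target_layer.register_full_backward_hook(lambda m, gin, gout: gradients.update(g=gout[0]))

x = torch.randn(1, 3, 224, 224)           # stand-in for a preprocessed input image
scores = model(x)                         # forward pass caches the conv activations
class_idx = scores.argmax(dim=1).item()
scores[0, class_idx].backward()           # backprop the target-class score only

# Weight each feature map by its spatially averaged gradient, ReLU the weighted
# sum, then upsample and normalize to get a heatmap over the input.
weights = gradients["g"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```

No retraining or architecture change is needed: only a forward pass, one backward pass, and a weighted sum of activations.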
24. Attribution maps
DeepLIFT (Deep Learning Important FeaTures)
- Explains the difference of the output from its reference value in terms of the differences of the inputs from their reference values:
• Δt → Δx₁, Δx₂, …, Δxₙ
- Assigns contributions C(Δxᵢ, Δt) such that:
• Σᵢ₌₁ⁿ C(Δxᵢ, Δt) = Δt
- Can account for negative contributions
- Relatively new; has mostly been demonstrated on MNIST-scale datasets. The reference value is also chosen empirically (see the sketch below)
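A toy sketch of the summation-to-delta property above, under the assumption of a single linear unit (where DeepLIFT's linear rule reduces to contributions of the form wᵢ·Δxᵢ); the weights and reference values are made up for illustration:

```python
# Summation-to-delta for one linear unit t = w.x: contributions sum exactly to Δt.
import numpy as np

w = np.array([0.5, -2.0, 1.0])       # weights of a single linear unit
x_ref = np.array([0.0, 0.0, 0.0])    # reference input (chosen empirically in practice)
x = np.array([1.0, 0.5, -1.0])       # actual input

delta_x = x - x_ref                  # Δx_i: differences of inputs from reference
delta_t = w @ x - w @ x_ref          # Δt: difference of output from its reference

contributions = w * delta_x          # C(Δx_i, Δt) under the linear rule
assert np.isclose(contributions.sum(), delta_t)   # Σ_i C(Δx_i, Δt) = Δt
print(contributions, delta_t)        # negative contributions are preserved
```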
Integrated Gradients
- Pick a reference value, e.g. an image with all-zero pixel values
- Scale the input linearly from the reference value to the actual value; compute gradient × Δinput at each step
↪ attribution = Δinput × Σᵢ gradᵢ
- Very fine-grained: attributions at the pixel level
Learning Important Features Through Propagating Activation Differences
Axiomatic Attribution for Deep Networks
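A minimal sketch of the integrated-gradients recipe above in PyTorch; the `toy_model`, the all-zero baseline, and the step count are illustrative assumptions, and a simple mean of the step gradients stands in for the path integral:

```python
# Integrated gradients: interpolate from baseline to input, average gradients,
# multiply by Δinput to get per-feature (per-pixel) attributions.
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """Approximate attributions: (x - baseline) * mean of gradients along the path."""
    if baseline is None:
        baseline = torch.zeros_like(x)          # e.g. an image with 0 pixel values
    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]
        grad, = torch.autograd.grad(score, point)
        grads.append(grad)
    avg_grad = torch.stack(grads).mean(dim=0)   # (Σᵢ gradᵢ) / steps
    return (x - baseline) * avg_grad            # Δinput × averaged gradient

# Usage with a throwaway linear "model" just to show the call shape:
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
attributions = integrated_gradients(toy_model, torch.rand(1, 3, 8, 8), target_class=0)
```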
32. Learning theory
Given inputs {x₁, x₂, …, xₙ} ∈ 𝒳 (e.g. images); outputs {y₁, y₂, …, yₙ} ∈ 𝒴 (e.g. labels);
Hypothesis space ℋ: a set of candidate functions
The goal of supervised learning is to learn a function f̂ such that y_pred = f̂(x_new)
Define a loss function ℓ(f̂(x), y)
Define the empirical loss: ℓ̂(f̂) = (1/n) Σᵢ ℓ(f̂, zᵢ), where zᵢ = (xᵢ, yᵢ)
We want lim_{n→∞} |ℓ̂(f̂) − ℓ(f̂)| = 0, i.e. the gap between training-set error and true error goes to 0 as n tends to infinity
The number of trainable parameters is indicative of model complexity
Regularization is used to penalize complexity and reduce variance
Generalization error = |training error – validation error|
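A small sketch of the training-error vs validation-error gap using scikit-learn; the synthetic dataset and the random-forest model are illustrative assumptions, not from the talk:

```python
# Measure the generalization gap |training error - validation error| on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A high-capacity model can drive training error to ~0 while validation error
# stays higher, i.e. a large generalization gap; regularization shrinks it.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
train_error = 1 - model.score(X_train, y_train)
val_error = 1 - model.score(X_val, y_val)
print(f"train error={train_error:.3f}  val error={val_error:.3f}  "
      f"gap={abs(train_error - val_error):.3f}")
```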
34. Under-specification Bias
Scientific understanding:
• We have no complete way to state what knowledge is
• The best we can do is ask for an explanation
Safety:
• Complex tasks are almost never end-to-end testable
• Query the model for an explanation
Ethics:
• Encoding all protections a priori is not possible
• Guard against discrimination
Mismatched objectives:
• Optimizing an incomplete objective
All of these may address depression, but which side effects are you willing to accept?
Debugging:
• We may not know the internals
• Domain mismatches
• Mislabeled training set
Model lifecycle management:
• Compare different models
• Training set evolution
Your own:
• …