2. Questions to be Explained
https://www.britannica.com/technology/MYCIN
Overall guiding questions:
• Why did the system pick this rather
than that?
• When will it fail?
• Is the system trustworthy?
• Where is the locus of control?
Specific WHY questions:
• Why did the algorithm work that
way?
• What elements of the problem
space led to that decision?
• Why did your learning process
produce that algorithm?
3. What is a good explanation? (1)
http://www.socsci.ru.nl/johank/Explainable%20AI%2006-04-18.pdf
• An explanation should not just answer “why this?”, but
“why this, rather than that?” (parsimoniously)
• Q: “Why did Alice get tenure (while
Bob didn’t)?”
• A1: “Alice had a good publication
record”
• But Bob had a good publication
record as well!
That doesn’t explain why she
got tenure!
• A2: “Alice had a good publication
record and did quality teaching”
• Bob was a poor teacher, so this
explains why Alice got tenure and
Bob was denied tenure!
4. Explanations in Various AI & ML Models
(a) Linear regression; (b) Decision trees; (c) K-Nearest Neighbors;
(d) Rule-based Learners; (e) Generalized Additive Models; (f) Bayesian Models.
https://www.sciencedirect.com/science/article/pii/S1566253519308103#fig0001
6. Local Interpretable Model-Agnostic Explanations (LIME)
https://homes.cs.washington.edu/~marcotcr/blog/lime/
• The LIME method was originally proposed by Ribeiro, Singh, and
Guestrin (2016).
• The key idea behind it is to approximate a global model (which is a
black box) locally by simpler, transparent models.
7. LIME Method
https://homes.cs.washington.edu/~marcotcr/blog/lime/
• In order to be model-agnostic, LIME can't peek into the
model. What LIME does to learn the behavior of the
underlying model is to first perturb the input (e.g.,
removing words or hiding parts of the image).
• For images, an original image is divided into interpretable
components (contiguous superpixels).
9. https://pbiecek.github.io/ema/shapley.html
Perturbation for text data:
For example, if we are trying to explain the
prediction of a text classifier for the sentence
“I hate this movie”, we will perturb the sentence
and get predictions on sentences such as “I hate
movie”, “I this movie”, “I movie”, “I hate”, etc.
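A tiny sketch of how such perturbed sentences can be generated: represent the sentence as a binary mask over its words and drop the words whose mask entry is 0. The exhaustive enumeration below is purely illustrative; LIME samples masks at random and sends each perturbed text to the black-box classifier to obtain a prediction.

import itertools

sentence = "I hate this movie".split()

def perturbations(words):
    # Yield (mask, perturbed_text) pairs, one for every on/off combination of words.
    for mask in itertools.product([1, 0], repeat=len(words)):
        kept = [w for w, keep in zip(words, mask) if keep]
        yield mask, " ".join(kept)

for mask, text in perturbations(sentence):
    # In LIME each perturbed text would be passed to the black-box model;
    # here we only print it.
    print(mask, "->", repr(text))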
12. LIME Algorithm
• G is a class of potentially interpretable models (e.g., linear regression,
decision trees, etc.)
• Ω(g), where g belongs to G, is a measure of complexity (as opposed to
interpretability). It might represent, for instance, the number of non-zero
weights for a linear model, or the depth of the tree for decision trees.
• πₓ(z) is a proximity measure between data points, which takes high
values when x and z are “close” and low values when they are “far”.
• 𝔏(f, g, πₓ) is a loss function which evaluates how inaccurately g
approximates f within the locality defined by πₓ.
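• Putting these pieces together, the explanation LIME returns for an instance x is the interpretable model that best trades off local fidelity against complexity; this is the objective from Ribeiro, Singh, and Guestrin (2016):
  ξ(x) = argmin over g ∈ G of 𝔏(f, g, πₓ) + Ω(g)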
16. LIME Algorithm
https://homes.cs.washington.edu/~marcotcr/blog/lime/
1. Sample the locality around the selected data point uniformly at
random and generate a dataset of perturbed data points, together
with their corresponding predictions from the model we want to
explain.
2. Use the specified feature-selection method to select the number of
features required for the explanation.
3. Calculate the sample weights using a kernel function and a distance
function (this captures how close or how far the sampled points are
from the original point).
4. Fit an interpretable model (locally weighted linear regression) on the
perturbed dataset, using the sample weights to weight the objective
function (e.g., squared error).
5. Provide local explanations using the newly trained interpretable
model.
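Below is a minimal numpy sketch of steps 1 and 3–5 for a tabular instance (step 2, feature selection, is omitted for brevity). The black-box model f, the sampling scale, and the kernel width are illustrative assumptions, not part of the slides.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model f (stands in for the model being explained).
def f(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2])))

x = np.array([0.4, -0.2, 1.0])        # the single data point to explain
n_samples, kernel_width = 5000, 0.75  # illustrative choices

# Step 1: sample perturbed points around x and query the black box.
Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
y = f(Z)

# Step 3: weight each perturbed point by its proximity to x (exponential kernel).
dist = np.linalg.norm(Z - x, axis=1)
w = np.exp(-(dist ** 2) / kernel_width ** 2)

# Step 4: fit a locally weighted linear model g by weighted least squares.
A = np.hstack([np.ones((n_samples, 1)), Z])   # intercept column + features
coef = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))

# Step 5: the coefficients (excluding the intercept) are the local explanation.
print("local feature weights:", coef[1:])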
18. https://homes.cs.washington.edu/~marcotcr/blog/lime/
• More examples:
Text classification (20 newsgroups). Six features (words) are used.
Negative (blue) words indicate atheism, while positive (orange) words indicate
the Christian class. The way to interpret the weights is to apply them to the
prediction probabilities. For example, if we remove the words Host and NNTP
from the document, we expect the classifier to predict atheism with probability
0.58 - 0.14 - 0.11 = 0.33.
19. Cooperative Games
• Suppose we have a Kaggle team of 3
players in a competition, where the value of
the different coalitions is given by:
20. And in Binary Classification
In some faraway country we got this election outcome:
• Reds got 46 seats, Blues got 47 seats, Bennett got 7 seats
• Reds can't form a coalition with Blues
• Bennett is willing to cooperate with everybody
• Let's define the utility function:
v(Reds, Blues) = 0
v(Bennett, Blues) = 1
v(Reds, Bennett) = 1
v(Bennett, Blues, Reds) = 1
Value of Reds = 1/6, Value of Blues = 1/6, Value of Bennett = 4/6
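These are the Shapley values of this game; they can be checked by listing all 3! = 6 orderings of the parties and noting who turns the coalition from losing (v = 0) to winning (v = 1):
(R, B, Be), (R, Be, B), (B, R, Be), (B, Be, R) → Bennett is pivotal; (Be, R, B) → Reds are pivotal; (Be, B, R) → Blues are pivotal.
Bennett is pivotal in 4 of the 6 orderings (4/6); Reds and Blues in 1 each (1/6).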
21. Definition
• N is the set containing all the players. In our example, N contains the three
parties (Reds, Blues, Bennett).
• S is a subset of N (i.e. S ⊆ N). It is nothing more than a subset of participants
of the grand coalition N.
• i is an element of N (i.e. i ∈ N)
• v is a value function that maps subsets of players S to a real number
• σ is the set of all permutations of N
• Let Q(σⱼ , i) be the coalition of the predecessors of the player i in σⱼ where σⱼ
is an element of σ. For instance if σⱼ=(P1, P4, P2, P3) and i=P2 then Q(σⱼ ,
i)={P1, P4}.
• The Shapley value of player i is then its average marginal contribution over all orderings:
  φᵢ(v) = (1/|N|!) · Σ over σⱼ ∈ σ of [ v(Q(σⱼ, i) ∪ {i}) − v(Q(σⱼ, i)) ]
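A short Python sketch that evaluates this permutation formula on the election example from the previous slides and reproduces the values 1/6, 1/6 and 4/6. The utility function is taken directly from that slide (singletons and the empty coalition are assumed to be worth 0).

from itertools import permutations

players = ["Reds", "Blues", "Bennett"]

def v(coalition):
    # Utility function as given on the slide: only these coalitions win.
    wins = [{"Bennett", "Blues"}, {"Reds", "Bennett"}, {"Bennett", "Blues", "Reds"}]
    return 1 if set(coalition) in wins else 0

orderings = list(permutations(players))
shapley = {p: 0.0 for p in players}
for ordering in orderings:
    for pos, p in enumerate(ordering):
        predecessors = ordering[:pos]                    # Q(sigma_j, i)
        marginal = v(predecessors + (p,)) - v(predecessors)
        shapley[p] += marginal / len(orderings)          # average over |N|! orderings

print(shapley)  # {'Reds': 0.166..., 'Blues': 0.166..., 'Bennett': 0.666...}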
23. Problems
1. How can we evaluate a model without a
feature in order to compute the marginal
contribution?
2. In order to compute Shapley values exactly
we must evaluate all possible coalitions of
features, but their number grows exponentially
with the number of features (2ⁿ coalitions for n features).
25. Counterfactual Explanations
• A counterfactual explanation of a
prediction describes the smallest
change to the feature values that
changes the prediction to a predefined
output.
27. Approach: Counterfactual Optimization
● Minimize:
○ Squared error with desired (counterfactual) label
○ Distance to perturbed point
● While maximizing weight on squared error term
○ We want to have the prediction change more than we want the point to be close
○ Iteratively: solve for x’, then maximize λ
○ High λ - counterfactuals with predictions close to the desired outcome y’
○ Low λ - counterfactuals x’ that are similar to x in the feature values.
Norms: L1 → sparsity → interpretability; L2 → the classic choice.
Factors: MAD → robust to outliers; STD → standardizes features of different
scales (hopefully MAD does this to a similar extent).
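Below is a minimal sketch of this iterative scheme in the spirit of Wachter et al.'s counterfactual search: minimize λ·(f(x’) − y’)² + d(x, x’) with a MAD-weighted L1 distance, and keep increasing λ until the prediction is close enough to the desired outcome. The logistic model f, the reference data used to estimate the MAD, and the tolerances are illustrative assumptions, not part of the slides.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical black-box classifier f (a fixed logistic model); in practice this
# is the model whose prediction we want to flip.
w_true, b_true = np.array([1.5, -2.0, 0.8]), -0.2
def f(x):
    return 1.0 / (1.0 + np.exp(-(x @ w_true + b_true)))

# Reference data, used only to estimate the MAD of each feature.
X_ref = rng.normal(size=(500, 3))
mad = np.median(np.abs(X_ref - np.median(X_ref, axis=0)), axis=0)

def dist(x_prime, x):
    # MAD-weighted L1 distance (the robust-to-outliers choice mentioned above).
    return np.sum(np.abs(x_prime - x) / mad)

x = np.array([0.2, 0.5, -0.3])   # original instance (predicted around 0.24)
y_target = 1.0                   # desired (counterfactual) prediction

def loss(x_prime, lam):
    return lam * (f(x_prime) - y_target) ** 2 + dist(x_prime, x)

# Outer loop: increase lambda until the counterfactual prediction is close enough
# to the target; inner step: minimize the loss over x'.
lam, x_prime = 1.0, x.copy()
for _ in range(20):
    res = minimize(loss, x_prime, args=(lam,), method="Nelder-Mead")
    x_prime = res.x
    if abs(f(x_prime) - y_target) < 0.05:
        break
    lam *= 2.0

print("counterfactual x':", np.round(x_prime, 3), "prediction:", round(f(x_prime), 3))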