Interpretability beyond feature attribution: quantitative testing with Concept Activation Vectors (TCAV)
Been Kim, Senior Research Scientist, Google Brain

1. Interpretability beyond feature attribution: Testing with Concept Activation Vectors (TCAV). Been Kim. Presenting work with a lot of awesome people inside and outside of Google: Martin Wattenberg, Julius Adebayo, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Ian Goodfellow, Moritz Hardt, Rory Sayres.
2. My goal: interpretability. To use machine learning responsibly, we need to ensure that 1. our values are aligned and 2. our knowledge is reflected, for everyone. (diagram: Machine Learning Models ↔ Human; image: http://blogs.teradata.com/)
3. My goal: interpretability. To use machine learning responsibly, we need to ensure that 1. our values are aligned and 2. our knowledge is reflected, for everyone. (diagram: Machine Learning Models ↔ Human; image: http://blogs.teradata.com/)
4. Ingredients for interpretability methods. (diagram: ? = some quality function)
5. (diagram ingredient: Model)
6. (diagram ingredient: Model)
7. (diagram ingredient: Data, with Class1 and Class0)
8. (diagram ingredient: Data, with Class1 and Class0)
9. (diagram ingredient: Human, with Class1 and Class0) "What's ML?" (newbie) "If I were you, I would train a neural network." (expert)
10. (diagram ingredient: Task, with Class1 and Class0) • Local vs. global • Simple explanations vs. more complex but more accurate explanations • Low- or high-stakes domains
11. Interpretability methods: post-training explanations; building inherently interpretable models; explaining data.
12. Agenda: post-training explanations.
13. Agenda: post-training explanations. 1. Revisiting existing methods: saliency maps. 2. Making explanations using the way humans think: Testing with Concept Activation Vectors (TCAV).
14. Agenda: post-training explanations. 1. Revisiting existing methods: saliency maps (Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18]). 2. Making explanations using the way humans think: Testing with Concept Activation Vectors (TCAV).
15. Problem: post-training explanation. A trained machine learning model (e.g., a neural network) outputs "cash-machine-ness". Why was this a cash machine?
16. One of the most popular interpretability methods for images: saliency maps. "Caaaaan do! We've got saliency maps to measure the importance of each pixel!": the importance of pixel (i, j) is the gradient of a class logit with respect to that pixel, ∂logit/∂pixel(i,j). SmoothGrad [Smilkov, Thorat, K., Viégas, Wattenberg '17]; Integrated Gradients [Sundararajan, Taly, Yan '17]. (picture credit: @sayres)
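Slide 16's ∂logit/∂pixel idea can be made concrete with a small sketch. This is a minimal illustration of vanilla gradient saliency and its SmoothGrad variant, assuming a TF2/Keras-style `model`; the names, sample count, and noise scale are placeholders rather than the paper's exact settings.

```python
# Minimal sketch of gradient saliency and SmoothGrad (assumed TF2/Keras API).
import tensorflow as tf

def gradient_saliency(model, images, class_idx):
    """Per-pixel importance: gradient of the class logit w.r.t. the input."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(images)
        class_logit = model(images)[:, class_idx]
    return tape.gradient(class_logit, images)

def smoothgrad(model, images, class_idx, n_samples=25, noise_frac=0.15):
    """Average the saliency over noisy copies of the input (SmoothGrad)."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    sigma = noise_frac * (tf.reduce_max(images) - tf.reduce_min(images))
    total = tf.zeros_like(images)
    for _ in range(n_samples):
        noisy = images + tf.random.normal(tf.shape(images), stddev=sigma)
        total += gradient_saliency(model, noisy, class_idx)
    return total / n_samples
```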
17. One of the most popular interpretability methods for images: saliency maps. Widely used for images; a local understanding of a NN; evaluated by humans' subjective judgement. SmoothGrad [Smilkov, Thorat, K., Viégas, Wattenberg '17]; Integrated Gradients [Sundararajan, Taly, Yan '17]. (picture credit: @sayres)
18. One of the most popular interpretability methods for images: saliency maps. Sanity check: if I change the model M a lot, will humans perceive that the explanation E has changed a lot? SmoothGrad [Smilkov, Thorat, K., Viégas, Wattenberg '17]; Integrated Gradients [Sundararajan, Taly, Yan '17]. (picture credit: @sayres)
19. Some confusing behaviors of saliency maps. (figure: an input image and its saliency map) Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18].
20. Some confusing behaviors of saliency maps. Randomized weights! The network now makes garbage predictions. (figure: the saliency map after randomization) Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18].
21. Some confusing behaviors of saliency maps. Randomized weights! The network now makes garbage predictions, yet the saliency map looks the same. !!!!!???!? Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18].
22. Some saliency maps look similar when we randomize the network (= making the network completely useless). (figure: Guided Backprop and Integrated Gradients maps, before and after randomization) Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18].
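The randomization experiment on slides 19-22 can also be sketched. Assuming a Keras-style model and a `saliency_fn` like the one above, this cascading randomization reinitializes layers from the logits downward and checks how much the saliency map actually changes; a rank correlation near 1 after randomization is exactly the confusing behavior the slides describe. The correlation metric and layer loop here are illustrative choices, not the paper's exact code.

```python
# Sketch of cascading weight randomization (assumed Keras API); this
# mutates the model, so run it on a throwaway copy of your network.
import numpy as np
from scipy.stats import spearmanr

def cascading_randomization(model, image, class_idx, saliency_fn):
    baseline = np.abs(saliency_fn(model, image, class_idx).numpy()).ravel()
    similarities = []
    for layer in reversed(model.layers):  # randomize from logits downward
        if not layer.get_weights():
            continue  # skip parameter-free layers (activations, pooling, ...)
        layer.set_weights([np.random.normal(size=w.shape).astype(w.dtype)
                           for w in layer.get_weights()])
        randomized = np.abs(saliency_fn(model, image, class_idx).numpy()).ravel()
        rho, _ = spearmanr(baseline, randomized)
        similarities.append((layer.name, rho))  # rho near 1: map barely changed
    return similarities
```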
23. What can we learn from this? • Potential human confirmation bias: just because an explanation "makes sense" to humans doesn't mean it reflects evidence for the prediction. • Our discovery is consistent with other findings [Nie, Zhang, Patel '18] [Ulyanov, Vedaldi, Lempitsky '18]. • Some of these methods have been shown to be useful for humans. Why? More studies needed. Sanity Checks for Saliency Maps, joint work with Adebayo, Gilmer, Goodfellow, Hardt [NIPS '18].
24. What can we do better? Creating a wishlist. So far: local understanding; a lay person?; human's subjective judgement; using input features as a language.
25. What can we do better? Creating a wishlist. Local understanding → global; human's subjective judgement → quantitative; using input features as a language → something more human-friendly? A lay person?
26. Agenda: post-training explanations. 1. Revisiting existing methods: saliency maps. 2. Making explanations using the way humans think: Testing with Concept Activation Vectors (TCAV) [ICML '18], joint work with Wattenberg, Gilmer, Cai, Wexler, Viegas, Sayres.
27. Problem: post-training explanation. A trained machine learning model (e.g., a neural network) outputs "cash-machine-ness". Why was this a cash machine? TCAV [ICML '18], joint work with Wattenberg, Gilmer, Cai, Wexler, Viegas, Sayres.
28. Common solution: saliency map. Prediction: cash machine. Let's use this to help us think about what we really want to ask. SmoothGrad [Smilkov, Thorat, K., Viégas, Wattenberg '17]; https://pair-code.github.io/saliency/
29. What we really want to ask… Were there more pixels on the cash machine than on the person? Did the 'human' concept matter? Did the 'wheels' concept matter?
30. What we really want to ask… Were there more pixels on the cash machine than on the person? Did the 'human' concept matter? Did the 'wheels' concept matter? Which concept mattered more? Is this true for all other cash machine predictions?
31. What we really want to ask… Oh no! I can't express these concepts as pixels!! They weren't my input features either! Were there more pixels on the cash machine than on the person? Did the 'human' concept matter? Did the 'wheels' concept matter? Which concept mattered more? Is this true for all other cash machine predictions?
32. What we really want to ask… Wouldn't it be great if we could quantitatively measure how important any of these user-chosen concepts are? Were there more pixels on the cash machine than on the person? Did the 'human' concept matter? Did the 'wheels' concept matter? Which concept mattered more? Is this true for all other cash machine predictions?
33. Goal of TCAV: Testing with Concept Activation Vectors (ICML 2018). A quantitative explanation: how much a concept (e.g., gender, race) mattered for a prediction in a trained model… even if the concept was not part of the training.
34. Goal of TCAV: Testing with Concept Activation Vectors. A trained machine learning model (e.g., a neural network) outputs "doctor-ness". (images: vactruth.com, healthcommunitiesproviderservices)
35. Goal of TCAV: Testing with Concept Activation Vectors. Was the gender concept important to this doctor image classifier? (images: vactruth.com, healthcommunitiesproviderservices)
36. Goal of TCAV: Testing with Concept Activation Vectors. Was the gender concept important to this doctor image classifier? TCAV score for the Doctor class: women vs. not women. (images: vactruth.com, healthcommunitiesproviderservices)
37. Goal of TCAV: Testing with Concept Activation Vectors. Was the gender concept important to this doctor image classifier? TCAV score for the Doctor class: women vs. not women. TCAV provides the quantitative importance of a concept if and only if your network learned about it.
38. Goal of TCAV: Testing with Concept Activation Vectors. A trained machine learning model (e.g., a neural network) outputs "zebra-ness". Was the striped concept important to this zebra image classifier? TCAV score for the Zebra class: striped vs. not striped. TCAV provides the quantitative importance of a concept if and only if your network learned about it.
39. TCAV: Testing with Concept Activation Vectors. Was the striped concept important to this zebra image classifier? Step 1: learning CAVs. First question: how do we define concepts?
40. Defining a concept activation vector (CAV). Inputs: random images, examples of the concept, and a trained network under investigation with its internal tensors.
41. Train a linear classifier to separate the concept activations from the random activations. The CAV is the vector orthogonal to the decision boundary. [Smilkov '17, Bolukbasi '16, Schmidt '15]
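A minimal sketch of this CAV step, assuming the concept and random activations at the chosen layer have already been collected into arrays (`concept_acts` and `random_acts` are placeholder names): a linear classifier separates the two sets, and its weight vector, which is orthogonal to the decision boundary, serves as the CAV.

```python
# Minimal sketch of learning a CAV from layer activations (sklearn assumed).
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts, random_acts):
    """concept_acts, random_acts: (n_examples, n_units) activation arrays."""
    X = np.concatenate([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_[0]                 # normal to the decision boundary
    return cav / np.linalg.norm(cav)   # unit-length concept direction
```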
42. TCAV: Testing with Concept Activation Vectors. Was the striped concept important to this zebra image classifier? Step 1: learning CAVs. Step 2: getting the TCAV score. Second question: how are the CAVs useful for explanations?
43. TCAV core idea: take the directional derivative along the CAV to get the prediction's sensitivity to the concept. The TCAV score is computed from directional derivatives with the CAV.
44. TCAV core idea: take the directional derivative along the CAV to get the prediction's sensitivity to the concept. (diagram: the striped CAV in activation space; TCAV score from directional derivatives with the CAV)
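In the paper, the TCAV score for a concept and a class is the fraction of that class's inputs whose class logit increases in the CAV direction, i.e. whose directional derivative along the CAV is positive. Here is a sketch under assumed TF2 semantics, where `bottom` maps inputs to the chosen layer's activations and `top` maps those activations to logits; that split is a placeholder for however you slice your own network.

```python
# Sketch of the TCAV score: fraction of inputs whose directional
# derivative of the class logit along the CAV is positive.
import tensorflow as tf

def tcav_score(bottom, top, images, class_idx, cav):
    acts = tf.convert_to_tensor(bottom(images))
    with tf.GradientTape() as tape:
        tape.watch(acts)
        class_logit = top(acts)[:, class_idx]
    grads = tape.gradient(class_logit, acts)           # d logit / d activation
    flat = tf.reshape(grads, [tf.shape(grads)[0], -1])
    dirs = tf.linalg.matvec(flat, tf.constant(cav, dtype=tf.float32))
    return float(tf.reduce_mean(tf.cast(dirs > 0, tf.float32)))
```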
45. TCAV: Testing with Concept Activation Vectors. Was the striped concept important to this zebra image classifier? Step 1: learning CAVs. Step 2: getting the TCAV score.
46. TCAV: Testing with Concept Activation Vectors. Was the striped concept important to this zebra image classifier? Step 1: learning CAVs. Step 2: getting the TCAV score. Step 3: CAV validation, both qualitative and quantitative.
47. Quantitative validation: guarding against spurious CAVs. Did my CAVs return high sensitivity by chance?
48. Quantitative validation: guarding against spurious CAVs. Learn many 'striped' CAVs using different sets of random images.
49. Quantitative validation: guarding against spurious CAVs. (diagram: many 'striped' CAVs, each yielding a TCAV score for the Zebra class)
50. Quantitative validation: guarding against spurious CAVs. (diagram: many 'striped' CAVs, each yielding a TCAV score for the Zebra class)
51. Quantitative validation: guarding against spurious CAVs. Check that the distribution of TCAV scores is statistically different from the random distribution, using a t-test.
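A sketch of this validation step, assuming TCAV scores have already been computed both from many concept CAVs (each trained against a different set of random images) and from random-vs-random CAVs; a two-sample t-test then asks whether the concept's score distribution differs from the random baseline. The paper additionally corrects for multiple comparisons when many concepts are tested.

```python
# Sketch of CAV validation via a two-sample t-test (scipy assumed).
from scipy.stats import ttest_ind

def cav_is_significant(concept_scores, random_scores, alpha=0.05):
    """Keep the concept only if its TCAV scores differ from random CAVs'."""
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha
```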
52. Recap TCAV: Testing with Concept Activation Vectors. 1. Learning CAVs. 2. Getting the TCAV score. 3. CAV validation (qualitative and quantitative). TCAV provides the quantitative importance of a concept if and only if your network learned about it, even if your training data wasn't tagged with the concept, and even if your input features did not include the concept.
53. Results: 1. Sanity check experiment. 2. Biases in Inception V3 and GoogleNet. 3. Domain expert confirmation from Diabetic Retinopathy.
54. Results: 1. Sanity check experiment. 2. Biases in Inception V3 and GoogleNet. 3. Domain expert confirmation from Diabetic Retinopathy.
55. Sanity check experiment: if we know the ground truth (the important concepts), will TCAV match it?
56. Sanity check experiment setup: an image plus a potentially noisy caption.
57. Sanity check experiment setup: an image plus a potentially noisy caption. Models can use either the image concept or the caption concept for classification.
58. Sanity check experiment setup: an image plus a potentially noisy caption. Models can use either the image concept or the caption concept for classification. We vary the caption noise level in the training set.
59. Sanity check experiment setup: an image plus a potentially noisy caption. Models can use either the image concept or the caption concept for classification. Test accuracy on images with no caption = importance of the image concept.
60. Sanity check experiment. (plots: TCAV score and test accuracy with no caption, as functions of the caption noise level in the training set)
61. Cool, cool. Can saliency maps do this too?
62. Can saliency maps communicate the same information? (figure: saliency maps for images with captions; the ground truth in each case is the image concept)
63. Human subject experiment: can saliency maps communicate the same information? 50 turkers were asked to judge the importance of the image vs. the caption concept given saliency maps, asked to indicate their confidence, and shown 3 classes (cab, zebra, cucumber) × 2 saliency maps for one model.
64. • Random chance: 50%. • Human performance with saliency maps: 52%. • Humans can't agree: no significant consensus on more than 50% of cases. • Humans are very confident even when they are wrong.
65. Human subject experiment: can saliency maps communicate the same information? • Random chance: 50%. • Human performance with saliency maps: 52%. • Humans can't agree: no significant consensus on more than 50% of cases. • Humans are very confident even when they are wrong.
66. Results: 1. Sanity check experiment. 2. Biases in Inception V3 and GoogleNet. 3. Domain expert confirmation from Diabetic Retinopathy.
67. TCAV in two widely used image prediction models.
68. TCAV in two widely used image prediction models. Geographical bias! (http://www.abc.net.au)
69. TCAV in two widely used image prediction models. Geographical bias? Quantitative confirmation of previously qualitative findings [Stock & Cisse, 2017].
70. TCAV in two widely used image prediction models. Geographical bias? Quantitative confirmation of previously qualitative findings [Stock & Cisse, 2017]. Goal of interpretability: to use machine learning responsibly, we need to ensure that 1. our values are aligned and 2. our knowledge is reflected.
71. Results: 1. Sanity check experiment. 2. Biases in Inception V3 and GoogleNet. 3. Domain expert confirmation from Diabetic Retinopathy.
72. Diabetic Retinopathy. • A treatable but sight-threatening condition. • We have a model with accurate DR prediction (85%) [Krause et al., 2017]. Concepts the ML model uses vs. diagnostic concepts human doctors use.
73. Collect human doctors' knowledge. Concepts: PRP, PRH/VH, NV/FP, VB, MA, HMA. For each DR level (e.g., DR level 4, DR level 1), doctors marked which concepts belong to that level and which do not.
74. TCAV for Diabetic Retinopathy. Concepts: PRP, PRH/VH, NV/FP, VB. Green: domain expert's label for concepts that belong to the level; red: for concepts that do not. Prediction class: DR level 4; prediction accuracy: high. Example TCAV scores show the model is consistent with the doctors' knowledge when the model is accurate.
75. TCAV for Diabetic Retinopathy. Green: domain expert's label for concepts that belong to the level; red: for concepts that do not. DR level 4 (accuracy: high; concepts PRP, PRH/VH, NV/FP, VB): TCAV shows the model is consistent with the doctors' knowledge when the model is accurate. DR level 1 (accuracy: medium; concepts MA, HMA): TCAV shows the model is inconsistent with the doctors' knowledge for classes where the model is less accurate.
76. TCAV for Diabetic Retinopathy. Green: domain expert's label for concepts that belong to the level; red: for concepts that do not. DR level 4 (accuracy: high): consistent with the doctors' knowledge when the model is accurate. DR level 1 (accuracy: low): inconsistent with the doctors' knowledge; level 1 was often confused with level 2. Goal of interpretability: to use machine learning responsibly, we need to ensure that 1. our values are aligned and 2. our knowledge is reflected.
77. Summary: Testing with Concept Activation Vectors. The stripes concept (score: 0.9) was important to the zebra class for this trained network. (example TCAV scores for PRP, PRH/VH, NV/FP, VB) Our values; our knowledge. TCAV provides the quantitative importance of a concept if and only if your network learned about it. Joint work with Wattenberg, Gilmer, Cai, Wexler, Viegas, Sayres. ICML 2018.
