SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
WHITENING THE BLACKBOX : WHY AND HOW TO EXPLAIN
MACHINE LEARNING PREDICTIONS ?
PyData 2015 / Paris
Christophe Bourguignat (@chris_bour)
Marcin Detyniecki
Bora Eang
We are a new Data team
We are Python & scikit-learn heavy users and
hopefully contributors
We like tricky problems, doubts, questions
And love to share with data geeks about that !
DISCLAIMER - WHO ARE WE ?
2 | PyData 2015 - Paris
Why explaining Machine Learning models ?
3 | PyData 2015 - Paris
Why explaining Machine Learning models ?
4 | PyData 2015 - Paris
Explain why a given loan application
did not meet credit underwriting
policy
Why explaining Machine Learning models ?
5 | PyData 2015 - Paris
Explain why a given loan application
did not meet credit underwriting
policy
Explain why a given transaction is
suspicious
Why explaining Machine Learning models ?
6 | PyData 2015 - Paris
Explain why a given loan application
did not meet credit underwriting
policy
Explain why a given transaction is
suspicious
Explain why a given job is recommended
for an unemployed
Why explaining Machine Learning models ?
7 | PyData 2015 - Paris
French « Conseil d’Etat » recommendation
Impose to algorithm-based decisions a transparency requirement, on
personal data used by the algorithm, and the general reasoning it
followed. Give the person subject to the decision the possibility of
submitting its observations.
8 | PyData 2015 - Paris
What do we actually want ?
9 | PyData 2015 - Paris
We don’t ask for a “typical profile” of the selected population
We want a reason why an observation got selected by our algorithm
This reason must be simple and understandable (actionable), but can
be specific to it
Observation A, next to observation B on our selected population, can
be selected for a completely different reason
http://www.holehouse.org/mlclass/07_Regularization.html
Toy Example : Titanic Dataset (1/2)
10 | PyData 2015 - Paris
Toy Example : Titanic Dataset (2/2)
11 | PyData 2015 - Paris
These women were predicted with
a high confidence as « survived »
Toy Example : Titanic Dataset (2/2)
12 | PyData 2015 - Paris
These women were predicted with as
high confidence as « not survived »
These women were predicted with
a high confidence as « survived »
Toy Example : Titanic Dataset (2/2)
13 | PyData 2015 - Paris
These women were predicted with as
high confidence as « not survived »
These women were predicted with
a high confidence as « survived »
We are looking for a method saying : why ?
A simple case : linear models
14 | PyData 2015 - Paris
-5
0
5
10
1 2 3 4
Beta
x
Beta.x
-5
0
5
10
1 2 3 4
Beta
x
Beta.x
Observation 1 Observation 2
Unfortunately, predictive
models that are the most
powerful are usually the
least interpretable
http://machinelearningmastery.com/model-prediction-versus-interpretation-in-machine-learning/
scikit-learn includes the .feature_importances_ attribute …
 Implementation from Breiman, Friedman, "Classification and regression
trees", 1984 (“Gini Importance" or “Mean Decrease Impurity“)
 Louppe, 2014 “Understanding Random Forests”, PhD dissertation
 R package also implements “Mean Decrease Accuracy”
… but has nothing to show features contribution for a given observation
A complicated case : Random Forests
16 | PyData 2015 - Paris
http://ngm.nationalgeographic.com/2012/12/sequoias/quammen-text
« What if » explanation
Sensitivity of the variable
(i.e. derivative)
Feature contribution
 « path approach »
How to interpret a forest?
17 | PyData 2015 - Paris MENTION DE CONFIDENTIALITÉ
« What if » explanation
Sensitivity of the variable
(i.e. derivative)
Feature contribution
 « path approach »
How to interpret a forest?
18 | PyData 2015 - Paris
State Of The Art
19 | PyData 2015 - Paris
http://arxiv.org/abs/1312.1121
State Of The Art
20 | PyData 2015 - Paris
http://blog.datadive.net/interpreting-random-forests/
Scikit-Learn IPython demo
21 | PyData 2015 - Paris
Playing with Titanic Data
Traversing the trees and using a trivial metric :
 +1 when a feature is crossed
 + impurity when a feature is crossed
Limitations : Scikit learn stores :
 The number of samples for each node (tree_.n_node_samples)
 The breakdown by class (tree_.value), but only for leaves
Be able to compare subsets induced by each tree
What do we need ? (1/2)
22 | PyData 2015 - Paris
Mr Smith
Be able to compare subsets induced by each tree
What do we need ? (1/2)
23 | PyData 2015 - Paris
Mr Smith
Be able to compare subsets induced by each tree
What do we need ? (1/2)
24 | PyData 2015 - Paris
Mr Smith
Be able to compare subsets induced by each tree
What do we need ? (1/2)
25 | PyData 2015 - Paris
Mr Smith
Be able to compare subsets induced by each tree
What do we need ? (1/2)
26 | PyData 2015 - Paris
Mr Smith
Ideally, we would need easy access to all nodes attributes :
 Average_score
 Node_size (absolute or %)
 Number of class 0 samples (absolute or %)
 Number of class 1 samples (absolute or %)
 …
For each tree
 For each node
- Metric += F(parent_node, node, left_child_node, right_child_node, brother_node)
- E.g : F = parent_node.average_score – node.average_score
What do we need (2/2)
27 | PyData 2015 - Paris
Thank You
and join us, we have many other
problems to crack !
Christophe Bourguignat (@chris_bour)
Marcin Detyniecki
Bora Eang

Contenu connexe

Similaire à PyData Paris 2015 - Track 2.3 AXA

محازةرةي يةكةم
محازةرةي يةكةممحازةرةي يةكةم
محازةرةي يةكةم
Gaylan Blbas
 

Similaire à PyData Paris 2015 - Track 2.3 AXA (20)

محازةرةي يةكةم
محازةرةي يةكةممحازةرةي يةكةم
محازةرةي يةكةم
 
Varied encounters with data science (slide share)
Varied encounters with data science (slide share)Varied encounters with data science (slide share)
Varied encounters with data science (slide share)
 
How to build a Community of Practice (CoP) + How we build the Brussels Data S...
How to build a Community of Practice (CoP) + How we build the Brussels Data S...How to build a Community of Practice (CoP) + How we build the Brussels Data S...
How to build a Community of Practice (CoP) + How we build the Brussels Data S...
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
 
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
 
Creating Value With Artificial Intelligence, Rudradeb Mitra at Google Developers
Creating Value With Artificial Intelligence, Rudradeb Mitra at Google DevelopersCreating Value With Artificial Intelligence, Rudradeb Mitra at Google Developers
Creating Value With Artificial Intelligence, Rudradeb Mitra at Google Developers
 
Code mining : comment extraire et exploiter l’information détenue dans du cod...
Code mining : comment extraire et exploiter l’information détenue dans du cod...Code mining : comment extraire et exploiter l’information détenue dans du cod...
Code mining : comment extraire et exploiter l’information détenue dans du cod...
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
DSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - MarkusDSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - Markus
 
Strata aerni 2015_09_30_1315
Strata aerni 2015_09_30_1315Strata aerni 2015_09_30_1315
Strata aerni 2015_09_30_1315
 
Business Impact From IoT? Just Add Data Science
Business Impact From IoT? Just Add Data ScienceBusiness Impact From IoT? Just Add Data Science
Business Impact From IoT? Just Add Data Science
 
Big Data – Is it a hype or for real?
 Big Data – Is it a hype or for real?  Big Data – Is it a hype or for real?
Big Data – Is it a hype or for real?
 
Howto build open_data_ecosystems_smart_cities_european experience
Howto build open_data_ecosystems_smart_cities_european experienceHowto build open_data_ecosystems_smart_cities_european experience
Howto build open_data_ecosystems_smart_cities_european experience
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
 
Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)Introduction to Data Science (Data Science Thailand Meetup #1)
Introduction to Data Science (Data Science Thailand Meetup #1)
 
Presentation: How can Plan Ceibal Land into the Age of Big Data?
Presentation: How can Plan Ceibal Land into the Age of Big Data?Presentation: How can Plan Ceibal Land into the Age of Big Data?
Presentation: How can Plan Ceibal Land into the Age of Big Data?
 
Data Summer Conf 2018, “Architecting IoT system with Machine Learning (ENG)” ...
Data Summer Conf 2018, “Architecting IoT system with Machine Learning (ENG)” ...Data Summer Conf 2018, “Architecting IoT system with Machine Learning (ENG)” ...
Data Summer Conf 2018, “Architecting IoT system with Machine Learning (ENG)” ...
 
Architecting IoT with Machine Learning
Architecting IoT with Machine LearningArchitecting IoT with Machine Learning
Architecting IoT with Machine Learning
 
Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1Tutorial helsinki 20180313 v1
Tutorial helsinki 20180313 v1
 

Plus de Pôle Systematic Paris-Region

Plus de Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
 

Dernier

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 

Dernier (20)

Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 

PyData Paris 2015 - Track 2.3 AXA

  • 1. WHITENING THE BLACKBOX : WHY AND HOW TO EXPLAIN MACHINE LEARNING PREDICTIONS ? PyData 2015 / Paris Christophe Bourguignat (@chris_bour) Marcin Detyniecki Bora Eang
  • 2. We are a new Data team We are Python & scikit-learn heavy users and hopefully contributors We like tricky problems, doubts, questions And love to share with data geeks about that ! DISCLAIMER - WHO ARE WE ? 2 | PyData 2015 - Paris
  • 3. Why explaining Machine Learning models ? 3 | PyData 2015 - Paris
  • 4. Why explaining Machine Learning models ? 4 | PyData 2015 - Paris Explain why a given loan application did not meet credit underwriting policy
  • 5. Why explaining Machine Learning models ? 5 | PyData 2015 - Paris Explain why a given loan application did not meet credit underwriting policy Explain why a given transaction is suspicious
  • 6. Why explaining Machine Learning models ? 6 | PyData 2015 - Paris Explain why a given loan application did not meet credit underwriting policy Explain why a given transaction is suspicious Explain why a given job is recommended for an unemployed
  • 7. Why explaining Machine Learning models ? 7 | PyData 2015 - Paris French « Conseil d’Etat » recommendation Impose to algorithm-based decisions a transparency requirement, on personal data used by the algorithm, and the general reasoning it followed. Give the person subject to the decision the possibility of submitting its observations.
  • 8. 8 | PyData 2015 - Paris
  • 9. What do we actually want ? 9 | PyData 2015 - Paris We don’t ask for a “typical profile” of the selected population We want a reason why an observation got selected by our algorithm This reason must be simple and understandable (actionable), but can be specific to it Observation A, next to observation B on our selected population, can be selected for a completely different reason http://www.holehouse.org/mlclass/07_Regularization.html
  • 10. Toy Example : Titanic Dataset (1/2) 10 | PyData 2015 - Paris
  • 11. Toy Example : Titanic Dataset (2/2) 11 | PyData 2015 - Paris These women were predicted with a high confidence as « survived »
  • 12. Toy Example : Titanic Dataset (2/2) 12 | PyData 2015 - Paris These women were predicted with as high confidence as « not survived » These women were predicted with a high confidence as « survived »
  • 13. Toy Example : Titanic Dataset (2/2) 13 | PyData 2015 - Paris These women were predicted with as high confidence as « not survived » These women were predicted with a high confidence as « survived » We are looking for a method saying : why ?
  • 14. A simple case : linear models 14 | PyData 2015 - Paris -5 0 5 10 1 2 3 4 Beta x Beta.x -5 0 5 10 1 2 3 4 Beta x Beta.x Observation 1 Observation 2
  • 15. Unfortunately, predictive models that are the most powerful are usually the least interpretable http://machinelearningmastery.com/model-prediction-versus-interpretation-in-machine-learning/
  • 16. scikit-learn includes the .feature_importances_ attribute …  Implementation from Breiman, Friedman, "Classification and regression trees", 1984 (“Gini Importance" or “Mean Decrease Impurity“)  Louppe, 2014 “Understanding Random Forests”, PhD dissertation  R package also implements “Mean Decrease Accuracy” … but has nothing to show features contribution for a given observation A complicated case : Random Forests 16 | PyData 2015 - Paris http://ngm.nationalgeographic.com/2012/12/sequoias/quammen-text
  • 17. « What if » explanation Sensitivity of the variable (i.e. derivative) Feature contribution  « path approach » How to interpret a forest? 17 | PyData 2015 - Paris MENTION DE CONFIDENTIALITÉ
  • 18. « What if » explanation Sensitivity of the variable (i.e. derivative) Feature contribution  « path approach » How to interpret a forest? 18 | PyData 2015 - Paris
  • 19. State Of The Art 19 | PyData 2015 - Paris http://arxiv.org/abs/1312.1121
  • 20. State Of The Art 20 | PyData 2015 - Paris http://blog.datadive.net/interpreting-random-forests/
  • 21. Scikit-Learn IPython demo 21 | PyData 2015 - Paris Playing with Titanic Data Traversing the trees and using a trivial metric :  +1 when a feature is crossed  + impurity when a feature is crossed Limitations : Scikit learn stores :  The number of samples for each node (tree_.n_node_samples)  The breakdown by class (tree_.value), but only for leaves
  • 22. Be able to compare subsets induced by each tree What do we need ? (1/2) 22 | PyData 2015 - Paris Mr Smith
  • 23. Be able to compare subsets induced by each tree What do we need ? (1/2) 23 | PyData 2015 - Paris Mr Smith
  • 24. Be able to compare subsets induced by each tree What do we need ? (1/2) 24 | PyData 2015 - Paris Mr Smith
  • 25. Be able to compare subsets induced by each tree What do we need ? (1/2) 25 | PyData 2015 - Paris Mr Smith
  • 26. Be able to compare subsets induced by each tree What do we need ? (1/2) 26 | PyData 2015 - Paris Mr Smith
  • 27. Ideally, we would need easy access to all nodes attributes :  Average_score  Node_size (absolute or %)  Number of class 0 samples (absolute or %)  Number of class 1 samples (absolute or %)  … For each tree  For each node - Metric += F(parent_node, node, left_child_node, right_child_node, brother_node) - E.g : F = parent_node.average_score – node.average_score What do we need (2/2) 27 | PyData 2015 - Paris
  • 28. Thank You and join us, we have many other problems to crack ! Christophe Bourguignat (@chris_bour) Marcin Detyniecki Bora Eang