Dr. Paolo Missier
School of Computing
Newcastle University
Innovation Opportunity of the GDPR for AI and ML
Digital Catapult London,
March 2nd, 2018
Transparency in ML and AI
(humble views from a concerned academic)
My current favourite book
How much of Big Data is My Data?
Is Data the problem?
Or the algorithms?
Or how much we trust them?
Is there a problem at all?
What matters?
Decisions made by processes based on algorithmically-generated
knowledge: Knowledge-Generating Systems (KGS)
• automatically filtering job applicants
• approving loans or other credit
• approving access to benefits schemes
• predicting insurance risk levels
• user profiling for policing purposes and to predict risk of criminal
recidivism
• identifying health risk factors
• …
GDPR and algorithmic decision making
Profiling is “any form of automated processing of personal data consisting of the use
of personal data to evaluate certain personal aspects relating to a natural person”
Thus profiling should be construed as a subset of processing, under two conditions:
the processing is automated, and the processing is for the purposes of evaluation.
Article 22: Automated individual decision-making, including profiling, paragraph 1
(see figure 1) prohibits any “decision based solely on automated processing,
including profiling” which “significantly affects” a data subject.
It stands to reason that an algorithm can only be explained if the trained model can be
articulated and understood by a human. It is reasonable to suppose that any adequate
explanation would provide an account of how input features relate to predictions:
- Is the model more or less likely to recommend a loan if the applicant is a minority?
- Which features play the largest role in prediction?
B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation,’”
Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
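The first question above can be made concrete for a transparent model class: in a logistic regression, the sign and size of each learned weight show how a feature pushes the prediction. A minimal pure-Python sketch on a toy loan-approval setting (the feature names and data are invented):

```python
import math, random
random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic applicants: approval is more likely with high income, low debt
X, y = [], []
for _ in range(200):
    income, debt = random.random(), random.random()
    p_true = sigmoid(4 * income - 3 * debt)
    X.append([income, debt])
    y.append(1 if random.random() < p_true else 0)

# Fit logistic regression by batch gradient descent on the logistic loss
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), yi in zip(X, y):
        err = sigmoid(w[0] * x1 + w[1] * x2 + b) - yi
        gw[0] += err * x1
        gw[1] += err * x2
        gb += err
    n = len(X)
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

# Signed weights answer "which features matter, and in which direction?"
print(f"income weight: {w[0]:+.2f}  debt weight: {w[1]:+.2f}")
```

The recovered weights have the signs of the generating process (positive for income, negative for debt), which is exactly the kind of account of "how input features relate to predictions" the quote asks for.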
Heads up on the key questions:
• [to what extent, at what level] should lay people be educated about
algorithmic decision making?
• What mechanisms would you propose to engender trust in
algorithmic decision making?
• With regards to trust and transparency, what should Computer
Science researchers focus on?
• What kind of inter-disciplinary research do you see?
Recidivism Prediction Instruments (RPI)
• Increasingly popular within the criminal justice system
• Used or considered for use in pre-trial decision-making (USA)
Social debate and scholarly arguments…
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine Bias: There’s software used
across the country to predict future criminals. And it’s biased against blacks. ProPublica, 2016.
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
black defendants who did not recidivate over a two-year period were
nearly twice as likely to be misclassified as higher risk compared to
their white counterparts (45 percent vs. 23 percent).
white defendants who re-offended within the next two years were
mistakenly labeled low risk almost twice as often as black re-offenders
(48 percent vs. 28 percent)
A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction
Instruments,” Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017.
In this paper we show that the differences in false positive and false negative rates
cited as evidence of racial bias in the ProPublica article are a direct consequence of
applying an instrument that is free from predictive bias to a population in which
recidivism prevalence differs across groups.
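The disagreement above turns on group-wise error rates. A small sketch computing false positive and false negative rates per group; the confusion-matrix counts are synthetic, chosen only so the resulting rates mirror the percentages quoted above:

```python
# Per-group error rates of a binary risk classifier.
# fp/tn/fn/tp counts are invented illustrations, not the COMPAS data.
groups = {
    "group A": dict(fp=45, tn=55, fn=28, tp=72),
    "group B": dict(fp=23, tn=77, fn=48, tp=52),
}

rates = {}
for name, c in groups.items():
    fpr = c["fp"] / (c["fp"] + c["tn"])  # non-recidivists flagged high risk
    fnr = c["fn"] / (c["fn"] + c["tp"])  # re-offenders labelled low risk
    rates[name] = (fpr, fnr)
    print(f"{name}: FPR={fpr:.0%}, FNR={fnr:.0%}")
```

Chouldechova's point is that when base rates differ across groups, a classifier can be calibrated (free of predictive bias) and still show exactly this kind of FPR/FNR gap.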
Opacity
J. Burrell, “How the machine ‘thinks’: Understanding opacity in machine learning algorithms,” Big Data
Soc., vol. 3, no. 1, p. 2053951715622512, 2016.
Three forms of opacity:
1- intentional corporate or state secrecy, institutional self-protection
2- opacity as technical illiteracy, writing (and reading) code is a specialist skill
• One proposed response is to make code available for scrutiny, through regulatory
means if necessary
3- mismatch between mathematical optimization in high-dimensionality characteristic of
machine learning and the demands of human-scale reasoning and styles of semantic
interpretation.
“Ultimately partnerships between legal scholars, social scientists, domain experts,
along with computer scientists may chip away at these challenging questions of
fairness in classification in light of the barrier of opacity”
But, is research focusing on the right problems?
Research and innovation:
React to threats,
Spot opportunities…
To recall… Predictive Data Analytics (Learning)
Interpretability (of machine learning models)
Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach.
Learn. (WHI 2016), Jun. 2016.
- Transparency
  - Are features understandable?
  - Which features are more important?
- Post hoc interpretability
  - Natural language explanations
  - Visualisations of models
  - Explanations by example
    - “this tumor is classified as malignant because to the model it looks a lot like these other tumors”
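Lipton's "explanation by example" can be sketched as a nearest-neighbour lookup: justify a classification by showing the training cases the instance most resembles. The measurements below are invented:

```python
# Toy case base: (feature vector, label); values are illustrative only
cases = [
    ([2.0, 7.1], "malignant"),
    ([1.9, 6.8], "malignant"),
    ([0.4, 2.1], "benign"),
    ([0.6, 2.5], "benign"),
]

def explain(x, k=2):
    # Euclidean distance to the query instance
    dist = lambda a: sum((ai - xi) ** 2 for ai, xi in zip(a, x)) ** 0.5
    nearest = sorted(cases, key=lambda c: dist(c[0]))[:k]
    labels = [lab for _, lab in nearest]
    label = max(set(labels), key=labels.count)  # majority vote
    return label, nearest

label, nearest = explain([1.8, 6.9])
print(f"classified as {label} because it resembles: {[c for c, _ in nearest]}")
```

The returned neighbours are the explanation: "it looks a lot like these other tumors".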
“Why Should I Trust You?”
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
Interpretability of model predictions has become a hot research topic in Machine Learning
“if the users do not trust a model or a prediction,
they will not use it”
By “explaining a prediction”, we mean presenting textual or visual artifacts that provide qualitative
understanding of the relationship between the instance’s components and the model’s prediction.
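The idea behind the paper (LIME) can be sketched as: perturb the instance, query the black box, and fit a simple linear surrogate whose coefficients serve as the explanation. A toy pure-Python version; the "black box" text scorer is invented, and LIME's locality weighting is omitted:

```python
import random
random.seed(1)

def black_box(words):
    # invented sentiment scorer: reacts only to a few cue words
    score = 0.4 * ("great" in words) + 0.4 * ("love" in words) - 0.5 * ("boring" in words)
    return 0.5 + score  # probability-like output

instance = ["great", "plot", "love", "boring", "ending"]

# Perturb: randomly mask words, query the black box, keep (mask, prediction)
samples = []
for _ in range(300):
    mask = [random.random() < 0.5 for _ in instance]
    kept = [wd for wd, m in zip(instance, mask) if m]
    samples.append(([1.0 if m else 0.0 for m in mask], black_box(kept)))

# Fit the linear surrogate by least squares (plain gradient descent)
w, b = [0.0] * len(instance), 0.0
for _ in range(1500):
    gw, gb = [0.0] * len(instance), 0.0
    for x, target in samples:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - target
        for i, xi in enumerate(x):
            gw[i] += err * xi
        gb += err
    n = len(samples)
    w = [wi - 0.2 * gi / n for wi, gi in zip(w, gw)]
    b -= 0.2 * gb / n

# Surrogate coefficients = the explanation: each word's local contribution
for word, wi in sorted(zip(instance, w), key=lambda t: -abs(t[1])):
    print(f"{word:8s} {wi:+.2f}")
```

The coefficients recover which components of the instance drive the prediction, which is the "qualitative understanding of the relationship" the paper describes.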
Explaining image classification
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
Features: few, high level
SVM classifier, 94% accuracy
…but questionable!
Features
Volume: how many features contribute to the prediction?
Meaning: how suitable are the features for human interpretation?
• Raw (low-level, non-semantic) signals, such as image pixels
  • Deep learning
  • Visualisation: the occlusion test
  • Cases: object recognition, medical diagnosis
• Many features (thousands is too many)
• Few, high-level features: is this the only option?
Occlusion test for CNNs
Kermany et al., “Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning,”
Cell, 2018
Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks,” ECCV 2014
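The occlusion test can be sketched in a few lines: slide a masking patch over the input and record the drop in the model's score at each position; large drops mark the regions the model relies on. The 8x8 "image" and the scoring function below are toy stand-ins for a real image and CNN:

```python
SIZE, PATCH = 8, 2

# Toy image: a bright blob in the top-left corner
image = [[1.0 if (r < 3 and c < 3) else 0.0 for c in range(SIZE)] for r in range(SIZE)]

def model_score(img):
    # stand-in for a CNN class score: responds to brightness in the top-left
    return sum(img[r][c] for r in range(4) for c in range(4))

baseline = model_score(image)
heatmap = [[0.0] * (SIZE - PATCH + 1) for _ in range(SIZE - PATCH + 1)]
for r in range(SIZE - PATCH + 1):
    for c in range(SIZE - PATCH + 1):
        occluded = [row[:] for row in image]
        for dr in range(PATCH):
            for dc in range(PATCH):
                occluded[r + dr][c + dc] = 0.0  # gray out the patch
        heatmap[r][c] = baseline - model_score(occluded)  # score drop

# The largest drop marks the most influential region
peak = max((v, (r, c)) for r, row in enumerate(heatmap) for c, v in enumerate(row))
print("most influential patch at", peak[1], "drop:", peak[0])
```

The resulting heatmap is the visualisation shown in the cited papers: it localises the evidence without opening the model.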
Attribute Learning: a layer for semantic attributes
Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar, “Describable Visual Attributes for Face
Verification and Image Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),
vol. 33, no. 10, pp. 1962–1977, October 2011.
Can we control inferences made about us?
Facebook’s problem (shared by many other marketing companies):
personal characteristics are often hard to observe because of lack of data or
privacy restrictions
Solution: firms and governments increasingly depend on statistical inferences
drawn from available information.
Goal of the research:
- How to give online users transparency into why certain inferences are
made about them by statistical models
- How to inhibit those inferences by hiding (“cloaking”) certain personal
information from inference
D. Chen, S. P. Fraiberger, R. Moakler, and F. Provost, “Enhancing Transparency and Control when Drawing Data-Driven
Inferences about Individuals,” in 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI
2016), 2016, pp. 21–25.
privacy invasions via statistical inferences are at least as
troublesome as privacy invasions based on revealing personal data
“Cloaking”
Which “evidence” in the input feature vectors is critical to make an accurate prediction?
evidence counterfactual: “what would the model have done if this evidence hadn’t been
present”?
Not an easy problem!
[Figure: User 1 greatly affected; User 2 unaffected]
Cloakability
How many Facebook “Likes” should be “cloaked” to inhibit a prediction?
[Chart: cloaking effort (number of Likes to be removed), per predicted trait]
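The chart's quantity, cloaking effort, can be sketched for a linear scorer: greedily remove the Like carrying the most evidence until the trait is no longer predicted. The Likes, weights and threshold below are all invented:

```python
# Hypothetical evidence weight of each Like for the predicted trait
weights = {
    "like_A": 1.2, "like_B": 0.9, "like_C": 0.4, "like_D": 0.2, "like_E": -0.3,
}
THRESHOLD = 1.5  # the model predicts the trait when total evidence exceeds this

def cloaking_effort(likes):
    likes = set(likes)
    removed = 0
    while sum(weights[l] for l in likes) > THRESHOLD:
        # evidence counterfactual: drop the Like whose absence lowers the
        # score the most (the largest positive weight still present)
        target = max(likes, key=lambda l: weights[l])
        if weights[target] <= 0:
            break  # no positive evidence left to cloak
        likes.remove(target)
        removed += 1
    return removed

effort = cloaking_effort(["like_A", "like_B", "like_C", "like_D"])
print("Likes to remove:", effort)
```

The same removal budget can flip one user's prediction while leaving another's untouched, which is why cloaking effort varies so much per trait and per user.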
AI Guardians
A. Etzioni and O. Etzioni, “Designing AI Systems That Obey Our Laws and Values,” Commun. ACM, vol.
59, no. 9, pp. 29–31, Aug. 2016.
Operational AI systems (for example, self-driving cars) need to obey
both the law of the land and our values.
Why do we need oversight systems?
- AI systems learn continuously, so they change over time
- AI systems are becoming opaque
- “black boxes” to human beings
- AI-guided systems have increasing autonomy
- they make choices “on their own.”
a major mission for AI is to develop in the near
future such AI oversight systems
[Diagram: Auditors, Monitors, Enforcers, “Ethics bots!”]
AI accountability – your next Pal?
Asked where AI systems are weak today, Veloso (*) says they should be more
transparent. "They need to explain themselves: why did they do this, why did
they do that, why did they detect this, why did they recommend that?
Accountability is absolutely necessary."
(*) Manuela Veloso, head of the Machine Learning Department at Carnegie Mellon University
Gary Anthes. 2017. Artificial intelligence poised to ride a new wave. Commun. ACM 60, 7 (June 2017), 19-21.
DOI: https://doi.org/10.1145/3088342
IBM's Witbrock echoes the call for humanism in AI: …"It's an embodiment of a
human dream of having a patient, helpful, collaborative kind of companion."
A personal view
Hypothesis:
it is technically practical to provide a limited and IP-preserving degree of
transparency by surrounding and augmenting a black-box KGS with
metadata that describes the nature of its input, training and test data, and
can therefore be used to automatically generate explanations that can be
understood by lay persons.
Knowledge-Generating Systems (KGS)
…It’s the meta-data, stupid (*)
(*) https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid
Something new to try, perhaps?
[Fig. 1: architecture sketch. Two KGSs (KGS 1, e.g. pensions; KGS 2, e.g. health), each with its own background (Big) data, a disclosure policy, and a limited profile, feed contextualised classifications to an Explanation Service operated by an infomediary. A shared vocabulary and metadata model, and a secure ledger (Blockchain) holding KGS profiles (a descriptive summary of background data; a high-level characterisation of the algorithm) together with user instances and classifications, support an informed co-decision process with users, fed by user data contributions.]
References (to take home)
• Gary Anthes. 2017. Artificial intelligence poised to ride a new wave. Commun. ACM 60, 7 (June 2017), 19-21. DOI:
https://doi.org/10.1145/3088342
• J. Burrell, “How the machine ‘thinks’: Understanding opacity in machine learning algorithms,” Big Data Soc., vol. 3, no. 1, p.
2053951715622512, 2016
• Caruana, Rich, Lou, Yin, Gehrke, Johannes, Koch, Paul, Sturm, Marc, and Elhadad, Noemie. Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day readmission. In KDD, 2015
• D. Chen, S. P. Fraiberger, R. Moakler, and F. Provost, “Enhancing Transparency and Control when Drawing Data-Driven
Inferences about Individuals,” in 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), 2016,
pp. 21–25
• A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments,” Big Data, vol. 5,
no. 2, pp. 153–163, Jun. 2017.
• A. Etzioni and O. Etzioni, “Designing AI Systems That Obey Our Laws and Values,” Commun. ACM, vol. 59, no. 9, pp. 29–31, Aug.
2016.
• B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation,’” Proc.
2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
• N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Describable Visual Attributes for Face Verification and Image
Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 33, no. 10, pp. 1962–1977, Oct. 2011.
• Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
• M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
• M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” in ECCV, 2014.
Questions to you:
• [to what extent, at what level] should lay people be educated about
algorithmic decision making?
• What mechanisms would you propose to engender trust in
algorithmic decision making?
• With regards to trust and transparency, what should Computer
Science researchers focus on?
• What kind of inter-disciplinary research do you see?
Scenarios
What kind of explanations would you request / expect / accept?
• My application for benefits has been denied but I am not sure why
• My insurance premium is higher than my partner’s, and it’s not clear
why
• My work performance has been deemed unsatisfactory, but I don’t
see why
• [can you suggest other scenarios close to your experience?]
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Transparency in ML and AI (humble views from a concerned academic)

  • 1. Dr. Paolo Missier School of Computing Newcastle University Innovation Opportunity of the GDPR for AI and ML Digital Catapult London, March 2nd, 2018 Transparency in ML and AI (humble views from a concerned academic)
  • 2. 2 My current favourite book <eventname> How much of Big Data is My Data? Is Data the problem? Or the algorithms? Or how much we trust them? Is there a problem at all?
  • 3. 3 What matters? <eventname> Decisions made by processes based on algorithmically-generated knowledge: Knowledge-Generating Systems (KGS) • automatically filtering job applicants • approving loans or other credit • approving access to benefits schemes • predicting insurance risk levels • user profiling for policing purposes and to predict risk of criminal recidivism • identifying health risk factors • …
  • 4. 4 GDPR and algorithmic decision making <eventname> Profiling is “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person”. Thus profiling should be construed as a subset of processing, under two conditions: the processing is automated, and the processing is for the purposes of evaluation. Article 22: Automated individual decision-making, including profiling, paragraph 1 (see figure 1) prohibits any “decision based solely on automated processing, including profiling” which “significantly affects” a data subject. It stands to reason that an algorithm can only be explained if the trained model can be articulated and understood by a human. It is reasonable to suppose that any adequate explanation would provide an account of how input features relate to predictions: - Is the model more or less likely to recommend a loan if the applicant is a minority? - Which features play the largest role in prediction? B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation,’” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
  • 5. 5 Heads up on the key questions: • [to what extent, at what level] should lay people be educated about algorithmic decision making? • What mechanisms would you propose to engender trust in algorithmic decision making? • With regards to trust and transparency, what should Computer Science researchers focus on? • What kind of inter-disciplinary research do you see? <eventname>
  • 6. 6 Recidivism Prediction Instruments (RPI) <eventname> • Increasingly popular within the criminal justice system • Used or considered for use in pre-trial decision-making (USA) Social debate and scholarly arguments… Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s software used across the country to predict future criminals. and it’s biased against blacks. 2016. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent). white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent) A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments,” Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017. In this paper we show that the differences in false positive and false negative rates cited as evidence of racial bias in the ProPublica article are a direct consequence of applying an instrument that is free from predictive bias to a population in which recidivism prevalence differs across groups.
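The error-rate disparity at the centre of the ProPublica/Chouldechova debate can be made concrete with a few lines of code. The sketch below computes per-group false positive and false negative rates; the records are invented for illustration, not COMPAS data.

```python
# Comparing false positive / false negative rates across two groups,
# the kind of disparity cited in the ProPublica COMPAS analysis.
# Each record is (predicted_high_risk, reoffended); all values invented.
def error_rates(records):
    fp = sum(1 for p, y in records if p and not y)
    fn = sum(1 for p, y in records if not p and y)
    negatives = sum(1 for _, y in records if not y)  # did not reoffend
    positives = sum(1 for _, y in records if y)      # did reoffend
    return fp / negatives, fn / positives            # (FPR, FNR)

group_a = [(True, False), (True, False), (False, False), (False, True), (True, True)]
group_b = [(False, False), (True, False), (False, False), (False, True), (False, True)]

fpr_a, fnr_a = error_rates(group_a)  # higher FPR for group A
fpr_b, fnr_b = error_rates(group_b)  # higher FNR for group B
# Chouldechova's point: a predictively unbiased (calibrated) instrument can
# still show FPR/FNR gaps like these when base rates differ across groups.
```

Nothing here depends on the model itself: the gaps are a property of the joint distribution of predictions and outcomes, which is why the two sides of the debate can both be arithmetically right.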
  • 7. 7 Opacity <eventname> J. Burrell, “How the machine ‘thinks’: Understanding opacity in machine learning algorithms,” Big Data Soc., vol. 3, no. 1, p. 2053951715622512, 2016. Three forms of opacity: 1- intentional corporate or state secrecy, institutional self-protection 2- opacity as technical illiteracy, writing (and reading) code is a specialist skill • One proposed response is to make code available for scrutiny, through regulatory means if necessary 3- mismatch between mathematical optimization in high-dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation. “Ultimately partnerships between legal scholars, social scientists, domain experts, along with computer scientists may chip away at these challenging questions of fairness in classification in light of the barrier of opacity”
  • 8. 8 <eventname> But, is research focusing on the right problems? Research and innovation: React to threats, Spot opportunities…
  • 9. 9 To recall…Predictive Data Analytics (Learning) <eventname>
  • 10. 10 Interpretability (of machine learning models) <eventname> Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016. - Transparency - Are features understandable? - Which features are more important? - Post hoc interpretability - Natural language explanations - Visualisations of models - Explanations by example - “this tumor is classified as malignant because to the model it looks a lot like these other tumors”
  • 11. 11 “Why Should I Trust You?” <eventname> M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144. Interpretability of model predictions has become a hot research topic in Machine Learning “if the users do not trust a model or a prediction, they will not use it” By “explaining a prediction”, we mean presenting textual or visual artifacts that provide qualitative understanding of the relationship between the instance’s components and the model’s prediction.
  • 12. 12 Explaining image classification <eventname> M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
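To make the notion of "explaining a prediction" concrete, here is a deliberate caricature of the Ribeiro et al. idea: probe a black-box model with perturbed copies of one instance and score each feature by how much flipping it moves the prediction. Real LIME samples many simultaneous perturbations and fits a locally weighted linear surrogate; the toy model and instance below are stand-ins, not from the paper.

```python
import random

# Caricature of local, model-agnostic explanation: the explainer only
# queries the model, it never inspects its internals.
def black_box(x):
    # An arbitrary opaque model over boolean features (invented).
    return 1.0 if (x[0] and not x[2]) else 0.0

def explain(instance, model, n_samples=200, seed=0):
    rng = random.Random(seed)
    scores = [0.0] * len(instance)
    base = model(instance)
    for _ in range(n_samples):
        i = rng.randrange(len(instance))
        perturbed = list(instance)
        perturbed[i] = not perturbed[i]            # flip one feature
        scores[i] += abs(model(perturbed) - base)  # attribution mass
    return scores

scores = explain([True, True, False], black_box)
# features 0 and 2 accumulate attribution; feature 1 stays at zero
```

The qualitative output, "which components of this instance drove this prediction", is exactly the kind of artifact the paper argues is needed before users will trust a prediction.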
  • 13. 13 Features: few, high level <eventname> SVM classifier, 94% accuracy …but questionable!
  • 14. 14 Features Volume: how many features contribute to the prediction? Meaning : how suitable are the features for human interpretation? • Raw: (low-level, non-semantic) signals such as images pixels • Deep learning • Visualisation ---- occlusion test • Cases: Object recognition, and medical diagnosis • Many features: (thousands is too many) • Few, high-level features. -- is this the only chance?
  • 15. 15 Occlusion test for CNNs Kermany, et al., Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning, Cell 2018 Zeiler, et al., Visualizing and Understanding Convolutional Networks, ECCV 2014
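The occlusion test of Zeiler and Fergus is simple enough to sketch: slide a patch over the input, zero it out, and record how far the classifier's score drops at each position; large drops mark regions the prediction depends on. The "classifier" below just sums a fixed region of a toy image, standing in for a real CNN, which is not implemented here.

```python
# Occlusion sensitivity sketch: heat[r][c] is the score drop when a
# patch of zeros is placed with its top-left corner at (r, c).
def toy_score(image):
    # Invented stand-in for a CNN: the "evidence" lives in rows/cols 2..3.
    return sum(image[r][c] for r in range(2, 4) for c in range(2, 4))

def occlusion_map(image, score_fn, patch=2):
    n = len(image)
    base = score_fn(image)
    heat = [[0.0] * (n - patch + 1) for _ in range(n - patch + 1)]
    for r in range(n - patch + 1):
        for c in range(n - patch + 1):
            occluded = [row[:] for row in image]
            for dr in range(patch):
                for dc in range(patch):
                    occluded[r + dr][c + dc] = 0.0  # grey-out the patch
            heat[r][c] = base - score_fn(occluded)  # score drop
    return heat

img = [[1.0] * 6 for _ in range(6)]
heat = occlusion_map(img, toy_score)
# the drop peaks when the patch fully covers rows/cols 2..3
```

This is the "checking significant performance decrease for masks in different locations" procedure referred to in the notes, applied to a toy scorer.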
  • 16. 16 Attribute Learning Layer for Semantic Attributes Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, Shree K. Nayar,, "Describable Visual Attributes for Face Verification and Image Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 33, no. 10, pp. 1962--1977, October 2011.
  • 17. 17 Can we control inferences made about us? <eventname> Facebook’s (and many other marketing companies’) problem: personal characteristics are often hard to observe because of lack of data or privacy restrictions. Solution: firms and governments increasingly depend on statistical inferences drawn from available information. Goal of the research: - How to give online users transparency into why certain inferences are made about them by statistical models - How to inhibit those inferences by hiding (“cloaking”) certain personal information from inference D. Chen, S. P. Fraiberger, R. Moakler, and F. Provost, “Enhancing Transparency and Control when Drawing Data-Driven Inferences about Individuals,” in 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), 2016, pp. 21–25. Privacy invasions via statistical inferences are at least as troublesome as privacy invasions based on revealing personal data.
  • 18. 18 “Cloaking” <eventname> Which “evidence” in the input feature vectors is critical to make an accurate prediction? evidence counterfactual: “what would the model have done if this evidence hadn’t been present”? Not an easy problem! User 1 greatly affected User 2 unaffected
  • 19. 19 Cloakability <eventname> How many Facebook “Likes” should be “cloaked” to inhibit a prediction? Predicted trait Cloaking effort = Number of likes to be removed
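A minimal sketch of the cloaking-effort idea, assuming the simplest possible setting: a linear model over Likes, where we greedily remove the Like that currently contributes most to the predicted trait until the score falls below the decision threshold. The number of Likes removed is the cloaking effort. The weights, Like names, and threshold below are invented for illustration; the paper's models and data are not reproduced here.

```python
# Greedy cloaking sketch: how many Likes must be hidden before a linear
# trait predictor no longer fires? (All weights/names are made up.)
def cloaking_effort(likes, weights, threshold):
    likes = set(likes)
    removed = 0
    score = lambda: sum(weights.get(l, 0.0) for l in likes)
    while score() >= threshold:
        # remove the Like that currently pushes the score up the most
        top = max(likes, key=lambda l: weights.get(l, 0.0), default=None)
        if top is None or weights.get(top, 0.0) <= 0:
            return None  # prediction cannot be inhibited by removal alone
        likes.remove(top)
        removed += 1
    return removed

weights = {"like_a": 0.9, "like_b": 0.5, "like_c": 0.2, "like_d": -0.1}
effort = cloaking_effort({"like_a", "like_b", "like_c", "like_d"}, weights, 0.6)
# removing the two strongest Likes suffices here, so effort is small
```

The same loop also illustrates the "evidence counterfactual" question from the previous slide: each removal asks what the model would have done had that piece of evidence not been present.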
  • 20. 20 AI Guardians <eventname> A. Etzioni and O. Etzioni, “Designing AI Systems That Obey Our Laws and Values,” Commun. ACM, vol. 59, no. 9, pp. 29–31, Aug. 2016. Operational AI systems (for example, self-driving cars) need to obey both the law of the land and our values. Why do we need oversight systems? - AI systems learn continuously, so they change over time - AI systems are becoming opaque - “black boxes” to human beings - AI-guided systems have increasing autonomy - they make choices “on their own.” A major mission for AI is to develop in the near future such AI oversight systems: Auditors, Monitors, Enforcers, Ethics bots!
  • 21. 21 AI accountability – your next Pal? <eventname> Asked where AI systems are weak today, Veloso (*) says they should be more transparent. "They need to explain themselves: why did they do this, why did they do that, why did they detect this, why did they recommend that? Accountability is absolutely necessary." (*) Manuela Veloso, head of the Machine Learning Department at Carnegie-Mellon University Gary Anthes. 2017. Artificial intelligence poised to ride a new wave. Commun. ACM 60, 7 (June 2017), 19-21. DOI: https://doi.org/10.1145/3088342 IBM's Witbrock echoes the call for humanism in AI: …"It's an embodiment of a human dream of having a patient, helpful, collaborative kind of companion."
  • 22. 22 A personal view <eventname> Hypothesis: it is technically practical to provide a limited and IP-preserving degree of transparency by surrounding and augmenting a black-box KGS with metadata that describes the nature of its input, training and test data, and can therefore be used to automatically generate explanations that can be understood by lay persons. Knowledge-Generating Systems (KGS) …It’s the meta-data, stupid (*) (*) https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid
  • 23. 23 Something new to try, perhaps? <eventname> [Fig. 1, architecture sketch: two example KGSs, KGS 1 (e.g. pensions) and KGS 2 (e.g. health), each drawing on its own background (Big) data, publish limited profiles (a descriptive summary of the background data plus a high-level characterisation of the algorithm) under a disclosure policy. An Explanation Service, built on a shared vocabulary and metadata model and a secure ledger (blockchain), mediates between the KGS profiles, the users’ instances and contextualised classifications, and an infomediary supporting an informed co-decision process over user data contributions.]
  • 24. 24 References (to take home) <eventname> • Gary Anthes. 2017. Artificial intelligence poised to ride a new wave. Commun. ACM 60, 7 (June 2017), 19-21. DOI: https://doi.org/10.1145/3088342 • J. Burrell, “How the machine ‘thinks’: Understanding opacity in machine learning algorithms,” Big Data Soc., vol. 3, no. 1, p. 2053951715622512, 2016 • Caruana, Rich, Lou, Yin, Gehrke, Johannes, Koch, Paul, Sturm, Marc, and Elhadad, Noemie. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD, 2015 • D. Chen, S. P. Fraiberger, R. Moakler, and F. Provost, “Enhancing Transparency and Control when Drawing Data-Driven Inferences about Individuals,” in 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), 2016, pp. 21–25 • A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments,” Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017 • A. Etzioni and O. Etzioni, “Designing AI Systems That Obey Our Laws and Values,” Commun. ACM, vol. 59, no. 9, pp. 29–31, Aug. 2016 • B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation,’” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016 • Kumar, et al., Describable Visual Attributes for Face Verification and Image Search, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 2011 • Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016 • M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144 • Zeiler, et al., Visualizing and Understanding Convolutional Networks, ECCV 2014
  • 25. 25 Questions to you: • [to what extent, at what level] should lay people be educated about algorithmic decision making? • What mechanisms would you propose to engender trust in algorithmic decision making? • With regards to trust and transparency, what should Computer Science researchers focus on? • What kind of inter-disciplinary research do you see? <eventname>
  • 26. 26 Scenarios <eventname> What kind of explanations would you request / expect / accept? • My application for benefits has been denied but I am not sure why • My insurance premium is higher than my partner’s, and it’s not clear why • My work performance has been deemed unsatisfactory, but I don’t see why • [can you suggest other scenarios close to your experience?]

Editor's notes

  1. Individuals as well as businesses, which we will initially refer to as subjects (and later upgrade to active participants), increasingly find themselves at the receiving end of impactful decisions made by organisations on their behalf, based on processes that use algorithmically-generated knowledge.
  2. 3. We cannot look at the code directly for many important algorithms of classification that are in widespread use. This opacity (at one level) exists because of proprietary concerns. They are closed in order to maintain competitive advantage and/or to keep a few steps ahead of adversaries. Adversaries could be other companies in the market or malicious attackers (relevant in many network security applications). However, it is possible to investigate the general computational designs that we know these algorithms use by drawing from educational materials. Machine learning models that prove useful (specifically, in terms of the ‘accuracy’ of classification) possess a degree of unavoidable complexity. In a ‘Big Data’ era, billions or trillions of data examples and thousands or tens of thousands of properties of the data (termed ‘features’ in machine learning) may be analyzed. The internal decision logic of the algorithm is altered as it ‘learns’ on training data. Handling a huge number of (especially heterogeneous) properties of data (i.e. not just words in spam email, but also email header info) adds complexity to the code.
  3. Brings about the issue of trust in the models. Should I use the prediction? “Determining trust in individual predictions is an important problem when the model is used for decision making. When using machine learning for medical diagnosis [6] or terrorism detection, for example, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic”
  4. Notes for Paolo: by checking significant performance decrease for masks in different locations
  5. information disclosed on social network sites (such as Facebook) can be used to predict personal characteristics with surprisingly high accuracy We introduce the idea of a “cloaking device” as a vehicle to offer users control over inferences,
  6. Kim, Been. Interactive and interpretable machine learning models for human machine collaboration. PhD thesis, Massachusetts Institute of Technology, 2015.