Paolo Missier
School of Computing
Newcastle University
Supporting Algorithm Accountability using Provenance
A ProvenanceWeek 2018 workshop
London, July 12th, 2018
Transparency and fairness of predictive models, and
the provenance of the data used to build them:
thoughts and challenges
One of my favourite books
How much of Big Data is My Data?
Is Data the problem?
Or the algorithms?
Or how much we trust them?
Is there a problem at all?
What matters?
Decisions made based on algorithmically generated knowledge:
• automatically filtering job applicants
• approving loans or other credit
• approving access to benefits schemes
• predicting insurance risk levels
• user profiling for policing purposes and to predict risk of criminal recidivism
• identifying health risk factors
• …
GDPR and algorithmic decision making
Article 22: Automated individual decision-making, including profiling, paragraph
1 (see figure 1) prohibits any “decision based solely on automated processing,
including profiling” which “significantly affects” a data subject.
It stands to reason that an algorithm can only be explained if the trained model can be
articulated and understood by a human.
It is reasonable to suppose that any adequate explanation would provide an account of
how input features relate to predictions:
- Is the model more or less likely to recommend a loan if the applicant is a minority?
- Which features play the largest role in prediction?
B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation,’”
Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
Interest is qualified and increasing
Transparency and interpretability
Of algorithms: ML approaches → model explanations
Of data: data-based explanations → provenance
?
Interpretability (of machine learning models)
Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach.
Learn. (WHI 2016), Jun. 2016.
- Transparency
- Are features understandable?
- Which features are more important?
- Post hoc interpretability
- Natural language explanations
- Visualisations of models
- Explanations by example
- “this tumor is classified as malignant
because to the model it looks a lot like
these other tumors”
W. Samek, T. Wiegand, and K.-R. Müller, “Explainable Artificial Intelligence: Understanding, Visualizing
and Interpreting Deep Learning Models,” Aug. 2017.
Interpretability: Ability to provide a qualitative understanding between the input
variables and the response
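“Explanations by example” can be sketched as retrieving the training instances closest to the one being classified, so the neighbours themselves serve as the explanation. A minimal sketch, assuming numeric feature vectors and Euclidean distance; the helper `explain_by_example` and the toy tumour data are hypothetical, made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def explain_by_example(
    query: Sequence[float],
    training: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> List[Tuple[Sequence[float], str, float]]:
    """Return the k training instances nearest to `query`, with labels and distances.

    Hypothetical helper: the retrieved neighbours are the explanation
    ("to the model it looks a lot like these other cases").
    """
    scored = [(x, label, euclidean(query, x)) for x, label in training]
    scored.sort(key=lambda t: t[2])  # closest first
    return scored[:k]

# Toy, made-up data: (size, irregularity score) -> label
training = [
    ((1.0, 0.1), "benign"),
    ((1.2, 0.2), "benign"),
    ((3.5, 0.8), "malignant"),
    ((3.8, 0.9), "malignant"),
    ((4.0, 0.7), "malignant"),
]

for x, label, d in explain_by_example((3.6, 0.85), training, k=3):
    print(x, label, round(d, 3))
```

The explanation is only as meaningful as the distance function: with unscaled features, one feature can dominate which neighbours are retrieved.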
Black-box approaches
Model agnostic:
An explainer should be able to explain any model, and thus be model-
agnostic (i.e. treat the original model as a black box)
Local fidelity:
for an explanation to be meaningful it must at least be locally faithful, i.e. it
must correspond to how the model behaves in the vicinity of the instance
being predicted
Occlusion testing
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
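Occlusion testing can be sketched as replacing one feature at a time with a baseline value and measuring how much the model's output changes; only the model's predictions are used, so the model stays a black box. A minimal sketch, assuming the model is any callable from a feature dict to a score; the scoring function, feature names and baseline values are hypothetical.

```python
from typing import Callable, Dict

Features = Dict[str, float]

def occlusion_importance(
    model: Callable[[Features], float],
    instance: Features,
    baseline: Features,
) -> Dict[str, float]:
    """Change in model output when each feature is occluded (set to its baseline).

    A positive value means the feature pushed the score up for this instance,
    a negative value means it pushed the score down.
    """
    original = model(instance)
    importance = {}
    for name in instance:
        occluded = dict(instance)
        occluded[name] = baseline[name]  # occlude exactly one feature
        importance[name] = original - model(occluded)
    return importance

# Hypothetical black-box loan scorer (illustration only)
def toy_model(f: Features) -> float:
    return 0.3 * f["income"] - 0.5 * f["debt"] + 0.0 * f["age"]

instance = {"income": 4.0, "debt": 2.0, "age": 35.0}
baseline = {"income": 0.0, "debt": 0.0, "age": 0.0}
print(occlusion_importance(toy_model, instance, baseline))
```

The choice of baseline is itself a decision: occluding with zero, the mean, or a sampled value can give different importances.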
Expected accuracy is not enough for trust
SVM classifier, 94% accuracy
…but questionable!
LIME
Model agnostic
Locally faithful: it must
correspond to how the model
behaves in the vicinity of the
instance being predicted
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any Classifier,” in Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, 2016, pp. 1135–1144.
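The core idea of LIME can be sketched as fitting a proximity-weighted linear surrogate to the black box's outputs on perturbed copies of the instance. This is a simplified sketch, not the reference `lime` package: it assumes binary interpretable features (1 = feature present, 0 = removed), enumerates all perturbations instead of sampling them, and uses an exponential kernel on the fraction of removed features. `black_box` is a hypothetical model.

```python
import itertools
import math
from typing import Callable, List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_explain(
    black_box: Callable[[Sequence[int]], float],
    n_features: int,
    kernel_width: float = 0.75,
) -> List[float]:
    """Return [intercept, w_1, ..., w_d] of a locally weighted linear surrogate."""
    size = n_features + 1
    xtwx = [[0.0] * size for _ in range(size)]
    xtwy = [0.0] * size
    # Every perturbation z of the instance (the instance itself is z = all ones)
    for z in itertools.product([0, 1], repeat=n_features):
        distance = 1.0 - sum(z) / n_features                      # fraction removed
        weight = math.exp(-(distance ** 2) / kernel_width ** 2)   # proximity kernel
        row = [1.0] + [float(v) for v in z]                       # intercept + features
        y = black_box(z)
        for i in range(size):
            xtwy[i] += weight * row[i] * y
            for j in range(size):
                xtwx[i][j] += weight * row[i] * row[j]
    # Weighted least squares via the normal equations (X^T W X) w = X^T W y
    return solve(xtwx, xtwy)

# Hypothetical black box with an interaction term, so the surrogate is only locally faithful
def black_box(z: Sequence[int]) -> float:
    return 0.5 + 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

print([round(w, 3) for w in lime_explain(black_box, 3)])
```

If the black box happens to be linear in the interpretable features, the surrogate recovers its coefficients exactly; otherwise the kernel decides which perturbations the fit should respect most.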
Other model explanation approaches
[1] Lakkaraju, H., Kamar, E., Caruana, R., & Leskovec, J. (2017). Interpretable & Explorable
Approximations of Black Box Models. arXiv preprint arXiv:1707.01154.
[2] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21st ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
1. Black Box Explanations through Transparent Approximations (BETA) [1]
• Decision Set approximation of black box models
• Fidelity + interpretability of the explanation
• Global (unlike LIME)
2. Intelligible additive models [2]
• Generalized Additive Model (GAM)
• Generalized Additive Model with pairwise interactions (GA2M)
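For reference, the standard forms of these two model families: a GAM relates a link function g of the expected response to a sum of per-feature shape functions f_j, and GA2M adds a selected set of pairwise interaction terms f_ij. Each term can be plotted and inspected on its own, which is what makes these models intelligible.

```latex
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j} f_j(x_j)
\qquad \text{(GAM)}

g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
\qquad \text{(GA}^2\text{M)}
```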
Data → Model → Predictions
[Diagram: data collection → raw datasets → population data pre-processing → features → model; the model is applied to individual instances to produce the “predicted you”: a ranking, a score, or a class]
Key decisions are made during data collection:
- Where does the data come from?
- What’s in the dataset?
Complementing current ML approaches to model interpretability
Possible roles for provenance
1) Data acquisition: Provenance → Transparency → Trust
Data → Model → Predictions
[Diagram: data collection → raw datasets → population data pre-processing → features → model → “predicted you” (ranking, score, class)]
Key decisions are made during:
- Data collection: where does the data come from? What’s in the dataset?
- Data preparation: how was it pre-processed?
1. Can we explain these decisions?
2. Are these explanations useful?
Explaining data preparation
Paolo Missier (Computing), Dennis Prangle (Stats)
[Diagram: data collection → raw datasets → population data pre-processing → features → model → “predicted you” (ranking, score, class)]
Pre-processing steps:
- Integration
- Cleaning
- Outlier removal
- Normalisation
- Feature selection
- Class rebalancing
- Sampling
- Stratification
- …
Data acquisition and wrangling:
- How were datasets acquired?
- How recently?
- For what purpose?
- Are they being reused /
repurposed?
- What is their quality?
- Scripts → Python / TensorFlow, Pandas, Spark
- Workflows → KNIME, …
Provenance → Transparency
Provenance for transparency
1. Collection
- Program-level
- System-level
2. Representation
- W3C PROV (for interoperability)
- Multiple proprietary formats (for efficient encoding)
3. Querying / analysis
• RDBMS
• GDBMS
• RDF / SPARQL
• Configuration of each pre-processing step
• Data dependency graph
- Which kind of normalisation did you apply?
- Was the data (down/up) sampled? How?
- How did you define / remove outliers?
- How did you window your time series?
- Was the data repurposed (acquired from a repository)?
- How was the original protocol defined?
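Program-level collection of the configuration of each pre-processing step and of the data dependency graph can be sketched with a small recorder. This is a minimal stdlib sketch, assuming a hand-rolled vocabulary that only loosely mirrors W3C PROV (entities, activities, used, wasGeneratedBy); the class `ProvRecorder`, its methods and the step names are hypothetical. A real system would serialise to PROV and store the records in one of the back-ends listed above (RDBMS, graph DBMS, RDF/SPARQL).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ProvRecorder:
    """Records a PROV-like data dependency graph for a pre-processing pipeline."""
    activities: Dict[str, dict] = field(default_factory=dict)  # activity -> configuration
    used: List[Tuple[str, str]] = field(default_factory=list)  # (activity, input entity)
    generated: Dict[str, str] = field(default_factory=dict)    # output entity -> activity

    def record(self, activity: str, config: dict, inputs: List[str], output: str) -> None:
        """Register one pre-processing step with its configuration, inputs and output."""
        self.activities[activity] = config
        self.used.extend((activity, e) for e in inputs)
        self.generated[output] = activity

    def lineage(self, entity: str) -> List[str]:
        """Activities an entity transitively depends on, dependencies first."""
        steps: List[str] = []
        seen: Set[str] = set()

        def visit(e: str) -> None:
            act = self.generated.get(e)
            if act is None or act in seen:
                return  # raw input, or step already visited
            seen.add(act)
            for a, inp in self.used:
                if a == act:
                    visit(inp)
            steps.append(act)  # post-order: inputs' steps before this step

        visit(entity)
        return steps

# Hypothetical capture of the Titanic pre-processing decisions
prov = ProvRecorder()
prov.record("drop_columns", {"columns": ["PassengerId", "Name", "Ticket", "Cabin"]},
            inputs=["titanic_raw"], output="d1")
prov.record("impute_age", {"strategy": "mean by Pclass"}, inputs=["d1"], output="d2")
prov.record("drop_correlated", {"columns": ["Fare", "Pclass"]}, inputs=["d2"], output="d3")

# Query: which steps, with which configuration, produced the training data d3?
for step in prov.lineage("d3"):
    print(step, prov.activities[step])
```

Keeping the configuration next to each activity is what lets questions such as “How did you define / remove outliers?” be answered by a lookup rather than by reading the script.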
Example
• The classic “Titanic” dataset
• Can you predict survival probabilities?
• A simple logistic regression analysis
Survived - Survival (0 = No; 1 = Yes)
Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
Name - Name
Sex - Sex
Age - Age
SibSp - Number of Siblings/Spouses Aboard
Parch - Number of Parents/Children Aboard
Ticket - Ticket Number
Fare - Passenger Fare (British pound)
Cabin - Cabin
Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
Enable analysis of data pre-processing
• Data preparation workflow includes a number of decisions:
- Dropping irrelevant attributes: 'PassengerId', 'Name', 'Ticket', 'Cabin'
- Managing missing values: Age is missing in 177/891 records (present in only 714)
- “Pclass is a good predictor for age”: impute Age values using the average age for each Pclass
- Dropping correlated features (?): drop “Fare”, “Pclass”
- Is the target class balanced?
Example: missing values imputation
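The imputation decision from the previous slide (replace a missing Age with the average age of the passenger's Pclass) can be sketched in plain Python. A minimal sketch, assuming records are dicts with `None` marking a missing value; the toy records are made up, not taken from the real dataset. With pandas, a common equivalent is `df["Age"].fillna(df.groupby("Pclass")["Age"].transform("mean"))`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]

def impute_by_group_mean(records: List[Record], target: str, group: str) -> List[Record]:
    """Fill missing `target` values with the mean of `target` within the same `group`."""
    observed = defaultdict(list)
    for r in records:
        if r[target] is not None:
            observed[r[group]].append(r[target])
    group_means = {g: mean(vals) for g, vals in observed.items()}
    imputed = []
    for r in records:
        r2 = dict(r)  # copy: the original data stays available for comparison
        if r2[target] is None:
            # Raises KeyError if a group has no observed values at all
            r2[target] = group_means[r2[group]]
        imputed.append(r2)
    return imputed

# Toy records (made up for illustration)
passengers = [
    {"Pclass": 1, "Age": 40.0},
    {"Pclass": 1, "Age": 50.0},
    {"Pclass": 1, "Age": None},
    {"Pclass": 3, "Age": 20.0},
    {"Pclass": 3, "Age": 30.0},
    {"Pclass": 3, "Age": None},
]
print([r["Age"] for r in impute_by_group_mean(passengers, "Age", "Pclass")])
# → [40.0, 50.0, 45.0, 20.0, 30.0, 25.0]
```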
Exploring the effect of alternative pre-processing
[Diagram: raw data D is pre-processed by two alternative pipelines, P1 → D1 → Learn → M1 and P2 → D2 → Learn → M2; the two models predict y1 and y2 for the same instance x, and y1 ≠ y2]
How can knowledge of P1, P2 help understand why y1 ≠ y2?
Ex. Alternative imputation methods for missing values
Ex. Boost minority class / downsample majority class
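The experiment can be sketched end to end: the same raw data D goes through two pipelines that differ only in the imputation method (mean vs. median age), a model is learned from each, and both models predict the same instance x. A minimal sketch, assuming a one-feature nearest-centroid classifier as a stand-in for the learner; the data, the pipelines and the classifier are all hypothetical. Knowing which P produced which M is what allows y1 ≠ y2 to be traced back to the imputation step.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[Optional[float], int]  # (Age, Survived)

def make_imputer(stat: Callable[[List[float]], float]) -> Callable[[List[Row]], List[Row]]:
    """Build a pre-processing step that fills missing ages with `stat` of observed ages."""
    def impute(rows: List[Row]) -> List[Row]:
        fill = stat([age for age, _ in rows if age is not None])
        return [(fill if age is None else age, y) for age, y in rows]
    return impute

def learn(rows: List[Row]) -> Dict[int, float]:
    """Nearest-centroid 'model': the mean age of each class."""
    return {label: mean(age for age, y in rows if y == label) for label in {y for _, y in rows}}

def predict(model: Dict[int, float], age: float) -> int:
    """Predict the class whose centroid is closest to `age`."""
    return min(model, key=lambda label: abs(age - model[label]))

# Hypothetical raw data D: two survivors have a missing age
D = [(5.0, 1), (8.0, 1), (None, 1), (None, 1), (60.0, 0), (70.0, 0), (30.0, 0)]

P1, P2 = make_imputer(mean), make_imputer(median)  # the only difference
M1, M2 = learn(P1(D)), learn(P2(D))
x = 36.5
y1, y2 = predict(M1, x), predict(M2, x)
print(y1, y2)  # → 1 0
```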
Also: the script alludes to human decisions
How do we capture these decisions?
To what extent can they be inferred from code?
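A first step towards inferring data-preparation decisions from code is static analysis of the script. A minimal sketch using Python's standard `ast` module (`ast.unparse` needs Python 3.9+), assuming pandas-style method calls such as `drop(columns=...)` and `fillna(...)`; the set of method names treated as “decisions” and the sample script are hypothetical. The script is parsed, never executed, so `df` need not exist.

```python
import ast
from typing import List, Tuple

# Hypothetical choice of calls that usually encode a data-preparation decision
DECISION_METHODS = {"drop", "fillna", "dropna", "sample", "replace"}

def find_decisions(source: str) -> List[Tuple[int, str, str]]:
    """Return (line, method, arguments) for every call to a 'decision' method."""
    decisions = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in DECISION_METHODS):
            args = [ast.unparse(a) for a in node.args]
            args += [(f"{kw.arg}=" if kw.arg else "**") + ast.unparse(kw.value)
                     for kw in node.keywords]
            decisions.append((node.lineno, node.func.attr, ", ".join(args)))
    return sorted(decisions)

script = '''
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
df["Age"] = df["Age"].fillna(df.groupby("Pclass")["Age"].transform("mean"))
df = df.drop(columns=["Fare", "Pclass"])
'''

for line, method, args in find_decisions(script):
    print(line, method, args)
```

This recovers *what* was done, not *why*: the rationale (“Pclass is a good predictor for age”) still has to come from the analyst or from comments.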
Correlation analysis
• Is Pclass really a good predictor for Age?
• Why drop both Pclass and Fare?
Alternative pre-processing:
1. Dropped Age only (nearly identical performance: F1 = 0.77, 0.76)
2. Use Sex, Pclass only
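Both questions can be checked with a simple correlation. A minimal sketch computing Pearson's r in plain Python; the toy columns are made up and are not the real Titanic values, so the numbers only show the mechanics. Since Pclass is ordinal, a rank correlation or per-class means may be more appropriate in practice.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy columns (illustrative only, not the real dataset)
pclass = [1, 1, 2, 2, 3, 3]
age = [45.0, 38.0, 30.0, 33.0, 24.0, 22.0]
fare = [80.0, 70.0, 20.0, 25.0, 8.0, 7.5]

print(round(pearson(pclass, age), 2), round(pearson(pclass, fare), 2))
```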
Possible roles for provenance
1) Data acquisition: Provenance → Transparency → Trust
2) Data transformation: Provenance → explanations
- Is data preparation correct?
- Is training data fit to learn from?
- What is the effect of alternative pre-processing?
- Can we infer data prep decisions from pre-processing code?
Bias (in ML)
(*) Mitchell, T. M. (1980). The need for biases in learning generalizations. Tech. rep. CBM-TR-117,
Rutgers University, New Brunswick, NJ.
Bias: “Any basis for choosing one generalization [hypothesis] over another,
other than strict consistency with the observed training instances." (*)
Absolute bias:
• certain hypotheses are entirely eliminated from the hypothesis space
• E.g., a priori choice of model (decision trees, SVM, NN, …)
Relative bias:
• certain hypotheses are preferred over others
• E.g., “prefer shallow, simple decision trees to deep ones”
Fairness and bias: the (notorious) COMPAS case
• Increasingly popular within the criminal justice system
• Used or considered for use in pre-trial decision-making (USA)
1: The initial claim
Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias: There’s
software used across the country to predict future criminals. And it’s biased against
blacks. ProPublica, 2016.
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
black defendants who did not recidivate over a two-year period were nearly twice as
likely to be misclassified as higher risk compared to their white counterparts (45
percent vs. 23 percent).
white defendants who re-offended within the next two years were mistakenly labeled
low risk almost twice as often as black re-offenders (48 percent vs. 28 percent)
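Both figures are group-wise error rates: the false positive rate (non-recidivists labelled higher risk) and the false negative rate (recidivists labelled low risk), computed separately for each group. A minimal sketch, assuming each record is (group, predicted_high_risk, reoffended); the records are made up and are not the COMPAS data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def error_rates(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, Tuple[float, float]]:
    """Return {group: (FPR, FNR)} from (group, predicted_high_risk, reoffended) records.

    Assumes every group has at least one reoffender and one non-reoffender,
    otherwise the corresponding rate divides by zero.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, high_risk, reoffended in records:
        c = counts[group]
        if reoffended:
            c["tp" if high_risk else "fn"] += 1
        else:
            c["fp" if high_risk else "tn"] += 1
    return {
        g: (c["fp"] / (c["fp"] + c["tn"]), c["fn"] / (c["fn"] + c["tp"]))
        for g, c in counts.items()
    }

# Made-up records: (group, predicted high risk?, reoffended within two years?)
records = [
    ("A", True, False), ("A", False, False), ("A", True, True), ("A", False, True),
    ("B", False, False), ("B", False, False), ("B", True, True), ("B", False, True),
]
print(error_rates(records))  # → {'A': (0.5, 0.5), 'B': (0.0, 0.5)}
```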
Model Fairness and data bias
A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction
Instruments,” Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017.
In this paper we show that the differences in false positive and false negative rates cited as
evidence of racial bias in the ProPublica article are a direct consequence of applying an instrument
that is free from predictive bias to a population in which recidivism prevalence differs across
groups.
COMPAS complies with the test fairness condition:
observed $P(Y = 1 \mid S = s, R = r)$ is largely independent of $R$
COMPAS scores are skewed
- scores for white defendants were skewed toward lower-risk categories,
while black defendants were evenly distributed across scores
- large discrepancies in FPR and FNR between Black and White defendants
- … but this does not mean that the score itself is unfair
6,172 defendants who had not been arrested for a new offense or who had recidivated within two years
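FPR / FNR
Positive predictive value of the coarsened (high / low risk) score $S_c$:
$\mathrm{PPV}(S_c; R = r) = P(Y = 1 \mid S_c = \mathrm{HR}, R = r)$
The test fairness condition (2.1) can be expressed as the constraint that PPV does not depend on $R$.
Recidivism prevalence within groups:
$p_r = P(Y = 1 \mid R = r)$
False positive rate:
$\mathrm{FPR}(S_c; R = r) = P(S_c = \mathrm{HR} \mid Y = 0, R = r)$
False negative rate:
$\mathrm{FNR}(S_c; R = r) = P(S_c = \mathrm{LR} \mid Y = 1, R = r)$
When the recidivism prevalence differs between two groups, a test-fair score cannot have equal FPR and FNR across those groups.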
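The impossibility statement follows from an identity between these quantities, given in the Chouldechova paper cited above: FPR = p/(1 - p) · (1 - PPV)/PPV · (1 - FNR). If PPV is the same in both groups (test fairness) and FNR is also the same, FPR must differ whenever the prevalence p differs. A small numeric sketch, assuming two hypothetical groups with equal PPV and FNR but different prevalence, builds each group's confusion matrix and checks the identity.

```python
from typing import Dict

def confusion(n: int, prevalence: float, ppv: float, fnr: float) -> Dict[str, float]:
    """Confusion-matrix counts for a group of size n with given prevalence, PPV and FNR."""
    positives = n * prevalence
    tp = positives * (1 - fnr)
    fn = positives * fnr
    fp = tp * (1 - ppv) / ppv  # rearranged from PPV = TP / (TP + FP)
    tn = n * (1 - prevalence) - fp
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}

def fpr_identity(p: float, ppv: float, fnr: float) -> float:
    """FPR implied by prevalence p, PPV and FNR."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

PPV, FNR = 0.6, 0.3  # hypothetical values, identical for both groups
for group, p in [("group 1", 0.5), ("group 2", 0.3)]:
    c = confusion(1000, p, PPV, FNR)
    fpr = c["fp"] / (c["fp"] + c["tn"])
    print(group, round(fpr, 3), round(fpr_identity(p, PPV, FNR), 3))
```

With these values the first group's FPR is about 0.467 and the second's 0.2, even though PPV and FNR are identical.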
The actual “provenance” of the analysis
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Data acquisition + transformation → Model bias and fairness
- Can knowledge of data prep explain model bias?
- Does data prep introduce / remove bias?
Fairness: many possible definitions
(*) M. J. Kusner, J. Loftus, C. Russell, and R. Silva, “Counterfactual Fairness,” in Advances in Neural
Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus,
S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 4066–4076.
Causality and counterfactual fairness
[Causal graph: protected A = driver’s race; latent U = aggressive driving; observable X = red cars preference; predicted Y = accident rate; edges A → X, U → X, U → Y]
• Individuals belonging to a race A are more likely to drive red cars (A → X)
• However, race is not a good predictor for either U or Y
• Aggressive drivers tend to prefer red cars (U → X)
Using X to predict Y leads to a counterfactually unfair model:
• it may charge individuals of a certain race more than others, even though no
race is more likely to have an accident
Is knowledge of data prep useful at all to determine
this kind of fairness?
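The red-car argument can be made concrete with a toy structural causal model. A minimal sketch, assuming linear structural equations X = A + U and Y = U (the coefficients, both predictors and the values of U and A are hypothetical, chosen only to mirror the graph): the counterfactual test changes the protected attribute A while holding the latent U fixed, regenerates X, and checks whether the prediction changes.

```python
from typing import Tuple

# Hypothetical structural equations mirroring the causal graph:
#   X = A + U  (red-car preference depends on race A and aggressiveness U)
#   Y = U      (accident rate depends only on aggressiveness; not simulated here)

def red_car_preference(a: float, u: float) -> float:
    """Structural equation for the observable X."""
    return a + u

def predictor_using_x(x: float) -> float:
    """Hypothetical model predicting accident rate from red-car preference alone."""
    return 0.5 * x

def predictor_using_inferred_u(x: float, a: float) -> float:
    """Predictor using only U, recovered from the structural equation as X - A."""
    return x - a

def counterfactual_gap(u: float, a: float, a_cf: float) -> Tuple[float, float]:
    """Prediction change for each model when A is set to a_cf and U is held fixed."""
    x = red_car_preference(a, u)
    x_cf = red_car_preference(a_cf, u)  # counterfactual world: same U, different A
    gap_x = predictor_using_x(x_cf) - predictor_using_x(x)
    gap_u = predictor_using_inferred_u(x_cf, a_cf) - predictor_using_inferred_u(x, a)
    return gap_x, gap_u

print(counterfactual_gap(u=1.0, a=0.0, a_cf=1.0))  # → (0.5, 0.0)
```

The model built on X changes its prediction when only race changes, so it is counterfactually unfair; the predictor built on the inferred U does not. Recovering U requires knowing the causal model, which is information that data-prep provenance alone may not provide.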
Possible roles for provenance
1) Data acquisition: Provenance → Transparency → Trust
2) Data transformation: Provenance → explanations
- Is data preparation correct?
- Is training data fit to learn from?
- What is the effect of alternative pre-processing?
3) Data acquisition + transformation → Model bias and fairness
- Is provenance useful to diagnose an unfair / biased model?
- Does data prep introduce / remove bias?
Opportunities and challenges: Summary
1) Data acquisition: Provenance → Transparency → Trust
2) Data transformation: Provenance → explanations
- Is data preparation correct?
- Is training data fit to learn from?
- What is the effect of alternative pre-processing?
3) Data acquisition + transformation → Model bias and fairness
- Is provenance useful to diagnose an unfair / biased model?
- Does data prep introduce / remove bias?
A few initial references
[1] C. O’Neil, Weapons of Math Destruction. Crown Books, 2016.
[2] B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to
explanation,’” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
[3] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’ : Explaining the Predictions of Any
Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining - KDD ’16, 2016, pp. 1135–1144.
[4] H. Lakkaraju, S. H. Bach, and J. Leskovec, “Interpretable Decision Sets: A Joint Framework for Description and
Prediction,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016,
pp. 1675–1684.
[5] K. Yang and J. Stoyanovich, “Measuring Fairness in Ranked Outputs,” in Proceedings of the 29th International
Conference on Scientific and Statistical Database Management - SSDBM ’17, 2017, pp. 1–6.
[6] T. Gebru et al., “Datasheets for Datasets,” 2018.
[7] Z. Abedjan, L. Golab, and F. Naumann, “Profiling relational data: a survey,” VLDB J., vol. 24, no. 4, pp. 557–581, 2015.
[8] A. Weller, “Challenges for Transparency,” in Proceedings of the 2016 ICML Workshop on Human Interpretability
in Machine Learning (WHI 2016).
[9] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting
pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21st ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
Thank you
Paolo.Missier@newcastle.ac.uk
School of Computing, Newcastle University
http://tinyurl.com/paolomissier
LinkedIn: www.linkedin.com/in/paolomissier
Twitter: @Pmissier

Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Provenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationProvenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationPaolo Missier
 
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Paolo Missier
 
Selective and incremental re-computation in reaction to changes: an exercise ...
Selective and incremental re-computation in reaction to changes: an exercise ...Selective and incremental re-computation in reaction to changes: an exercise ...
Selective and incremental re-computation in reaction to changes: an exercise ...Paolo Missier
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Paolo Missier
 

Plus de Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Provenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-ComputationProvenance Annotation and Analysis to Support Process Re-Computation
Provenance Annotation and Analysis to Support Process Re-Computation
 
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
Design and evaluation of a genomics variant analysis pipeline using GATK Spar...
 
Selective and incremental re-computation in reaction to changes: an exercise ...
Selective and incremental re-computation in reaction to changes: an exercise ...Selective and incremental re-computation in reaction to changes: an exercise ...
Selective and incremental re-computation in reaction to changes: an exercise ...
 
Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...Preserving the currency of analytics outcomes over time through selective re-...
Preserving the currency of analytics outcomes over time through selective re-...
 

Dernier

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

8
Interpretability (of machine learning models)
Z. C. Lipton, “The Mythos of Model Interpretability,” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
- Transparency
  - Are features understandable?
  - Which features are more important?
- Post hoc interpretability
  - Natural language explanations
  - Visualisations of models
  - Explanations by example: “this tumor is classified as malignant because to the model it looks a lot like these other tumors”
W. Samek, T. Wiegand, and K.-R. Müller, “Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models,” Aug. 2017.
Interpretability: the ability to provide a qualitative understanding of the relationship between the input variables and the response.
9
Black-box approaches
Model agnostic: an explainer should be able to explain any model, and thus be model-agnostic (i.e. treat the original model as a black box).
Local fidelity: for an explanation to be meaningful it must at least be locally faithful, i.e. it must correspond to how the model behaves in the vicinity of the instance being predicted.
10
Occlusion testing
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), 2016, pp. 1135–1144.
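Occlusion testing probes a model by masking part of its input and watching how the prediction changes. A minimal sketch for tabular inputs, assuming the model is exposed as a Python function from a feature vector to a score and that "occluding" a feature means replacing it with a baseline value (e.g. its training mean); the helper name and the baseline choice are illustrative assumptions, not taken from the cited paper. For images, the same idea slides a grey patch across the picture.

```python
import numpy as np


def occlusion_importance(predict, x, baseline):
    """Score each feature by how much the prediction changes when that
    feature alone is replaced by its baseline value.

    predict:  callable mapping a 1-D feature vector to a scalar score
    x:        the instance being explained
    baseline: the "occluded" value for each feature, e.g. the training mean
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    reference = predict(x)
    deltas = np.empty(x.size)
    for j in range(x.size):
        occluded = x.copy()
        occluded[j] = baseline[j]  # mask one feature at a time
        deltas[j] = reference - predict(occluded)
    return deltas


# Usage with a hypothetical linear scorer
w = np.array([2.0, 0.0, -1.0])
print(occlusion_importance(lambda v: float(w @ v), [1.0, 5.0, 3.0], [0.0, 0.0, 0.0]))
```

For a linear scorer the occlusion delta of feature j reduces to w_j (x_j - b_j), which makes a convenient sanity check; for non-linear models the deltas depend on the instance, which is why such explanations are local.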
11
Expected accuracy not enough for trust
SVM classifier, 94% accuracy… but questionable!
13
LIME
• Model agnostic
• Locally faithful: it must correspond to how the model behaves in the vicinity of the instance being predicted
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), 2016, pp. 1135–1144.
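The core idea behind LIME can be sketched in a few lines of NumPy. This is a simplified, hypothetical `local_linear_explanation` helper, not the API of the `lime` package: it perturbs a tabular instance with Gaussian noise, queries the black box, weights each sample with an exponential proximity kernel, and fits a weighted linear surrogate whose coefficients serve as the local explanation. The actual method additionally works on an interpretable (e.g. binary) representation and restricts the explanation to a few features.

```python
import numpy as np


def local_linear_explanation(predict, x, n_samples=2000, scale=0.5,
                             kernel_width=1.0, seed=0):
    """Simplified LIME-style explanation of the black box `predict` at x.

    predict maps an (n, d) array to n scores. Returns the local slopes of a
    weighted linear surrogate and its intercept (the surrogate's value at x).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # 1. Perturb: sample points around the instance being explained
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # 2. Proximity weights: exponential kernel on Euclidean distance to x
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Weighted least squares on [1, Z - x]; centring makes the intercept f(x)
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:], coef[0]


# Usage with a hypothetical black box that happens to be linear
slopes, intercept = local_linear_explanation(
    lambda Z: Z @ np.array([3.0, -2.0]) + 1.0, [1.0, 1.0])
print(slopes, intercept)
```

Because the surrogate is fitted only near x, its coefficients describe local behaviour (local fidelity) and may differ from instance to instance; `scale` and `kernel_width` control how local "local" is.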
14
Other model explanation approaches
1. Black Box Explanations through Transparent Approximations (BETA) [1]
   • Decision set approximation of black-box models
   • Fidelity + interpretability of the explanation
   • Global (unlike LIME)
2. Intelligible additive models [2]
   • Generalized Additive Model (GAM)
   • Generalized Additive Model with pairwise interactions (GA2M)
[1] Lakkaraju, H., Kamar, E., Caruana, R., & Leskovec, J. (2017). Interpretable & Explorable Approximations of Black Box Models. arXiv preprint arXiv:1707.01154.
[2] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
15
Data → Model → Predictions
[Figure: Population → Data collection → Raw datasets → Pre-processing → Instances (features) → Model → Predicted you: ranking, score, class]
Key decisions are made during data collection:
- Where does the data come from?
- What’s in the dataset?
Complementing current ML approaches to model interpretability
16
Possible roles for provenance
1) Data acquisition: Provenance → Transparency → Trust
17
Data → Model → Predictions
[Figure: Population → Data collection → Raw datasets → Pre-processing → Instances (features) → Model → Predicted you: ranking, score, class]
Key decisions are made during:
- Data collection: where does the data come from? What’s in the dataset?
- Data preparation: how was it pre-processed?
1. Can we explain these decisions?
2. Are these explanations useful?
18
Explaining data preparation
Paolo Missier (Computing), Dennis Prangle (Stats)
[Figure: Population → Data collection → Raw datasets → Pre-processing → Instances (features) → Model → Predicted you: ranking, score, class]
Pre-processing:
- Integration
- Cleaning
- Outlier removal
- Normalisation
- Feature selection
- Class rebalancing
- Sampling
- Stratification
- …
Data acquisition and wrangling:
- How were datasets acquired?
- How recently?
- For what purpose?
- Are they being reused / repurposed?
- What is their quality?
- Scripts → Python / TensorFlow, Pandas, Spark
- Workflows → Knime, …
Provenance → Transparency
19
Provenance for transparency
1. Collection
   - Program-level
   - System-level
2. Representation
   - W3C PROV (for interoperability)
   - Multiple proprietary formats (for efficient encoding)
3. Querying / analysis
   • RDBMS
   • GDBMS
   • RDF / SPARQL
• Configuration of each pre-processing step
• Data dependency graph
- Which kind of normalisation did you apply?
- Was the data (down/up) sampled? How?
- How did you define / remove outliers?
- How did you window your time series?
- Was the data repurposed (acquired from a repository)?
- How was the original protocol defined?
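Program-level collection can be as simple as wrapping each pre-processing function so that every call logs the entity it used, the entity it generated and the parameters it ran with. The sketch below is a hand-rolled, in-memory log whose relation names borrow W3C PROV vocabulary (used, wasGeneratedBy, wasDerivedFrom); it is not the full PROV data model nor any provenance library's API, and the decorator, the `lineage` query and the two toy steps are hypothetical. A real system would serialise such records and query them with one of the back ends listed on the slide.

```python
import functools
import itertools

_counter = itertools.count(1)
ACTIVITIES = {}  # activity id -> {"step": function name, "params": keyword arguments}
RELATIONS = []   # PROV-style triples: (relation, subject id, object id)


def provenance_step(func):
    """Wrap a step func(data, **params) -> data so that each call records the
    entity it used, the entity it generated, and its configuration.
    The wrapped step takes and returns (entity_id, data) pairs."""
    @functools.wraps(func)
    def wrapper(entity, **params):
        in_id, data = entity
        result = func(data, **params)  # run the step before logging it
        act_id = f"activity/{func.__name__}/{next(_counter)}"
        out_id = f"entity/{func.__name__}/{next(_counter)}"
        ACTIVITIES[act_id] = {"step": func.__name__, "params": dict(params)}
        RELATIONS.append(("used", act_id, in_id))
        RELATIONS.append(("wasGeneratedBy", out_id, act_id))
        RELATIONS.append(("wasDerivedFrom", out_id, in_id))
        return out_id, result
    return wrapper


def lineage(entity_id):
    """Follow wasGeneratedBy/used edges back to the raw input and return the
    configuration of every step that produced entity_id, oldest first."""
    generated_by = {s: o for r, s, o in RELATIONS if r == "wasGeneratedBy"}
    used = {s: o for r, s, o in RELATIONS if r == "used"}
    steps = []
    while entity_id in generated_by:
        activity = generated_by[entity_id]
        steps.append(ACTIVITIES[activity])
        entity_id = used[activity]
    return list(reversed(steps))


# Two hypothetical pre-processing steps over a list of dict records
@provenance_step
def drop_columns(rows, columns):
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


@provenance_step
def fill_missing(rows, column, value):
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


raw = ("entity/raw/0", [{"Name": "A", "Age": None}, {"Name": "B", "Age": 30}])
clean = fill_missing(drop_columns(raw, columns=["Name"]), column="Age", value=30.0)
for step in lineage(clean[0]):
    print(step)
```

Querying the lineage of the final dataset answers questions such as "which imputation value was used?" directly from the recorded configuration rather than from reading the script.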
20
Example
• The classic “Titanic” dataset
• Can you predict survival probabilities?
• A simple logistic regression analysis
Survived - Survival (0 = No; 1 = Yes)
Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
Name - Name
Sex - Sex
Age - Age
SibSp - Number of Siblings/Spouses Aboard
Parch - Number of Parents/Children Aboard
Ticket - Ticket Number
Fare - Passenger Fare (British pound)
Cabin - Cabin
Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
21
Enable analysis of data pre-processing
• The data preparation workflow includes a number of decisions:
- Dropping irrelevant attributes: 'PassengerId', 'Name', 'Ticket', 'Cabin'
- Managing missing values: Age is missing in 177 of 891 records (only 714 have a value)
  “Pclass is a good predictor for age” → impute Age values using the average age for each Pclass
- Dropping correlated features (?): drop “Fare”, “Pclass”
- Is the target class balanced?
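The imputation decision on this slide ("impute Age values using the average age for each Pclass") can be expressed in pandas as a group-wise mean fill. A sketch, assuming a DataFrame with `Age` and `Pclass` columns; the helper name and the six-row toy frame are illustrative only, not the author's script or the real Titanic data. Returning the per-class fill values makes the decision easy to record as provenance.

```python
import pandas as pd


def impute_by_group_mean(df, target, group):
    """Fill missing values of `target` with the mean of `target` within each
    `group`. Returns a new DataFrame and the per-group fill values used."""
    group_means = df.groupby(group)[target].mean()  # NaNs are skipped
    filled = df.copy()
    filled[target] = df[target].fillna(df[group].map(group_means))
    return filled, group_means.to_dict()


# Hypothetical toy data: two of six ages missing
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Age": [40.0, None, 30.0, 34.0, 20.0, None],
})
filled, fills = impute_by_group_mean(toy, "Age", "Pclass")
print(filled)
print(fills)
```

If every Age in some class were missing, that class's mean would be NaN and its rows would stay unfilled, which is worth checking, and recording, before training.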
22
Example: missing values imputation
23
Exploring the effect of alternative pre-processing
[Figure: D → P1 → D1 → Learn → M1 → Predict(x) → y1; D → P2 → D2 → Learn → M2 → Predict(x) → y2; y1 ≠ y2]
How can knowledge of P1, P2 help understand why y1 ≠ y2?
- Ex. Alternative imputation methods for missing values
- Ex. Boost minority class / downsample majority class
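A toy version of this diagram: the same data D goes through two imputation variants, P1 (mean) and P2 (median), then through the same learner, and the two resulting models disagree on the same x. All function names, the threshold "learner" and the six data points are hypothetical; the point is only that, when each run records its pre-processing configuration, diffing the records localises candidate causes of y1 ≠ y2.

```python
from statistics import mean, median


def impute(values, strategy):
    """P1/P2: replace missing values (None) with the mean or median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values], fill


def learn_threshold(xs, labels):
    """Toy learner: threshold halfway between the class means
    (assumes class 1 has the larger mean)."""
    m0 = mean(x for x, y in zip(xs, labels) if y == 0)
    m1 = mean(x for x, y in zip(xs, labels) if y == 1)
    return (m0 + m1) / 2


def run_pipeline(values, labels, x, strategy):
    """D -> P -> D' -> Learn -> M -> Predict(x); returns the prediction and a
    record of the configuration that produced it."""
    data, fill = impute(values, strategy)
    threshold = learn_threshold(data, labels)
    prediction = int(x > threshold)
    return prediction, {"imputation": strategy, "fill_value": fill, "threshold": threshold}


def explain_difference(record1, record2):
    """Settings on which two runs disagree: candidate causes of y1 != y2."""
    return {k: (record1[k], record2[k]) for k in record1 if record1[k] != record2[k]}


values = [1, 2, 3, 100, None, None]  # toy feature: one outlier, two gaps
labels = [0, 0, 0, 1, 1, 1]
y1, rec1 = run_pipeline(values, labels, x=20, strategy="mean")
y2, rec2 = run_pipeline(values, labels, x=20, strategy="median")
print(y1, y2, explain_difference(rec1, rec2))
```

Here the outlier pulls the mean fill value up to 26.5 while the median stays at 2.5; that moves the learned threshold from 26.5 to 18.5 and flips the prediction for x = 20 from 0 to 1.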
24
Also: the script alludes to human decisions
How do we capture these decisions?
To what extent can they be inferred from code?
25
Correlation analysis
• Is Pclass really a good predictor for Age?
• Why drop both Pclass and Fare?
Alternative pre-processing:
1. Dropped Age only (nearly identical performance: F1 = 0.77, 0.76)
2. Use Sex, Pclass only
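One way to check the claim that Pclass is a good predictor for Age is the correlation ratio η², the fraction of the variance of Age explained by the per-class means; equivalently, the in-sample R² of predicting Age by its class mean, which is exactly the predictor that class-mean imputation uses. This is our own suggestion of a measure, not necessarily the analysis behind the slide, and the function name is hypothetical. A value near 0 means imputing by class mean is hardly better than imputing by the global mean.

```python
from statistics import fmean


def correlation_ratio(groups, values):
    """Correlation ratio (eta squared) of a numeric variable on a categorical one.

    Equals the in-sample R^2 of predicting each value by its group mean.
    Pairs whose value is None (missing) are skipped. Raises ZeroDivisionError
    if all observed values are identical (total variance is zero).
    """
    pairs = [(g, v) for g, v in zip(groups, values) if v is not None]
    overall = fmean(v for _, v in pairs)
    by_group = {}
    for g, v in pairs:
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (fmean(vs) - overall) ** 2 for vs in by_group.values())
    ss_total = sum((v - overall) ** 2 for _, v in pairs)
    return ss_between / ss_total
```

Applied to the Pclass and Age columns, the result quantifies the "good predictor" assumption and could be stored alongside the imputation step as the justification for it.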
26
Possible roles for provenance
1) Data acquisition: Provenance → Transparency → Trust
2) Data transformation: Provenance → Explanations
- Is data preparation correct?
- Is training data fit to learn from?
- What is the effect of alternative pre-processing?
- Can we infer data prep decisions from pre-processing code?
27
Bias (in ML)
Bias: “Any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances.” (*)
Absolute bias:
• certain hypotheses are entirely eliminated from the hypothesis space
• e.g. a priori choice of model (decision trees, SVM, NN, …)
Relative bias:
• certain hypotheses are preferred over others
• e.g. “prefer shallow, simple decision trees to deep ones”
(*) Mitchell, T. M. (1980). The need for biases in learning generalizations. Tech. rep. CBM-TR-117, Rutgers University, New Brunswick, NJ.
  • 26. Fairness and bias: the (notorious) COMPAS case • Increasingly popular within the criminal justice system • Used or considered for use in pre-trial decision-making (USA). 1: The initial claim (J. Angwin, J. Larson, S. Mattu, and L. Kirchner, “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks,” 2016; https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm): black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent); white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
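The ProPublica figures are per-group false positive and false negative rates. The sketch below shows how such rates are computed from true outcomes, binarized predictions (1 = labelled high risk, or re-offended) and group membership. The data are a small made-up example, not COMPAS records, and the function name is our own.

```python
def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (FPR, FNR)} for binary labels.

    FPR = FP / (FP + TN): non-recidivists labelled high risk.
    FNR = FN / (FN + TP): recidivists labelled low risk.
    Assumes every group contains both outcomes (otherwise a division by zero occurs).
    """
    rates = {}
    for g in sorted(set(groups)):
        pairs = [(t, p) for t, p, gi in zip(y_true, y_pred, groups) if gi == g]
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        tn = sum(1 for t, p in pairs if t == 0 and p == 0)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        rates[g] = (fp / (fp + tn), fn / (fn + tp))
    return rates


# Made-up outcomes and predictions for two groups "A" and "B".
y_true = [0, 0, 0, 0, 1, 1, 1, 1] + [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 1, 0] + [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 8 + ["B"] * 8
print(error_rates_by_group(y_true, y_pred, groups))  # {'A': (0.5, 0.25), 'B': (0.25, 0.5)}
```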
  • 27. Model fairness and data bias. A. Chouldechova, “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments,” Big Data, vol. 5, no. 2, pp. 153–163, Jun. 2017: “In this paper we show that the differences in false positive and false negative rates cited as evidence of racial bias in the ProPublica article are a direct consequence of applying an instrument that is free from predictive bias to a population in which recidivism prevalence differs across groups.” COMPAS complies with the test fairness condition: the observed P(Y = 1 | S = s, R) is largely independent of R.
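Test fairness can be checked empirically by estimating P(Y = 1 | S = s, R = r) for each score value s and group r, then comparing across groups. In the hypothetical toy data below, both groups have the same observed recidivism rate within each score category (0.25 for "low", 0.75 for "high") even though their overall prevalence differs (4/8 vs 5/12), which is the situation Chouldechova analyses. The names and data are illustrative assumptions.

```python
from collections import defaultdict


def calibration_table(scores, outcomes, groups):
    """Observed P(Y = 1 | S = s, R = r) for every (score, group) pair in the data."""
    counts = defaultdict(lambda: [0, 0])  # (s, r) -> [number with Y = 1, total]
    for s, y, r in zip(scores, outcomes, groups):
        counts[(s, r)][0] += y
        counts[(s, r)][1] += 1
    return {key: pos / total for key, (pos, total) in counts.items()}


def make_rows(score, group, positives, total):
    """Hypothetical helper: `total` records with this score and group, `positives` of which re-offended."""
    return [(score, 1 if i < positives else 0, group) for i in range(total)]


toy = (make_rows("low", "A", 1, 4) + make_rows("high", "A", 3, 4)
       + make_rows("low", "B", 2, 8) + make_rows("high", "B", 3, 4))
scores, outcomes, groups = zip(*toy)
table = calibration_table(scores, outcomes, groups)
for key in sorted(table):
    print(key, table[key])
```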
  • 28. COMPAS scores are skewed - Scores for white defendants were skewed toward lower-risk categories, while black defendants were evenly distributed across scores. - Large discrepancies in FPR and FNR between black and white defendants… - … but this does not mean that the score itself is unfair. (Data: 6,172 defendants who had not been arrested for a new offense or who had recidivated within two years.)
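  • 29. FPR / FNR. Let S_c be the binarized score (HR = high risk, LR = low risk), Y the recidivism outcome and R the group. Positive predictive value of S_c: PPV = P(Y = 1 | S_c = HR, R = r). The test fairness condition (2.1) can be expressed as the constraint that PPV does not depend on R. Recidivism prevalence within groups: p = P(Y = 1 | R = r). False positive rate: FPR = P(S_c = HR | Y = 0, R = r). False negative rate: FNR = P(S_c = LR | Y = 1, R = r). When the recidivism prevalence differs between two groups, a test-fair score cannot have equal FPR and FNR across those groups.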
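The incompatibility follows from an identity relating these quantities, derived in Chouldechova's paper: FPR = p / (1 − p) · (1 − PPV) / PPV · (1 − FNR). The sketch below (our own illustration, with made-up numbers) computes FPR both from this formula and from an explicit confusion matrix, then shows that two groups with identical PPV and FNR but different prevalence p end up with different FPR.

```python
def fpr_from_identity(p, ppv, fnr):
    """FPR implied by prevalence p, positive predictive value and false negative rate."""
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)


def fpr_from_counts(n, p, ppv, fnr):
    """FPR from an explicit confusion matrix built to have the given p, PPV and FNR."""
    positives = n * p                # individuals with Y = 1
    tp = positives * (1 - fnr)       # true positives, since FNR = FN / positives
    fp = tp * (1 - ppv) / ppv        # solves PPV = TP / (TP + FP) for FP
    negatives = n - positives        # individuals with Y = 0
    return fp / negatives


PPV, FNR = 0.6, 0.3                  # identical in both groups (test-fair, equal FNR)
for p in (0.5, 0.4):                 # different recidivism prevalence per group
    print(p, fpr_from_identity(p, PPV, FNR), fpr_from_counts(1000, p, PPV, FNR))
```

With PPV = 0.6 and FNR = 0.3, prevalences of 0.5 and 0.4 give FPRs of 7/15 ≈ 0.47 and 14/45 ≈ 0.31 respectively, so equalising FPR would require giving up either equal PPV or equal FNR.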
  • 30. The actual “provenance” of the analysis (https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm). Data acquisition + transformation → Model bias and fairness - Can knowledge of data prep explain model bias? - Does data prep introduce / remove bias?
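One way to make the data preparation behind such an analysis inspectable is to record, for every excluded record, which filter removed it, and then count exclusions per group. The four filters below reflect our reading of ProPublica's published analysis notebook (screening within 30 days of arrest, known recidivism flag, no ordinary traffic offenses, score available); treat the column names and conditions as an assumption. The records and the group labels "A"/"B" are made up.

```python
from collections import Counter

# Exclusion filters as we read them from ProPublica's analysis notebook (assumption).
# Each entry: (reason a record is excluded, predicate a record must satisfy to be kept).
FILTERS = [
    ("COMPAS screening more than 30 days from arrest, or date unknown",
     lambda r: r["days_b_screening_arrest"] is not None and -30 <= r["days_b_screening_arrest"] <= 30),
    ("no recidivism record (is_recid == -1)", lambda r: r["is_recid"] != -1),
    ("ordinary traffic offense (c_charge_degree == 'O')", lambda r: r["c_charge_degree"] != "O"),
    ("no COMPAS score (score_text == 'N/A')", lambda r: r["score_text"] != "N/A"),
]


def filter_with_trace(rows, filters):
    """Keep rows passing every filter; for each dropped row record (index, group, first failing reason)."""
    kept, excluded = [], []
    for i, row in enumerate(rows):
        reason = next((why for why, keep in filters if not keep(row)), None)
        if reason is None:
            kept.append(row)
        else:
            excluded.append((i, row["race"], reason))
    return kept, excluded


# Made-up records; "A" and "B" are placeholder group labels.
rows = [
    {"race": "A", "days_b_screening_arrest": 0, "is_recid": 1, "c_charge_degree": "F", "score_text": "High"},
    {"race": "A", "days_b_screening_arrest": 45, "is_recid": 0, "c_charge_degree": "M", "score_text": "Low"},
    {"race": "B", "days_b_screening_arrest": -1, "is_recid": -1, "c_charge_degree": "F", "score_text": "Low"},
    {"race": "B", "days_b_screening_arrest": 2, "is_recid": 0, "c_charge_degree": "O", "score_text": "Low"},
    {"race": "B", "days_b_screening_arrest": None, "is_recid": 0, "c_charge_degree": "M", "score_text": "Medium"},
]
kept, excluded = filter_with_trace(rows, FILTERS)
print(len(kept), Counter(group for _, group, _ in excluded))
```

A skew in the per-group exclusion counts is one concrete, provenance-based signal for the question of whether data prep introduces or removes bias.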
  • 31. Fairness: many possible definitions (*). (*) M. J. Kusner, J. Loftus, C. Russell, and R. Silva, “Counterfactual Fairness,” in Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 4066–4076.
  • 32. Causality and counterfactual fairness. [Figure: causal graph over driver’s race (A, protected), red car preference (X, observable), aggressive driving (U, latent) and accident rate (Y, predicted).] • Individuals belonging to a race A are more likely to drive red cars (A → X). • However, race is not a good predictor for either U or Y. • Aggressive drivers tend to prefer red cars (U → X). Using X to predict Y leads to a counterfactually unfair model: • it may charge individuals of a certain race more than others, even though no race is more likely to have an accident. Is knowledge of data prep useful at all to determine this kind of fairness?
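A deliberately simplified, deterministic version of the red-car example can make the counterfactual test concrete: fix an individual's latent U, change only the protected attribute A, propagate the change through the causal graph and check whether the prediction moves. The linear functional form and the function names below are our own assumptions; Kusner et al. define counterfactual fairness over the distribution of U given the observed evidence, which this sketch does not model.

```python
def red_car_preference(a, u):
    """Observable X: depends on both race A and aggressiveness U (A -> X <- U). Assumed linear."""
    return 0.5 * a + 0.5 * u


def predict_from_x(a, u):
    """Naive predictor of Y that only sees the observable X."""
    return red_car_preference(a, u)


def predict_from_u(a, u):
    """Predictor that uses only the latent U (assumed known or inferred); A is ignored."""
    return u


def counterfactually_fair_for(predictor, a, u, alternative_values):
    """True if the prediction is unchanged when A is set to each alternative value, U held fixed."""
    factual = predictor(a, u)
    return all(predictor(a_cf, u) == factual for a_cf in alternative_values)


# An individual of race a = 1 who is not an aggressive driver (u = 0).
print(counterfactually_fair_for(predict_from_x, a=1, u=0, alternative_values=[0]))  # False
print(counterfactually_fair_for(predict_from_u, a=1, u=0, alternative_values=[0]))  # True
```

The predictor based on X changes its output when only race is changed, which is exactly the counterfactual unfairness described on the slide.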
  • 33. Possible roles for provenance. 1) Data acquisition: Provenance → Transparency → Trust. 2) Data transformation: Provenance → explanations - Is data preparation correct? - Is training data fit to learn from? - What is the effect of alternative pre-processing? 3) Data acquisition + transformation → Model bias and fairness - Is provenance useful to diagnose an unfair / biased model? - Does data prep introduce / remove bias?
  • 34. Opportunities and challenges: Summary. 1) Data acquisition: Provenance → Transparency → Trust. 2) Data transformation: Provenance → explanations - Is data preparation correct? - Is training data fit to learn from? - What is the effect of alternative pre-processing? 3) Data acquisition + transformation → Model bias and fairness - Is provenance useful to diagnose an unfair / biased model? - Does data prep introduce / remove bias?
  • 35. A few initial references
    [1] C. O’Neil, Weapons of Math Destruction. Crown Books, 2016.
    [2] B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation’,” Proc. 2016 ICML Work. Hum. Interpret. Mach. Learn. (WHI 2016), Jun. 2016.
    [3] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), 2016, pp. 1135–1144.
    [4] H. Lakkaraju, S. H. Bach, and J. Leskovec, “Interpretable Decision Sets: A Joint Framework for Description and Prediction,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1675–1684.
    [5] K. Yang and J. Stoyanovich, “Measuring Fairness in Ranked Outputs,” in Proceedings of the 29th International Conference on Scientific and Statistical Database Management (SSDBM ’17), 2017, pp. 1–6.
    [6] T. Gebru et al., “Datasheets for Datasets,” 2018.
    [7] Z. Abedjan, L. Golab, and F. Naumann, “Profiling relational data: a survey,” VLDB J., vol. 24, no. 4, pp. 557–581, 2015.
    [8] A. Weller, “Challenges for Transparency,” in Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016).
    [9] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad, “Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1721–1730.
  • 36. Thank you. Paolo.Missier@newcastle.ac.uk, School of Computing, Newcastle University, http://tinyurl.com/paolomissier, LinkedIn: www.linkedin.com/in/paolomissier, Twitter: @Pmissier

Speaker notes

  1. Individuals as well as businesses, which we will initially refer to as subjects (and later upgrade to active participants), increasingly find themselves at the receiving end of impactful decisions made by organisations on their behalf, based on processes that use algorithmically-generated knowledge.
  2. This brings up the issue of trust in the models. Should I use the prediction? “Determining trust in individual predictions is an important problem when the model is used for decision making. When using machine learning for medical diagnosis [6] or terrorism detection, for example, predictions cannot be acted upon on blind faith, as the consequences may be catastrophic” (Ribeiro et al., “Why Should I Trust You?”).
  3. How about the data used to train / build the model?
  4. How about the data used to train / build the model?
  5. Relatively easy to keep track of data pre-processing → provenance