SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Considerations on
              Fairness-aware Data Mining

   Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma†
        *National Institute of Advanced Industrial Science and Technology (AIST), Japan
          †University of Tsukuba, Japan; and Japan Science and Technology Agency



  IEEE ICDM International Workshop on Discrimination and Privacy-Aware Data Mining
                             @ Brussels, Belgium, Dec. 10, 2012




START                                                                                     1
Overview

             Fairness-aware Data Mining
      data analysis taking into account potential issues of
      fairness, discrimination, neutrality, or independence




                Introduction of the following topics
       to give an overview of fairness-aware data mining
Applications of fairness-aware data mining for the other purposes
besides avoiding discrimination
An integrated view of existing fairness measures based on the
statistical independence
Connections of fairness-aware data mining with other techniques
for data analysis

                                                                    2
Applications of
Fairness-Aware Data Mining




                             3
Discrimination-aware Data Mining
                                                                              [Pedreschi+ 08]

                   Discrimination-aware Data Mining
               Detection or elimination of the influence
         of socially sensitive information to serious decisions
    information considered from the viewpoint of social fairness
    ex. gender, religion, race, ethnicity, handicap, political conviction
    information restricted by law or contracts
    ex. insider information, customers’ private information




             This framework for detecting or eliminating
    the influence of a specific information to the analysis results
  can be used for other purposes besides avoiding discrimination
   enhancing the neutrality of the prediction
   excluding uninteresting information from analysis results
✴This is the reason why we use the term “fairness-aware” instead of “discrimination-aware”

                                                                                                4
Filter Bubble
                                                        [TED Talk by Eli Pariser]

                       The Filter Bubble Problem
Pariser posed a concern that personalization technologies narrow and
bias the topics of information provided to people
                     http://www.thefilterbubble.com/

              Friend Recommendation List in Facebook




To fit for Pariser’s preference, conservative people are eliminated from
     his recommendation list, while this fact is not notified to him
                                                                                    5
Information Neutral Recommender System
                                                           [Kamishima+ 12]


                     The Filter Bubble Problem



            Information Neutral Recommender System
   enhancing the neutrality from a viewpoint specified by a user
            and other viewpoints are not considered


       Fairness-aware data mining techniques are used for
       removing the influence of the specified information


ex. A system enhances the neutrality in terms of whether conservative
   or progressive, but it is allowed to make biased recommendations
   in terms of other viewpoints, for example, the birthplace or age of
   friends
                                                                             6
Ignoring Uninteresting Information
                                                               [Gondek+ 04]

non-redundant clustering : find clusters that are as independent
from a given uninteresting partition as possible
          a conditional information bottleneck method,
      which is a variant of an information bottleneck method
clustering facial images

                           Simple clustering methods find two clusters:
                           one contains only faces, and the other
                           contains faces with shoulders

                           Data analysts consider this clustering is
                           useless and uninteresting

                           A non-redundant clustering method derives
                           more useful male and female clusters,
                           which are independent of the above clusters
                                                                              7
An Integrated View of
Existing Fairness Measures




                             8
Notations of Variables
                                    a result of serious decision
Y   objective variable              ex., whether or not to allow credit
                                    socially sensitive information
S   sensitive feature               ex., gender or religion

X   non-sensitive feature vector
                          all features other than a sensitive feature
                          non-sensitive, but may correlate with S

    X   (E)    explainable non-sensitive feature vector


    X   (U )   unexplainable non-sensitive feature vector
    Non-sensitive features can be further classified explainable and
    unexplainable (we will introduce later)

                                                                          9
Prejudice
                                                             [Kamishima+ 12]

Prejudice : the statistical dependences of an objective variable or
non-sensitive features on a sensitive feature

   Direct Prejudice           Y 6? S | X
                                 ?
 a clearly unfair state that a prediction model directly depends on a
 sensitive feature
 implying the conditional dependence between Y and S given X

  Indirect Prejudice           Y 6? S | ;
                                  ?
 the dependence of an objective variable Y on a sensitive feature S
 bringing a red-lining effect

Indirect Prejudice with Explainable Features          Y 6? S | X(E)
                                                         ?
 the conditional dependence between Y and S given explainable non-
 sensitive features, if a part of non-sensitive features are explainable
                                                                               10
Prejudice


The degree of prejudices can be evaluated:
 computing statistics for the independence (among observations)
 ex. mutual information, χ2-statistics
 applying tests of independence (among sampled population)
 χ2-tests


Relations between this prejudice and other existing measures for
unfairness
 elift (extended lift)
 CV score (Calders-Verwer’s discrimination score)
 CV score with explainable features



                                                                   11
elift (extended lift)
                                                     [Pedreschi+ 08, Ruggieri+ 10]


                             conf(X = x, S= ) Y = )
   elift (extended lift) =
                                conf(X = x ) Y = )
    the ratio of the confidence of a rule with additional condition
                    to the confidence of a base rule

The condition elift = 1 means that no unfair treatments, and it implies
        Pr[Y = , S=           |X=x] = Pr[Y =          |X=x]
        If this condition is the case for any x 2 Dom(X)
    (the condition of no direct discrimination in their definition)
                  and S and Y are binary variables:
                      Pr[Y, S|X] = Pr[Y |X]
    This is equivalent to the condition of no direct prejudice
        Similarly, their condition of no-indirect-discrimination
       is equivalent to that of no-indirect-prejudice condition
                                                                                     12
CV Score
      (Calders-Verwer’s Discrimination Score)
                                                           [Calders+ 10]

        Calders-Verwer discrimination score (CV score)
            Pr[Y = + |S=+]          Pr[Y = + |S= ]
       The conditional probability of the advantageous decision
for non-protected members subtracted by that for protected members




                The condition CV score = 0 means
            that neither unfair or affirmative treatments


                        Pr[Y |S] = Pr[Y ]
   This is equivalent to the condition of no indirect prejudice

                                                                           13
Explainability
                                                               [Zliobaite+ 11]

non-discriminatory cases, even if distributions of an objective variable
depends on a sensitive feature
ex : admission to a university
                        more females apply
                        to medicine
 explainable feature    more males apply to          sensitive feature
       program          computer                         gender
 medicine / computer                                   male / female

                                  objective variable
 medicine : low acceptance         acceptance ratio
 computer : high acceptance       accept / not accept
Because females tend to apply to a more competitive program,
females are more frequently rejected
Such difference is explainable and is considered as non-discriminatory
                                                                                 14
Explainability
To remove unfair treatments under such a condition, the probability of
                                       †
receiving an advantageous decision, 	Pr [Y = + |X (E) ], is chosen so as
to satisfy the condition:
         †
     Pr [Y = + |X (E) ] =
     Pr† [Y = + |X (E) , S=+] = Pr† [Y = + |X (E) , S= ]


                           This condition implies
                    Pr[Y |X (E) , S] = Pr[Y |X (E) ]
                       This is equivalent to
 the condition of no indirect prejudice with explainable features
✴Notions that is similar to this explainability are proposed as a legally-grounded
 feature [Luong+ 11] and are considered in the rule generalization procedure
 [Haijan+ 12].

                                                                                     15
Connections with Other Techniques
        for Data Analysis




                                    16
Privacy-Preserving Data Mining
                     indirect prejudice
 the dependence between an objective Y and a sensitive feature S



           from the information theoretic perspective,
          mutual information between Y and S is non-zero



           from the viewpoint of privacy-preservation,
leakage of sensitive information when an objective variable is known

                    Different points from PPDM
introducing randomness is occasionally inappropriate for severe
decisions, such as job application
disclosure of identity isn’t problematic in FADM, generally
                                                                       17
Causal Inference
                                                                        [Pearl 2009]

                          Causal inference
  a general technique for dealing with probabilistic causal relations

Both FADM and causal inference can cope with the influence of
sensitive information to an objective, but their premises are quite
different
While causal inference demands more information about causality
among variables, it can handle the facts that were not occurred

example of the connection between FADM and causal inference
A person of S=+ actually receives Y=+
If S was changed to -, how was the probability of receiving Y=-?
Under the condition called exogenous, the range of this probability is
max[0, Pr[Y = + |S=+]   Pr[Y = + |S= ]]  PNS  min[Pr[Y =+, X=+], Pr[Y = , X= ]]

             This lower bound coincides with a CV score
                                                                                       18
Cost-Sensitive Learning
                                                                  [Elkan 2001]


Cost-Sensitive Learning: learning classifiers so as to optimize
classification costs, instead of maximizing prediction accuracies


FADM can be regarded as a kind of cost-sensitive learning that
pays the costs for taking fairness into consideration



Cost matrix C(i | j): cost if a true class j is predicted as class i
Total cost to minimize is formally defined as (if class Y=+ or -):
                             P
                L(x, i) =        j   Pr[j | x]C(i | j)
An object x is classified into the class i whose cost is minimized



                                                                                 19
Cost-Sensitive Learning
                                                                 [Elkan 2001]

  Theorem 1 in [Elkan 2001]
  If negative examples in a data set is over-sampled by the factor of
                               C(+| )
                               C( |+)
  and a classifier is learned from this samples, a classifier to
  optimize specified costs is obtained


  In a FADM case, an over-sampling technique is used for avoiding
  unfair treatments

  A corresponding cost matrix can be computed by this theorem,
  which connects a cost matrix and the class ratio in training data

✴Note that this over-sampling technique is simple and effective for
 avoiding unfair decisions, but its weak point that it completely
 ignores non-sensitive features
                                                                                20
Other Connected Techniques
Legitimacy
 Data mining models can be deployed in the real world
Independent Component Analysis
  Transformation while maintaining the independence between features
Delegate Data
 To perform statistical tests, specific information is removed from data
 sets
Dummy Query
 Dummy queries are inputted for protecting users’ demographics into
 search engines or recommender systems
Visual Anonymization
 To protect identities of persons in images, faces or other information
 is blurred

                                                                          21
Conclusion

Contributions
We introduced the following topics to give an overview of fairness-
aware data mining
 We showed other FADM applications besides avoiding discrimination:
 enhancing neutrality and ignoring uninteresting information.
 We discussed the relations between our notion of prejudice and other
 existing measures of unfairness.
 We showed the connections of FADM with privacy-preserving data
 mining, causal inference, cost-sensitive learning, and so on.
Socially Responsible Mining
 Methods of data exploitation that do not damage people’s lives, such
 as fairness-aware data mining, PPDM, or adversarial learning,
 together comprise the notion of socially responsible mining, which
 it should become an important concept in the near future.

                                                                        22
Program Codes and Data Sets
               Fairness-Aware Data Mining
            http://www.kamishima.net/fadm

         Information Neutral Recommender System
            http://www.kamishima.net/inrs



                  Acknowledgements
This work is supported by MEXT/JSPS KAKENHI Grant Number
16700157, 21500154, 22500142, 23240043, and 24500194, and JST
PRESTO 09152492



                                                                23

Contenu connexe

Tendances

Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922stone55
 
Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...Andrea Dal Pozzolo
 
ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)Marcia Nealy
 
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...essp2
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble AlgorithmsSara Hooker
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep DiveSara Hooker
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear RegressionSara Hooker
 
Chahine Hypothesis Testing,
Chahine Hypothesis Testing,Chahine Hypothesis Testing,
Chahine Hypothesis Testing,Saad Chahine
 
"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AIErika Agostinelli
 
An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring gerogepatton
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2NBER
 
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERSPROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERSAndresz26
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) ijceronline
 

Tendances (19)

Recommendation Independence
Recommendation IndependenceRecommendation Independence
Recommendation Independence
 
Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922Summer 07-mfin7011-tang1922
Summer 07-mfin7011-tang1922
 
Malhotra07
Malhotra07Malhotra07
Malhotra07
 
Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...Credit card fraud detection and concept drift adaptation with delayed supervi...
Credit card fraud detection and concept drift adaptation with delayed supervi...
 
Malhotra04
Malhotra04Malhotra04
Malhotra04
 
ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)ISEN 792 (ISE tool, method, process)
ISEN 792 (ISE tool, method, process)
 
Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
 
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
The Formation of Job Referral Networks: Evidence from a Field Experiment in U...
 
Module 6: Ensemble Algorithms
Module 6:  Ensemble AlgorithmsModule 6:  Ensemble Algorithms
Module 6: Ensemble Algorithms
 
Module 2: Machine Learning Deep Dive
Module 2:  Machine Learning Deep DiveModule 2:  Machine Learning Deep Dive
Module 2: Machine Learning Deep Dive
 
Module 3: Linear Regression
Module 3:  Linear RegressionModule 3:  Linear Regression
Module 3: Linear Regression
 
Chahine Hypothesis Testing,
Chahine Hypothesis Testing,Chahine Hypothesis Testing,
Chahine Hypothesis Testing,
 
"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI"I don't trust AI": the role of explainability in responsible AI
"I don't trust AI": the role of explainability in responsible AI
 
An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring An Explanation Framework for Interpretable Credit Scoring
An Explanation Framework for Interpretable Credit Scoring
 
Acemoglu lecture2
Acemoglu lecture2Acemoglu lecture2
Acemoglu lecture2
 
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERSPROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
PROBABILISTIC CREDIT SCORING FOR COHORTS OF BORROWERS
 
Threads 2013
Threads 2013Threads 2013
Threads 2013
 
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER)
 
Malhotra08
Malhotra08Malhotra08
Malhotra08
 

En vedette

文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment AnalysisShohei Okada
 
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word ConnectionsShohei Okada
 
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...Shohei Okada
 
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...Shohei Okada
 
文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional SentencesShohei Okada
 
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...Shohei Okada
 
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment WordsShohei Okada
 
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on SentimentShohei Okada
 
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random FieldsShohei Okada
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative ClusteringToshihiro Kamishima
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationToshihiro Kamishima
 
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについてShohei Okada
 
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...Shohei Okada
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングToshihiro Kamishima
 
The Infamous Hello World Program
The Infamous Hello World ProgramThe Infamous Hello World Program
The Infamous Hello World ProgramShohei Okada
 
PRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルPRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルShohei Okada
 
Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理Toshihiro Kamishima
 

En vedette (20)

文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
文献紹介:A Joint Segmentation and Classification Framework for Sentiment Analysis
 
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
文献紹介:Opinion Mining in Newspaper Articles by Entropy-based Word Connections
 
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
文献紹介:Bidirectional Inter-dependencies of Subjective Expressions and Targets a...
 
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
文献紹介:Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automati...
 
文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences文献紹介:Sentiment Analysis of Conditional Sentences
文献紹介:Sentiment Analysis of Conditional Sentences
 
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
文献紹介:An Iterative 'Sudoku Style' Approach to Subgraph-based Word Sense DIsamb...
 
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
文献紹介:Fine-Grained Contextual Predictions for Hard Sentiment Words
 
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
文献紹介:An Empirical Study on the Effect of Negation words on Sentiment
 
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
文献紹介:Extracting Opinion Expression with semi-Markov Conditional Random Fields
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative Clustering
 
Enhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in RecommendationEnhancement of the Neutrality in Recommendation
Enhancement of the Neutrality in Recommendation
 
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
文献紹介:SemEval(SENSEVAL)におけるWSDタスクについて
 
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
文献紹介:Recursive Deep Models for Semantic Compositionality Over a Sentiment Tre...
 
OpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシングOpenOpt の線形計画で圧縮センシング
OpenOpt の線形計画で圧縮センシング
 
WSDM2016勉強会 資料
WSDM2016勉強会 資料WSDM2016勉強会 資料
WSDM2016勉強会 資料
 
The Infamous Hello World Program
The Infamous Hello World ProgramThe Infamous Hello World Program
The Infamous Hello World Program
 
PRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデルPRML勉強会@長岡 第4章線形識別モデル
PRML勉強会@長岡 第4章線形識別モデル
 
Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理Pythonによる機械学習実験の管理
Pythonによる機械学習実験の管理
 
ICML2015読み会 資料
ICML2015読み会 資料ICML2015読み会 資料
ICML2015読み会 資料
 
Python入門
Python入門Python入門
Python入門
 

Similaire à Consideration on Fairness-aware Data Mining

C0364012016
C0364012016C0364012016
C0364012016theijes
 
Study of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data MiningStudy of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data MiningIRJET Journal
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Carlos Castillo (ChaTo)
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingMicah Altman
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct DiscriminationEditor IJCATR
 
Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods IOSR Journals
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missionsijdms
 

Similaire à Consideration on Fairness-aware Data Mining (10)

C0364012016
C0364012016C0364012016
C0364012016
 
Study of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data MiningStudy of Privacy Discrimination and Prevention in Data Mining
Study of Privacy Discrimination and Prevention in Data Mining
 
Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)Detecting Algorithmic Bias (keynote at DIR 2016)
Detecting Algorithmic Bias (keynote at DIR 2016)
 
UN Global Pulse Privacy Framing
UN Global Pulse Privacy FramingUN Global Pulse Privacy Framing
UN Global Pulse Privacy Framing
 
Classification with No Direct Discrimination
Classification with No Direct DiscriminationClassification with No Direct Discrimination
Classification with No Direct Discrimination
 
Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods Protection of Direct and Indirect Discrimination using Prevention Methods
Protection of Direct and Indirect Discrimination using Prevention Methods
 
Disability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptxDisability Rights, Concepts & Practices (Classroom).pptx
Disability Rights, Concepts & Practices (Classroom).pptx
 
WiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisionsWiMLDS Paris O Vereschak Trust AI-assisted decisions
WiMLDS Paris O Vereschak Trust AI-assisted decisions
 
J046065457
J046065457J046065457
J046065457
 
Data Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context MissionsData Anonymization Process Challenges and Context Missions
Data Anonymization Process Challenges and Context Missions
 

Plus de Toshihiro Kamishima

RecSys2018論文読み会 資料
RecSys2018論文読み会 資料RecSys2018論文読み会 資料
RecSys2018論文読み会 資料Toshihiro Kamishima
 
機械学習研究でのPythonの利用
機械学習研究でのPythonの利用機械学習研究でのPythonの利用
機械学習研究でのPythonの利用Toshihiro Kamishima
 
科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要Toshihiro Kamishima
 
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないPyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないToshihiro Kamishima
 

Plus de Toshihiro Kamishima (6)

RecSys2018論文読み会 資料
RecSys2018論文読み会 資料RecSys2018論文読み会 資料
RecSys2018論文読み会 資料
 
WSDM2018読み会 資料
WSDM2018読み会 資料WSDM2018読み会 資料
WSDM2018読み会 資料
 
機械学習研究でのPythonの利用
機械学習研究でのPythonの利用機械学習研究でのPythonの利用
機械学習研究でのPythonの利用
 
KDD2016勉強会 資料
KDD2016勉強会 資料KDD2016勉強会 資料
KDD2016勉強会 資料
 
科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要科学技術計算関連Pythonパッケージの概要
科学技術計算関連Pythonパッケージの概要
 
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしないPyMCがあれば,ベイズ推定でもう泣いたりなんかしない
PyMCがあれば,ベイズ推定でもう泣いたりなんかしない
 

Dernier

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 

Dernier (20)

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 

Consideration on Fairness-aware Data Mining

  • 1. Considerations on Fairness-aware Data Mining Toshihiro Kamishima*, Shotaro Akaho*, Hideki Asoh*, and Jun Sakuma† *National Institute of Advanced Industrial Science and Technology (AIST), Japan †University of Tsukuba, Japan; and Japan Science and Technology Agency IEEE ICDM International Workshop on Discrimination and Privacy-Aware Data Mining @ Brussels, Belgium, Dec. 10, 2012 START 1
  • 2. Overview Fairness-aware Data Mining data analysis taking into account potential issues of fairness, discrimination, neutrality, or independence Introduction of the following topics to give an overview of fairness-aware data mining Applications of fairness-aware data mining for the other purposes besides avoiding discrimination An integrated view of existing fairness measures based on the statistical independence Connections of fairness-aware data mining with other techniques for data analysis 2
  • 4. Discrimination-aware Data Mining [Pedreschi+ 08] Discrimination-aware Data Mining Detection or elimination of the influence of socially sensitive information to serious decisions information considered from the viewpoint of social fairness ex. gender, religion, race, ethnicity, handicap, political conviction information restricted by law or contracts ex. insider information, customers’ private information This framework for detecting or eliminating the influence of a specific information to the analysis results can be used for other purposes besides avoiding discrimination enhancing the neutrality of the prediction excluding uninteresting information from analysis results ✴This is the reason why we use the term “fairness-aware” instead of “discrimination-aware” 4
  • 5. Filter Bubble [TED Talk by Eli Pariser] The Filter Bubble Problem Pariser posed a concern that personalization technologies narrow and bias the topics of information provided to people http://www.thefilterbubble.com/ Friend Recommendation List in Facebook To fit for Pariser’s preference, conservative people are eliminated from his recommendation list, while this fact is not notified to him 5
  • 6. Information Neutral Recommender System [Kamishima+ 12] The Filter Bubble Problem Information Neutral Recommender System enhancing the neutrality from a viewpoint specified by a user and other viewpoints are not considered Fairness-aware data mining techniques are used for removing the influence of the specified information ex. A system enhances the neutrality in terms of whether conservative or progressive, but it is allowed to make biased recommendations in terms of other viewpoints, for example, the birthplace or age of friends 6
  • 7. Ignoring Uninteresting Information [Gondek+ 04] non-redundant clustering : find clusters that are as independent from a given uninteresting partition as possible a conditional information bottleneck method, which is a variant of an information bottleneck method clustering facial images Simple clustering methods find two clusters: one contains only faces, and the other contains faces with shoulders Data analysts consider this clustering is useless and uninteresting A non-redundant clustering method derives more useful male and female clusters, which are independent of the above clusters 7
  • 8. An Integrated View of Existing Fairness Measures 8
  • 9. Notations of Variables a result of serious decision Y objective variable ex., whether or not to allow credit socially sensitive information S sensitive feature ex., gender or religion X non-sensitive feature vector all features other than a sensitive feature non-sensitive, but may correlate with S X (E) explainable non-sensitive feature vector X (U ) unexplainable non-sensitive feature vector Non-sensitive features can be further classified explainable and unexplainable (we will introduce later) 9
  • 10. Prejudice [Kamishima+ 12] Prejudice : the statistical dependences of an objective variable or non-sensitive features on a sensitive feature Direct Prejudice Y 6? S | X ? a clearly unfair state that a prediction model directly depends on a sensitive feature implying the conditional dependence between Y and S given X Indirect Prejudice Y 6? S | ; ? the dependence of an objective variable Y on a sensitive feature S bringing a red-lining effect Indirect Prejudice with Explainable Features Y 6? S | X(E) ? the conditional dependence between Y and S given explainable non- sensitive features, if a part of non-sensitive features are explainable 10
  • 11. Prejudice The degree of prejudices can be evaluated: computing statistics for the independence (among observations) ex. mutual information, χ2-statistics applying tests of independence (among sampled population) χ2-tests Relations between this prejudice and other existing measures for unfairness elift (extended lift) CV score (Calders-Verwer’s discrimination score) CV score with explainable features 11
  • 12. elift (extended lift) [Pedreschi+ 08, Ruggieri+ 10] conf(X = x, S= ) Y = ) elift (extended lift) = conf(X = x ) Y = ) the ratio of the confidence of a rule with additional condition to the confidence of a base rule The condition elift = 1 means that no unfair treatments, and it implies Pr[Y = , S= |X=x] = Pr[Y = |X=x] If this condition is the case for any x 2 Dom(X) (the condition of no direct discrimination in their definition) and S and Y are binary variables: Pr[Y, S|X] = Pr[Y |X] This is equivalent to the condition of no direct prejudice Similarly, their condition of no-indirect-discrimination is equivalent to that of no-indirect-prejudice condition 12
  • 13. CV Score (Calders-Verwer’s Discrimination Score) [Calders+ 10] Calders-Verwer discrimination score (CV score) Pr[Y = + |S=+] Pr[Y = + |S= ] The conditional probability of the advantageous decision for non-protected members subtracted by that for protected members The condition CV score = 0 means that neither unfair or affirmative treatments Pr[Y |S] = Pr[Y ] This is equivalent to the condition of no indirect prejudice 13
  • 14. Explainability [Zliobaite+ 11] non-discriminatory cases, even if distributions of an objective variable depends on a sensitive feature ex : admission to a university more females apply to medicine explainable feature more males apply to sensitive feature program computer gender medicine / computer male / female objective variable medicine : low acceptance acceptance ratio computer : high acceptance accept / not accept Because females tend to apply to a more competitive program, females are more frequently rejected Such difference is explainable and is considered as non-discriminatory 14
  • 15. Explainability To remove unfair treatments under such a condition, the probability of † receiving an advantageous decision, Pr [Y = + |X (E) ], is chosen so as to satisfy the condition: † Pr [Y = + |X (E) ] = Pr† [Y = + |X (E) , S=+] = Pr† [Y = + |X (E) , S= ] This condition implies Pr[Y |X (E) , S] = Pr[Y |X (E) ] This is equivalent to the condition of no indirect prejudice with explainable features ✴Notions that is similar to this explainability are proposed as a legally-grounded feature [Luong+ 11] and are considered in the rule generalization procedure [Haijan+ 12]. 15
  • 16. Connections with Other Techniques for Data Analysis 16
  • 17. Privacy-Preserving Data Mining indirect prejudice the dependence between an objective Y and a sensitive feature S from the information theoretic perspective, mutual information between Y and S is non-zero from the viewpoint of privacy-preservation, leakage of sensitive information when an objective variable is known Different points from PPDM introducing randomness is occasionally inappropriate for severe decisions, such as job application disclosure of identity isn’t problematic in FADM, generally 17
  • 18. Causal Inference [Pearl 2009] Causal inference a general technique for dealing with probabilistic causal relations Both FADM and causal inference can cope with the influence of sensitive information to an objective, but their premises are quite different While causal inference demands more information about causality among variables, it can handle the facts that were not occurred example of the connection between FADM and causal inference A person of S=+ actually receives Y=+ If S was changed to -, how was the probability of receiving Y=-? Under the condition called exogenous, the range of this probability is max[0, Pr[Y = + |S=+] Pr[Y = + |S= ]]  PNS  min[Pr[Y =+, X=+], Pr[Y = , X= ]] This lower bound coincides with a CV score 18
  • 19. Cost-Sensitive Learning [Elkan 2001] Cost-Sensitive Learning: learning classifiers so as to optimize classification costs, instead of maximizing prediction accuracies FADM can be regarded as a kind of cost-sensitive learning that pays the costs for taking fairness into consideration Cost matrix C(i | j): cost if a true class j is predicted as class i Total cost to minimize is formally defined as (if class Y=+ or -): P L(x, i) = j Pr[j | x]C(i | j) An object x is classified into the class i whose cost is minimized 19
  • 20. Cost-Sensitive Learning [Elkan 2001] Theorem 1 in [Elkan 2001] If negative examples in a data set is over-sampled by the factor of C(+| ) C( |+) and a classifier is learned from this samples, a classifier to optimize specified costs is obtained In a FADM case, an over-sampling technique is used for avoiding unfair treatments A corresponding cost matrix can be computed by this theorem, which connects a cost matrix and the class ratio in training data ✴Note that this over-sampling technique is simple and effective for avoiding unfair decisions, but its weak point that it completely ignores non-sensitive features 20
  • 21. Other Connected Techniques Legitimacy Data mining models can be deployed in the real world Independent Component Analysis Transformation while maintaining the independence between features Delegate Data To perform statistical tests, specific information is removed from data sets Dummy Query Dummy queries are inputted for protecting users’ demographics into search engines or recommender systems Visual Anonymization To protect identities of persons in images, faces or other information is blurred 21
  • 22. Conclusion Contributions We introduced the following topics to give an overview of fairness- aware data mining We showed other FADM applications besides avoiding discrimination: enhancing neutrality and ignoring uninteresting information. We discussed the relations between our notion of prejudice and other existing measures of unfairness. We showed the connections of FADM with privacy-preserving data mining, causal inference, cost-sensitive learning, and so on. Socially Responsible Mining Methods of data exploitation that do not damage people’s lives, such as fairness-aware data mining, PPDM, or adversarial learning, together comprise the notion of socially responsible mining, which it should become an important concept in the near future. 22
  • 23. Program Codes and Data Sets Fairness-Aware Data Mining http://www.kamishima.net/fadm Information Neutral Recommender System http://www.kamishima.net/inrs Acknowledgements This work is supported by MEXT/JSPS KAKENHI Grant Number 16700157, 21500154, 22500142, 23240043, and 24500194, and JST PRESTO 09152492 23