3. These minions have diabetes.
Please check the others, Dr. Nefario
Learning with Positive and Unlabeled Data
Or…
We can use the data as is, keeping in mind that the undiagnosed minions might still have diabetes.
9. Positive and Unlabeled Data is Everywhere
• Medical records
• Incomplete gene/protein databases
• Bookmarks/likes
Tom      Age: 25   Sex: male     Known issues: low vision, hot tibia
Jessa    Age: 27   Sex: female   Known issues: lumbago, mono
Vincent  Age: 26   Sex: male     Known issues: /
All have undesirable side effects
Complete database
20. What do I know?
• PhD Student @ Machine Learning Research Group, KU Leuven
• Fundamental research on learning with Positive and Unlabeled Data
• Estimating the Class Prior in Positive and Unlabeled Data through Decision
Tree Induction. AAAI, 2018.
• Positive and Unlabeled Relational Classification through Label Frequency
Estimation. ILP, 2017. (Most promising paper award)
• Ongoing work…
jessa.bekker@cs.kuleuven.be
people.cs.kuleuven.be/~jessa.bekker
53. Learn Naive Classifier, then Scale
The naive classifier predicts the probability of being labeled.
Scale:
Option 1: so that the proportion of positives is correct
=> Need to know the proportion of positives!
Option 2: so that the maximum probability is 1
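The two scaling options above can be sketched as follows. This is a minimal illustration on synthetic data, assuming SCAR labeling (all data values, the label frequency `c`, and the variable names are assumptions for this sketch, not from the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic PU data: positives centered at +1, negatives at -1 (illustrative).
X_pos = rng.normal(+1.0, 1.0, size=(500, 2))
X_neg = rng.normal(-1.0, 1.0, size=(500, 2))
X = np.vstack([X_pos, X_neg])

alpha = 0.5                     # true proportion of positives (known here)
c = 0.3                         # label frequency: 30% of positives get labeled
s = np.zeros(1000, dtype=int)
s[:500] = (rng.random(500) < c).astype(int)   # s=1 only for some positives

# Naive (non-traditional) classifier: predicts P(s=1 | x), not P(y=1 | x).
naive = LogisticRegression().fit(X, s)
p_labeled = naive.predict_proba(X)[:, 1]

# Option 1: rescale so the average prediction matches the known proportion
# of positives (under SCAR, E[P(s=1|x)] = c * alpha, so c ≈ mean / alpha).
c_hat1 = p_labeled.mean() / alpha
p_opt1 = np.clip(p_labeled / c_hat1, 0.0, 1.0)

# Option 2: rescale so the maximum predicted probability becomes 1.
c_hat2 = p_labeled.max()
p_opt2 = np.clip(p_labeled / c_hat2, 0.0, 1.0)
```

Option 2 avoids needing the class prior, but implicitly assumes some instance is almost surely positive (i.e., the classes are separable enough for the maximum to reach c).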
54. The extremely hard case:
Not separable, but labels are selected conditionally at random
55. Selected Conditionally At Random Assumption
Observed positive examples are selected conditionally at random from the positive set, conditioned on the attributes.
The probability that a positive example is selected is a function of (some of) the attributes in the data, called the propensity score.
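The assumption can be made concrete with a small data-generating sketch (all names and distributions here are illustrative, not from the talk): each positive example is labeled with probability e(x), the propensity score, which in this example depends only on the first attribute.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # true (hidden) class

def propensity(X):
    # e(x) = P(labeled | x, y=1): a logistic function of attribute 0
    return 1.0 / (1.0 + np.exp(-X[:, 0]))

e = propensity(X)
s = ((y == 1) & (rng.random(1000) < e)).astype(int)   # only positives get labels
```

Note that negatives can never receive a label; the observed label s=1 implies y=1, but s=0 tells us little by itself.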
59. Learn Naive Classifier, then Scale
The naive classifier predicts the probability of being labeled.
Scale:
Use the propensity score function
=> Need to know the propensity score function!
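The scaling step itself is a one-liner once the propensity score is known: under the selected-conditionally-at-random assumption, P(y=1 | x) = P(s=1 | x) / e(x). A minimal sketch (the helper name is ours, not the talk's):

```python
import numpy as np

def correct_with_propensity(p_labeled, e):
    """p_labeled: naive predictions P(s=1|x); e: propensity scores e(x)."""
    return np.clip(p_labeled / np.maximum(e, 1e-12), 0.0, 1.0)
```

For example, a naive prediction of 0.1 for an instance whose propensity score is 0.5 corrects to a probability of 0.2 of being positive.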
61. Learn Classifier and Propensity Score Simultaneously
Use available knowledge:
• Attributes in the propensity score function
• Proportion of positives
• Domain knowledge that the classifier must adhere to
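One way such joint learning could be sketched (purely illustrative, not the speaker's method, which is described as ongoing work): fit a logistic classifier f(x) ≈ P(y=1|x) and a logistic propensity model e(x) ≈ P(s=1|x, y=1) together by maximizing the PU likelihood of P(s=1|x) = f(x)·e(x), using the known attribute set of the propensity function and penalizing deviation from the known proportion of positives. All names, weights, and data are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 2))
y = (rng.random(2000) < 1 / (1 + np.exp(-2 * (X[:, 0] + X[:, 1])))).astype(int)
e_true = 1 / (1 + np.exp(-X[:, 0]))               # propensity uses attribute 0
s = ((y == 1) & (rng.random(2000) < e_true)).astype(int)

alpha = y.mean()                                  # proportion of positives (known)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

def neg_loglik(theta):
    wf, bf = theta[0:2], theta[2]                 # classifier: both attributes
    we, be = theta[3], theta[4]                   # propensity: attribute 0 only
    f = sigmoid(X @ wf + bf)                      # P(y=1 | x)
    e = sigmoid(X[:, 0] * we + be)                # e(x)
    p = np.clip(f * e, 1e-9, 1 - 1e-9)            # P(s=1 | x) = f(x) * e(x)
    nll = -(s * np.log(p) + (1 - s) * np.log(1 - p)).sum()
    return nll + 1e4 * (f.mean() - alpha) ** 2    # enforce the known class prior

res = minimize(neg_loglik, np.zeros(5), method="L-BFGS-B")
f_hat = sigmoid(X @ res.x[0:2] + res.x[2])        # recovered classifier
```

Restricting which attributes enter e(x) and pinning the class prior are what make the product f·e identifiable here; without such knowledge, many (f, e) pairs explain the observed labels equally well.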
63. Conclusions
• PU learning is very useful in practice
• We need assumptions to learn from PU data
  • Linearly separable
  • Selected completely at random
    → Scale probabilities
      • Use proportion of positives
      • Maximum scale
  • Selected conditionally at random
    → Use propensity score
• Ongoing work