On the Stratification of
   Multi-Label Data
  Konstantinos Sechidis, Grigorios Tsoumakas, Ioannis Vlahavas


  Machine Learning & Knowledge Discovery Group
  Department of Informatics
  Aristotle University of Thessaloniki
  Greece
Stratified Sampling
   Sampling plays a key role in practical machine
    learning and data mining
       Exploration and efficient processing of vast data
       Generation of training, validation and test sets for accuracy
        estimation, model selection, hyper-parameter selection and
        overfitting avoidance (e.g. reduced error pruning)
   The stratified version of sampling is typically used in
    classification tasks
       The proportion of the examples of each class in a sample
        of a dataset follows that of the full dataset
       It has been found to improve standard cross-validation in
        terms of both bias and variance of the estimate (Kohavi, 1995)
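As a concrete illustration (not from the slides), stratified k-fold assignment for single-label data can be sketched as follows; the function name and round-robin scheme are illustrative assumptions:

```python
# A minimal, dependency-free sketch of stratified k-fold assignment for
# single-label data: the examples of each class are dealt round-robin
# across folds, so every fold mirrors the full dataset's class proportions.
from collections import defaultdict

def stratified_folds(labels, k):
    """Return a fold index in 0..k-1 for each example."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [0] * len(labels)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[idx] = pos % k  # round-robin within the class
    return folds

y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # 60%/40% class balance
folds = stratified_folds(y, 2)
# each fold receives 3 negatives and 2 positives
```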
Stratifying Multi-Label Data
   Instances associated with a subset
    of a fixed set of labels
       [Example image annotated with: Male, Horse, Natural,
        Animals, Sunny, Day, Mountains, Clouds, Sky,
        Plants, Outdoor]
Stratifying Multi-Label Data
   Random sampling is typically used in the literature
   We consider two main approaches for the
    stratification of multi-label data
       Stratified sampling based on labelsets (label combinations)
           The number of labelsets is often quite large and each labelset
            is associated with very few examples, rendering this approach
            impractical
       Set as goal the maintenance of the distribution of positive
        and negative examples of each label
           This views the problem independently for each label
           It cannot be achieved by simple independent stratification of
            each label, as the produced subsets need to be the same
           Our solution: iterative stratification of labels
Stratification Based on Labelsets

instance   λ1   λ2   λ3   labelset
   i1      1    0    1       5
   i2      0    0    1       1
   i3      0    1    0       2
   i4      1    0    0       4
   i5      0    1    1       3
   i6      1    1    0       6
   i7      1    0    1       5
   i8      1    0    1       5
   i9      0    0    1       1

Resulting folds (labelset in parentheses):

   1st Fold: i1 (5), i2 (1), i3 (2)
   2nd Fold: i7 (5), i9 (1), i4 (4)
   3rd Fold: i8 (5), i5 (3), i6 (6)
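This reduction can be sketched in a few lines (the function name and round-robin tie handling are illustrative assumptions): each distinct label combination is treated as a single class, and ordinary stratified assignment is applied to those classes.

```python
# Labelset-based stratification sketch: group examples by their exact
# label combination, then deal each group round-robin across the k folds.
from collections import defaultdict

def labelset_folds(label_matrix, k):
    """label_matrix: list of binary label tuples; returns a fold per example."""
    by_labelset = defaultdict(list)
    for idx, row in enumerate(label_matrix):
        by_labelset[tuple(row)].append(idx)
    folds = [0] * len(label_matrix)
    for indices in by_labelset.values():
        for pos, idx in enumerate(indices):
            folds[idx] = pos % k  # round-robin within the labelset
    return folds

# The 9-instance example from this slide (λ1, λ2, λ3)
Y = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1),
     (1, 1, 0), (1, 0, 1), (1, 0, 1), (0, 0, 1)]
folds = labelset_folds(Y, 3)
# the three (1, 0, 1) examples land in three different folds
```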
Statistics of Multi-Label Data

                                           labelsets/  examples per labelset   examples per label
dataset        labels examples  labelsets   examples    min   avg    max        min    avg     max
Scene               6    2407        15       0.01        1   160     405       364    431     533
Emotions            6     593        27       0.05        1    22      81       148    185     264
TMC2007            22   28596      1341       0.05        1    21    2486       441   2805   16173
Genbase            27     662        32       0.05        1    21     170         1     31     171
Yeast              14    2417       198       0.08        1    12     237        34    731    1816
Medical            45     978        94       0.10        1    10     155         1     27     266
Mediamill         101   43907      6555       0.15        1     7    2363        31   1902   33869
Bookmarks         208   87856     18716       0.21        1     5    6087       300    857    6772
Bibtex            159    7395      2856       0.39        1     3     471        51    112    1042
Enron              53    1702       753       0.44        1     2     163         1    108     913
Corel5k           374    5000      3175       0.64        1     2      55         1     47    1120
ImageCLEF2010      93    8000      7366       0.92        1     1      32        12   1038    7484
Delicious         983   16105     15806       0.98        1     1      19        21    312    6495
Iterative Stratification Algorithm
   Select the label with the fewest remaining examples
       If rare labels are not examined in priority, they may be
        distributed in an undesired way, beyond subsequent repair
       For frequent labels, we have the chance to modify the
        current distribution towards the desired one in a subsequent
        iteration, due to the availability of more examples
   For each example of this label, select the subset with
       The largest desired number of remaining examples for this label
       In case of ties, the largest desired overall number of examples
       Further ties are broken randomly
   Update statistics
       Desired number of examples per label at each subset

Note: there is no hard constraint on the desired number of examples
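A simplified, deterministic sketch of the algorithm above (an assumption for illustration, not the authors' Mulan implementation; ties here are broken by index rather than randomly):

```python
# Iterative stratification sketch: repeatedly pick the label with the
# fewest remaining positive examples and assign each of its examples to
# the fold that most desires it, updating the desired counts as we go.
def iterative_stratification(Y, k):
    """Y: list of label-index sets, one per example; returns a fold per example."""
    n = len(Y)
    labels = set().union(*Y)
    c = [n / k] * k                        # desired examples per fold
    c_label = {l: [sum(l in y for y in Y) / k] * k for l in labels}
    remaining = set(range(n))
    fold_of = {}
    while remaining:
        counts = {l: sum(l in Y[i] for i in remaining) for l in labels}
        counts = {l: v for l, v in counts.items() if v > 0}
        if not counts:                     # leftover label-free examples
            for i in sorted(remaining):
                j = max(range(k), key=lambda f: c[f])
                fold_of[i], c[j] = j, c[j] - 1
            break
        # label with the fewest remaining positive examples
        l = min(counts, key=lambda t: (counts[t], t))
        for i in sorted(e for e in remaining if l in Y[e]):
            # fold with the largest desire for this label; ties -> overall desire
            j = max(range(k), key=lambda f: (c_label[l][f], c[f]))
            fold_of[i] = j
            remaining.discard(i)
            c[j] -= 1
            for lab in Y[i]:
                c_label[lab][j] -= 1
    return [fold_of[i] for i in range(n)]

# The 9-instance example from the slides; labels λ1, λ2, λ3 as 0, 1, 2
Y = [{0, 2}, {2}, {1}, {0}, {1, 2}, {0, 1}, {0, 2}, {0, 2}, {2}]
folds = iterative_stratification(Y, 3)
```

With this input, every fold ends up with exactly one λ2 positive and two λ3 positives, matching the balance the slides' example achieves.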
Example

instance   λ1   λ2   λ3
   i1      1    0    1
   i2      0    0    1
   i3      0    1    0
   i4      1    0    0
   i5      0    1    1
   i6      1    1    0
   i7      1    0    1
   i8      1    0    1
   i9      0    0    1
  sum      5    3    6

Desired positive examples in each of the 3 folds: λ1: 1.7, λ2: 1, λ3: 2

Firstly, distribute the positive examples of λ2 (the label with the
fewest remaining examples):
   i3 → 1st fold, i6 → 2nd fold, i5 → 3rd fold
   desired (λ1, λ2, λ3): 1st fold (1.7, 0, 2), 2nd fold (0.7, 0, 2),
   3rd fold (1.7, 0, 1)

Secondly, distribute the positive examples of λ1:
   i1 → 1st fold, i4 → 3rd fold, i7 → 2nd fold, i8 → 1st fold
   desired: 1st fold (-0.3, 0, 0), 2nd fold (-0.3, 0, 1),
   3rd fold (0.7, 0, 1)

Thirdly, distribute the positive examples of λ3:
   i2 → 2nd fold, i9 → 3rd fold
   desired: 1st fold (-0.3, 0, 0), 2nd fold (-0.3, 0, 0),
   3rd fold (0.7, 0, 0)

Resulting folds:
   1st fold: i3, i1, i8
   2nd fold: i6, i7, i2
   3rd fold: i5, i4, i9
The Triggering Event
   Implementation of evaluation software
       Stratification of multi-label data concerned us a while ago
        during the development of the Mulan open-source library
   However, a more practical issue triggered this work
       During our participation at ImageCLEF 2010, cross-validation
        experiments led to subsets without positive examples for
        some labels, causing problems in the calculation of the main
        evaluation measure of the challenge, Mean Average Precision
Subsets Without Label Examples
   When can this happen?
       When there are rare labels
   Problems in the calculation of evaluation measures
       A test set without positive examples for a label (fn=tp=0)
        renders recall undefined, and with it F1, AUC and MAP
       Furthermore, if the model makes no false positive predictions
        for that label (fp=0), precision is undefined as well

                            Predicted
                        negative  positive
       Actual negative     tn        fp        Recall:    tp/(tp+fn)
              positive     fn        tp        Precision: tp/(tp+fp)
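A small illustration of the failure mode above: when a test fold contains no positive examples of a label (tp = fn = 0), recall's denominator is zero, so recall, and every measure built on it, is undefined.

```python
# Recall = tp / (tp + fn); with no positives in the fold the
# denominator is zero, so the measure cannot be computed.
def recall(tp, fn):
    return tp / (tp + fn)

assert recall(3, 1) == 0.75   # well-defined when positives are present

try:
    recall(0, 0)              # fold without positive examples for the label
    undefined = False
except ZeroDivisionError:
    undefined = True
```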
Comparison of the Approaches
(the labelset-based approach intends to maintain the joint label distribution)

           random               based on labelsets                iterative
           1st Fold                      1st Fold                  1st Fold
      i1      1   0   1   5       i1        1   0   1   5   i3        0   1   0   2
      i2      0   0   1   1       i2        0   0   1   1   i1        1   0   1   5
      i3      0   1   0   2       i3        0   1   0   2   i8        1   0   1   5
           2nd Fold                      2nd Fold                 2nd Fold
      i4      1   0   0   4       i7        1   0   1   5   i6        1   1   0   6
      i5      0   1   1   3       i9        0   0   1   1   i7        1   0   1   5
      i6      1   1   0   6       i4        1   0   0   4   i2        0   0   1   1
           3rd Fold                      3rd Fold                  3rd Fold
      i7      1   0   1   5        i8       1   0   1   5    i5       0   1   1   3
      i8      1   0   1   5        i5       0   1   1   3    i4       1   0   0   4
      i9      0   0   1   1        i6       1   1   0   6    i9       0   0   1   1

                       (the iterative approach intends to maintain the marginal label distributions)
Experiments
   Sampling approaches
       Random (R)
       Stratified sampling based on labelsets (L)
       Iterative stratification algorithm (I)
   We experiment on 13 multi-label datasets
       10-fold CV on datasets with up to 15k examples and
       Holdout (2/3 for training and 1/3 for testing) on larger ones
   Experiments are repeated 5 times with different
    random orderings of the training examples
       Presented results are averages over these 5 experiments
Distribution of Labels & Examples
   Notation
       q labels, k subsets, cj desired examples in subset j,
       Di: set of examples of label i, Sj: set of examples in subset j
       Sij: set of examples of label i in subset j
   Labels distribution (LD) and examples distribution (ED)

    $$\mathrm{LD} = \frac{1}{q}\sum_{i=1}^{q}\left(\frac{1}{k}\sum_{j=1}^{k}\left|\frac{|S_{ij}|}{|S_j|-|S_{ij}|}-\frac{|D_i|}{|D|-|D_i|}\right|\right)
    \qquad
    \mathrm{ED} = \frac{1}{k}\sum_{j=1}^{k}\Big||S_j|-c_j\Big|$$
   Subsets without positive examples
       Number of folds that contain at least one label with
        zero positive examples (FZ), number of fold-label
        pairs with zero positive examples (FLZ)
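The LD and ED measures above can be sketched directly from their definitions (an illustrative implementation; it assumes every fold keeps at least one negative example of each label, so no denominator is zero):

```python
# LD: per label, the mean over folds of |positive/negative ratio in the
# fold minus the same ratio in the full dataset|, averaged over labels.
def label_distribution(Y, folds, k):
    n, q = len(Y), len(Y[0])
    total = 0.0
    for i in range(q):
        d_i = sum(row[i] for row in Y)
        overall = d_i / (n - d_i)
        for j in range(k):
            fold = [row for row, f in zip(Y, folds) if f == j]
            s_ij = sum(row[i] for row in fold)
            total += abs(s_ij / (len(fold) - s_ij) - overall) / k
    return total / q

# ED: mean absolute deviation of fold sizes from the desired sizes c_j.
def example_distribution(folds, desired, k):
    sizes = [sum(1 for f in folds if f == j) for j in range(k)]
    return sum(abs(s - c) for s, c in zip(sizes, desired)) / k

# The 9-instance example, with the folds produced by iterative stratification
Y = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1),
     (1, 1, 0), (1, 0, 1), (1, 0, 1), (0, 0, 1)]
folds = [0, 1, 0, 2, 2, 1, 1, 0, 2]
ld = label_distribution(Y, folds, 3)
ed = example_distribution(folds, [3, 3, 3], 3)
```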
Labels Distribution (normalized)

[Figure: labels distribution (LD) of each method (I, L, R) per dataset,
normalized to the LD value of R. Datasets are sorted in increasing order
of #labelsets/#examples: Scene (0.01), Emotions (0.05), Genbase (0.05),
TMC2007 (0.05), Yeast (0.08), Medical (0.10), Mediamill (0.15),
Bookmarks (0.21), Bibtex (0.39), Enron (0.44), Corel5k (0.64),
ImageCLEF2010 (0.92), Delicious (0.98)]
Examples Distribution
                        80
                                                                                                                                                                                                                                    R

                        70                                                                                                                   Iterative stratification                                                               L
                                                                                                                                                                                                                                    I
                                                                                                                                        trades off example distribution
                        60
                                                                                                                                              for label distribution
                        50
                                                                                                                                             The larger the dataset
Examples Distribution




                        40                                                                                                              the larger the deviation from the
                                                                                                                                          desired number of examples
                        30


                        20
                                                                                                                                   Mediamill contains 1730 examples
                                                                                                                                        without any annotation
                        10

                                                        ?
                         0




                                                                                                                                                                                  4)
                                                                                                                                        2)




                                                                                                                                                                                                                5)
                                                                       )




                                                                                                                         9)




                                                                                                                                                       )




                                                                                                                                                                     )




                                                                                                                                                                                                 0)
                                                                                      8)
                                                        5)




                                                                                                                                                                                                                               5)
                                       1)




                                                                                                           )
                                                                      05




                                                                                                                                                                    01
                                                                                                                                                    08
                                                                                                            2




                                                                                                                                                                                  .4
                                                                                                                                        .9




                                                                                                                                                                                                 .1




                                                                                                                                                                                                                .0
                                                                                                                       .3
                                                                                      .9
                                                     .1
                                       .2




                                                                                                                                                                                                                               .0
                                                                                                         .9




                                                                                                                                                  0.
                                                                     .




                                                                                                                                                                     .



                                                                                                                                                                              (0




                                                                                                                                                                                                            (0
                                                                                                                                    (0




                                                                                                                                                                                                (0
                                                                  (0




                                                                                                                                                                  (0
                                                                                     (0




                                                                                                                     (0
                                                     (0
                                   (0




                                                                                                                                                                                                                             (0
                                                                                                      (0




                                                                                                                                                  t(




                                                                                                                                                                              n



                                                                                                                                                                                           al




                                                                                                                                                                                                            e
                                                                  7




                                                                                                                                                             e
                                                                                                                                   k
                                                                                                                  ex
                                                                                us
                                                 il l
                                  ks




                                                                                                                                                                                                                          ns
                                                                                                 10




                                                                                                                                              as
                                                                                                                                   l5
                                                               00




                                                                                                                                                                          ro




                                                                                                                                                                                                           as
                                                                                                                                                             en




                                                                                                                                                                                           ic
                                                m




                                                                                                                bt
                                                                                io
                              ar




                                                                                                                               re




                                                                                                                                                                                                                        io
                                                                                                  0




                                                                                                                                             Ye




                                                                                                                                                                         En




                                                                                                                                                                                       ed



                                                                                                                                                                                                       nb
                                                                                                                                                           Sc
                                                ia



                                                             C2




                                                                                                                Bi
                                                                            lic




                                                                                               F2




                                                                                                                                                                                                                      ot
                              m




                                                                                                                              Co
                                            ed




                                                                                                                                                                                       M



                                                                                                                                                                                                      Ge



                                                                                                                                                                                                                     Em
                                                                           De
                             ok




                                                          TM




                                                                                           LE
                                            M
                         Bo




                                                                                          eC
                                                                                      ag
                                                                                     Im




                         Datasets are sorted in decreasing order of #examples
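The iterative stratification scheme whose behaviour is plotted above can be sketched in code. This is a simplified Python sketch of the greedy algorithm from the paper: examples are distributed label by label, rarest label first, each example going to the fold that most needs examples of that label. The function name and the exact tie-breaking details are our own and may differ from the authors' implementation.

```python
import random
from collections import defaultdict

def iterative_stratification(labels, n_folds, seed=0):
    """Assign each example to one of n_folds folds, trying to keep the
    proportion of positive examples of every label equal across folds.

    labels: list of sets, the label set of each example.
    Returns a list mapping example index -> fold index.
    """
    rng = random.Random(seed)
    n = len(labels)
    # Desired number of examples per fold (even split).
    desired = [n / n_folds] * n_folds
    # Desired number of positive examples of each label per fold.
    counts = defaultdict(int)
    for ls in labels:
        for l in ls:
            counts[l] += 1
    desired_label = {l: [c / n_folds] * n_folds for l, c in counts.items()}

    remaining = set(range(n))
    fold_of = [None] * n
    while remaining:
        # Examine the labels still present among unassigned examples.
        rem_per_label = defaultdict(list)
        for i in remaining:
            for l in labels[i]:
                rem_per_label[l].append(i)
        if rem_per_label:
            # Distribute the rarest remaining label first.
            lab = min(rem_per_label, key=lambda l: len(rem_per_label[l]))
            candidates = list(rem_per_label[lab])
        else:
            lab = None
            candidates = list(remaining)  # examples without any label
        for i in candidates:
            # Pick the fold that most wants examples of this label;
            # break ties by overall desired size, then randomly.
            if lab is not None:
                f = max(range(n_folds),
                        key=lambda f: (desired_label[lab][f],
                                       desired[f], rng.random()))
            else:
                f = max(range(n_folds),
                        key=lambda f: (desired[f], rng.random()))
            fold_of[i] = f
            remaining.remove(i)
            desired[f] -= 1
            for l in labels[i]:
                desired_label[l][f] -= 1
    return fold_of
```

Note how examples without annotations (e.g. the 1730 unlabeled Mediamill examples) are only placed by overall fold size, which is one source of the example-distribution deviation shown above.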
Subsets Without Label Examples

                          labelsets/   examples per label         FZ             FLZ
     dataset       labels  examples    min    avg     max     R   L   I     R    L    I
     Scene            6      0.01      364    431     533     0   0   0     0    0    0
     Emotions         6      0.05      148    185     264     0   0   0     0    0    0
     Genbase         27      0.05        1     31     171    10  10  10    90   77   74
     Yeast           14      0.08       34    731    1816     1   0   0     1    0    0
     Medical         45      0.1         1     27     266    10  10  10   203  179  173
     Bibtex         159      0.39       51    112    1042     1   1   0     1    1    0
     Enron           53      0.44        1    108     913    10  10  10    95   88   47
     Corel5k        374      0.64        1     47    1120    10  10  10  1140 1118  788
     ImageCLEF2010   93      0.92       12   1038    7484     4   4   0     4    0    0

- Iterative stratification produces the lowest FZ & FLZ in all datasets
- All schemes fail in Genbase, Medical, Enron and Corel5k due to label rarity
- All schemes do well in Scene, Emotions, where examples per label abound
- Only iterative stratification does well in Bibtex and ImageCLEF2010
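The FZ and FLZ statistics in the table above can be computed directly from a fold assignment. Assuming FZ counts folds that contain at least one label with no positive examples, and FLZ counts fold-label pairs with no positive examples (our reading of the table; the function name is ours), a minimal sketch:

```python
def fold_zero_stats(labels, folds, n_folds, n_labels):
    """Count folds (FZ) and fold-label pairs (FLZ) that contain
    no positive examples of some label.

    labels: list of sets, the label set of each example.
    folds: list mapping example index -> fold index.
    """
    # positives[f][l] = number of examples of label l placed in fold f
    positives = [[0] * n_labels for _ in range(n_folds)]
    for ls, f in zip(labels, folds):
        for l in ls:
            positives[f][l] += 1
    flz = sum(1 for f in range(n_folds) for l in range(n_labels)
              if positives[f][l] == 0)
    fz = sum(1 for f in range(n_folds)
             if any(positives[f][l] == 0 for l in range(n_labels)))
    return fz, flz
```

With 10 folds this explains the table's bounds: FZ is at most 10, while FLZ can grow with the number of labels (e.g. up to 3740 fold-label pairs for Corel5k's 374 labels).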
Variance of 10-fold CV Estimates
   Algorithms
       Binary Relevance (one-versus-rest)
       Calibrated Label Ranking (Fürnkranz et al., 2008)
            Combination of pairwise and one-versus-rest models
            Considers label dependencies
   Measures
              Measure        Required type of output
            Hamming Loss           Bipartition
         Subset Accuracy           Bipartition
              Coverage              Ranking
            Ranking Loss            Ranking
    Mean Average Precision        Probabilities
        Micro-averaged AUC        Probabilities
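The two bipartition measures in the table are straightforward to compute from binary indicator vectors; a minimal sketch (function names are ours):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label predictions that disagree
    with the ground truth, averaged over examples and labels."""
    n, q = len(y_true), len(y_true[0])
    return sum(t != p
               for row_t, row_p in zip(y_true, y_pred)
               for t, p in zip(row_t, row_p)) / (n * q)

def subset_accuracy(y_true, y_pred):
    """Fraction of examples whose entire label set is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Subset Accuracy is the stricter of the two: a single wrong label zeroes out an example, whereas Hamming Loss penalizes it by only 1/q.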
Average Ranking for BR (1/3)
    On all 9 datasets
 [Figure: Average Rank for Standard Deviation using BR; average rank (1 = best, 3 = worst) of
 R, L and I for each measure: Hamming Loss, Subset Accuracy, Coverage, Ranking Loss, MAP,
 Micro-AUC; annotations: 7 fails, 6 fails, 4 fails]
 MAP and Micro-AUC only based on scene and emotions
Average Ranking for BR (2/3)
         On 5 datasets where #labelsets/#examples ≤ 0.1
 [Figure: Average Rank for Standard Deviation with ratio of labelsets over examples less or
 equal than 0.1 using BR; average rank (1 = best, 3 = worst) of R, L and I for each measure:
 Hamming Loss, Subset Accuracy, Coverage, Ranking Loss, MAP, Micro-AUC; annotations: 3 fails,
 3 fails, 2 fails]
 MAP and Micro-AUC only based on scene and emotions
Average Ranking for BR (3/3)
         On 4 datasets where #labelsets/#examples ≥ 0.39
 [Figure: Average Rank for Standard Deviation with ratio of labelsets over examples greater or
 equal than 0.39 using BR; average rank (1 = best, 3 = worst) of R, L and I for each measure:
 Hamming Loss, Subset Accuracy, Coverage, Ranking Loss, MAP, Micro-AUC]
 Fails in MAP: R: 4, L: 4, I: 2
Average Ranking for CLR
          On 5 datasets with #labels < 50 for complexity
           reasons (those that #labelsets/#examples ≤ 0.1)
 [Figure: Average Rank for Standard Deviation with ratio of labelsets over examples less or
 equal than 0.1 using CLR; average rank (1 = best, 3 = worst) of R, L and I for each measure:
 Hamming Loss, Subset Accuracy, Coverage, Ranking Loss, MAP, Micro-AUC; annotations: 3 fails,
 3 fails, 2 fails]
 MAP and Micro-AUC only based on scene and emotions
BR vs CLR
           On 5 datasets where #labelsets/#examples ≤ 0.1
 [Figure: Comparison between BR and CLR for datasets with ratio of labelsets over examples
 less or equal than 0.1; average rank of six schemes (BR with R, CLR with R, BR with L,
 CLR with L, BR with I, CLR with I) for each measure: Hamming Loss, Subset Accuracy, Coverage,
 Ranking Loss, MAP, Micro-AUC]
 Iterative stratification suits BR; labelsets-based stratification suits CLR
Conclusions
   Labelsets-based stratification
       Works well when #labelsets/#examples is small
       Works well with Calibrated Label Ranking
   Iterative stratification
       Works well when #labelsets/#examples is large
       Works well with Binary Relevance
       Works well for estimating the Ranking Loss
       Handles rare labels in a better way
       Maintains the imbalance ratio of each label in each subset
   Random sampling
       Is consistently worse and should be avoided, contrary to
        the typical multi-label experimental setup of the literature
Future Work
   Iterative stratification
       Investigate the effect of changing the algorithm to respect
        the desired number of examples at each subset
   Hybrid approach
       Stratification based on labelsets of the examples of
        frequent labelsets
       Iterative stratification for the rest of the examples
   Sampling and generalization performance
       Conduct statistically valid experiments to assess the quality
        of the sampling schemes in terms of estimating the test
        error (unbiased and low variance)


On the Stratification of Multi-Label Data

  • 4. Stratifying Multi-Label Data
       Random sampling is typically used in the literature
       We consider two main approaches for the stratification of multi-label data
         Stratified sampling based on labelsets (label combinations)
           The number of labelsets is often quite large and each labelset is associated with very few examples, rendering this approach impractical
         Set as goal the maintenance of the distribution of positive and negative examples of each label
           This views the problem independently for each label
           It cannot be achieved by simple independent stratification of each label, as the produced subsets need to be the same
       Our solution: iterative stratification of labels
  • 5. Stratification Based on Labelsets
       instance   λ1   λ2   λ3   labelset
       i1          1    0    1      5
       i2          0    0    1      1
       i3          0    1    0      2
       i4          1    0    0      4
       i5          0    1    1      3
       i6          1    1    0      6
       i7          1    0    1      5
       i8          1    0    1      5
       i9          0    0    1      1
  • 6. Stratification Based on Labelsets
       1st fold: i1 (labelset 5), i2 (1), i3 (2)
       2nd fold: i7 (5), i9 (1), i4 (4)
       3rd fold: i8 (5), i5 (3), i6 (6)
       Labelset 5 (three examples) contributes one example per fold, labelset 1 (two examples) is split across two folds, and the single-example labelsets are distributed to balance fold sizes.
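The labelset-based scheme above can be sketched in a few lines: treat each distinct label combination as a class and deal its examples out across the folds. This is an illustrative sketch, not the implementation evaluated in the deck; the function name and the smallest-fold balancing rule are our own choices.

```python
from collections import defaultdict
import random

def labelset_stratified_folds(y, k, seed=0):
    """Deal examples into k folds so that examples sharing a labelset
    (identical label combination) are spread as evenly as possible."""
    rng = random.Random(seed)
    groups = defaultdict(list)              # labelset -> example indices
    for idx, labels in enumerate(y):
        groups[tuple(labels)].append(idx)
    folds = [[] for _ in range(k)]
    for members in groups.values():
        rng.shuffle(members)
        for m in members:                   # always feed the smallest fold
            smallest = min(range(k), key=lambda j: len(folds[j]))
            folds[smallest].append(m)
    return folds

# The 9-instance example from the slides (i1..i9 -> indices 0..8)
y = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1),
     (1, 1, 0), (1, 0, 1), (1, 0, 1), (0, 0, 1)]
folds = labelset_stratified_folds(y, k=3)
```

On this toy data every fold receives three examples and the three copies of labelset 5, i.e. (1, 0, 1), land in three different folds, as on the slide.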
  • 7. Statistics of Multi-Label Data
                                                 labelsets/  examples per labelset   examples per label
       dataset         labels  examples labelsets examples     min   avg   max         min    avg    max
       Scene                6      2407       15     0.01        1   160   405         364    431    533
       Emotions             6       593       27     0.05        1    22    81         148    185    264
       TMC2007             22     28596     1341     0.05        1    21  2486         441   2805  16173
       Genbase             27       662       32     0.05        1    21   170           1     31    171
       Yeast               14      2417      198     0.08        1    12   237          34    731   1816
       Medical             45       978       94     0.10        1    10   155           1     27    266
       Mediamill          101     43907     6555     0.15        1     7  2363          31   1902  33869
       Bookmarks          208     87856    18716     0.21        1     5  6087         300    857   6772
       Bibtex             159      7395     2856     0.39        1     3   471          51    112   1042
       Enron               53      1702      753     0.44        1     2   163           1    108    913
       Corel5k            374      5000     3175     0.64        1     2    55           1     47   1120
       ImageCLEF2010       93      8000     7366     0.92        1     1    32          12   1038   7484
       Delicious          983     16105    15806     0.98        1     1    19          21    312   6495
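The #labelsets/#examples ratio that orders this table is straightforward to compute from a binary label matrix; a small helper (our own, for illustration):

```python
def labelset_stats(y):
    """Number of distinct labelsets and the #labelsets/#examples ratio
    used to sort the datasets in the statistics table."""
    labelsets = {tuple(row) for row in y}
    return len(labelsets), len(labelsets) / len(y)

# The 9-instance toy dataset from the slides has 6 distinct labelsets
y = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1),
     (1, 1, 0), (1, 0, 1), (1, 0, 1), (0, 0, 1)]
n_sets, ratio = labelset_stats(y)
```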
  • 8. Iterative Stratification Algorithm
       Select the label with the fewest remaining examples
         If rare labels are not examined in priority, they may be distributed in an undesired way, beyond subsequent repair
         For frequent labels, we have the chance to modify the current distribution towards the desired one in a subsequent iteration, due to the availability of more examples
       For each example of this label, select the subset with
         The largest desired number of examples for this label
         The largest desired number of examples, in case of ties
         Further ties are broken randomly
       Update statistics
         Desired number of examples per label at each subset
       Note: there is no hard constraint on the desired number of examples
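The steps above translate to a short greedy loop. This is a sketch of the algorithm as described on the slide, assuming a list of binary label tuples; it is not the reference Mulan implementation, variable names are ours, and desired counts are allowed to go negative, mirroring the "no hard constraint" note.

```python
import random

def iterative_stratification(y, k, seed=0):
    """Greedy fold assignment: repeatedly pick the label with the fewest
    remaining positive examples and give each of its examples to the fold
    that still desires that label the most."""
    rng = random.Random(seed)
    n, q = len(y), len(y[0])
    remaining = set(range(n))
    folds = [[] for _ in range(k)]
    # desired positives of each label per fold, and desired fold sizes
    desired = [[sum(row[l] for row in y) / k] * k for l in range(q)]
    size_desired = [n / k] * k
    while remaining:
        counts = {l: sum(y[i][l] for i in remaining) for l in range(q)}
        live = [l for l in range(q) if counts[l] > 0]
        if not live:  # leftovers with no positive labels: balance fold sizes
            for i in sorted(remaining):
                j = max(range(k), key=lambda j: (size_desired[j], rng.random()))
                folds[j].append(i)
                size_desired[j] -= 1
            break
        lbl = min(live, key=lambda l: counts[l])        # rarest label first
        for i in sorted(i for i in remaining if y[i][lbl]):
            # fold with the largest desired count for this label; ties broken
            # by the largest desired fold size, then randomly
            j = max(range(k), key=lambda j: (desired[lbl][j],
                                             size_desired[j], rng.random()))
            folds[j].append(i)
            remaining.discard(i)
            size_desired[j] -= 1
            for m in range(q):                          # update statistics
                desired[m][j] -= y[i][m]
    return folds

# The slides' example: each fold ends up with exactly one positive of the
# rare label λ2, and λ1's five positives split 2/2/1 across the folds
y = [(1, 0, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1),
     (1, 1, 0), (1, 0, 1), (1, 0, 1), (0, 0, 1)]
folds = iterative_stratification(y, k=3)
```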
  • 9. Example  Nine instances, three folds. Positive examples per label: λ1 = 5, λ2 = 3, λ3 = 6; desired positives per fold: λ1 = 1.7, λ2 = 1, λ3 = 2.
  • 10–13. First, distribute the positive examples of λ2, the label with the fewest remaining examples: i3 goes to the 1st fold, i5 to the 3rd, i6 to the 2nd; the desired λ2 count of each fold drops from 1 to 0.
  • 14–18. Second, distribute the remaining positive examples of λ1: i1 to the 1st fold, i4 to the 3rd, i7 to the 2nd, i8 to the 1st; desired counts may become negative (e.g. −0.3), since there is no hard constraint on them.
  • 19–21. Third, distribute the remaining positive examples of λ3: i2 to the 2nd fold, i9 to the 3rd. Final folds: 1st {i3, i1, i8}, 2nd {i6, i7, i2}, 3rd {i5, i4, i9}.
  • 22. The Triggering Event
       Implementation of evaluation software
         Stratification of multi-label data concerned us a while ago, during the development of the Mulan open-source library
       However, a more practical issue triggered this work
         During our participation at ImageCLEF 2010, cross-validation experiments led to subsets without positive examples for some labels, and to problems in the calculation of the main evaluation measure of the challenge, Mean Average Precision
  • 23. Subsets Without Label Examples
       When can this happen?
         When there are rare labels
       Problems in the calculation of evaluation measures
         A test set without positive examples for a label (fn = tp = 0) renders recall undefined, and so are F1, AUC and MAP
         Furthermore, if the model is correct (fp = 0) then precision is undefined as well
       Recall: tp/(tp+fn)    Precision: tp/(tp+fp)
                                Predicted
                                negative   positive
       Actual   negative        tn         fp
                positive        fn         tp
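The failure mode on this slide is easy to reproduce: with no actual positives in a test set, the recall denominator is zero. A minimal illustration (helper names are ours):

```python
def recall(tp, fn):
    """Recall = tp / (tp + fn); undefined when the test set has no
    actual positives (tp = fn = 0)."""
    return None if tp + fn == 0 else tp / (tp + fn)

def precision(tp, fp):
    """Precision = tp / (tp + fp); undefined when nothing is predicted
    positive (tp = fp = 0), e.g. a correct model on such a test set."""
    return None if tp + fp == 0 else tp / (tp + fp)
```

A fold without positive examples of a label therefore propagates an undefined value into F1, AUC and MAP for that label.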
  • 24. Comparison of the Approaches
       Random:              1st fold: i1, i2, i3   2nd fold: i4, i5, i6   3rd fold: i7, i8, i9
       Based on labelsets:  1st fold: i1, i2, i3   2nd fold: i7, i9, i4   3rd fold: i8, i5, i6   (intends to maintain the joint distribution)
       Iterative:           1st fold: i3, i1, i8   2nd fold: i6, i7, i2   3rd fold: i5, i4, i9   (intends to maintain the marginal distribution)
  • 25. Experiments
       Sampling approaches
         Random (R)
         Stratified sampling based on labelsets (L)
         Iterative stratification algorithm (I)
       We experiment on 13 multi-label datasets
         10-fold CV on datasets with up to 15k examples and
         Holdout (2/3 for training and 1/3 for testing) on larger ones
       Experiments are repeated 5 times with different random orderings of the training examples
         Presented results are averages over these 5 experiments
  • 26. Distribution of Labels & Examples
       Notation
         q labels, k subsets, c_j desired number of examples in subset j
         D_i: set of examples of label i;  S_j: set of examples in subset j
         S_ij: set of examples of label i in subset j
       Labels distribution (LD) and examples distribution (ED)
         LD = (1/q) Σ_{i=1}^{q} [ (1/k) Σ_{j=1}^{k} | |S_ij| / (|S_j| − |S_ij|)  −  |D_i| / (|D| − |D_i|) | ]
         ED = (1/k) Σ_{j=1}^{k} | |S_j| − c_j |
       Subsets without positive examples
         FZ: number of folds that contain at least one label with zero positive examples
         FLZ: number of fold-label pairs with zero positive examples
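Both measures can be computed directly from a fold assignment; a sketch under the slide's notation (it assumes no fold is entirely positive for a label and no label covers the whole dataset, otherwise the ratio's denominator is zero):

```python
def label_distribution(folds, y):
    """LD: mean absolute deviation of each label's positive/negative ratio
    in each fold from that label's ratio in the full dataset."""
    q, k, n = len(y[0]), len(folds), len(y)
    total = 0.0
    for i in range(q):
        d_i = sum(row[i] for row in y)           # positives of label i overall
        full_ratio = d_i / (n - d_i)
        for fold in folds:
            s_ij = sum(y[e][i] for e in fold)    # positives of label i in fold
            total += abs(s_ij / (len(fold) - s_ij) - full_ratio) / k
    return total / q

def example_distribution(folds, desired):
    """ED: mean absolute deviation of actual fold sizes from the desired
    sizes c_j."""
    return sum(abs(len(f) - c) for f, c in zip(folds, desired)) / len(folds)
```

For a single label split perfectly across two folds, LD is 0; ED is 0 whenever each fold has exactly its desired size.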
  • 27. Labels Distribution
       [Chart: labels distribution (LD) of I, L and R per dataset, normalized to the LD value of R; datasets are sorted in increasing order of #labelsets/#examples, from Scene (0.01) to Delicious (0.98)]
  • 28. Examples Distribution
       [Chart: examples distribution (ED) of R, L and I per dataset; datasets are sorted in decreasing order of #examples]
       Iterative stratification trades off example distribution for label distribution
       The larger the dataset, the larger the deviation from the desired number of examples
       Mediamill contains 1730 examples without any annotation
  • 29. Subsets Without Label Examples
                               labelsets/  examples per label        FZ               FLZ
       dataset         labels  examples    min   avg   max      R    L    I      R     L     I
       Scene                6      0.01    364   431   533      0    0    0      0     0     0
       Emotions             6      0.05    148   185   264      0    0    0      0     0     0
       Genbase             27      0.05      1    31   171     10   10   10     90    77    74
       Yeast               14      0.08     34   731  1816      1    0    0      1     0     0
       Medical             45      0.10      1    27   266     10   10   10    203   179   173
       Bibtex             159      0.39     51   112  1042      1    1    0      1     1     0
       Enron               53      0.44      1   108   913     10   10   10     95    88    47
       Corel5k            374      0.64      1    47  1120     10   10   10   1140  1118   788
       ImageCLEF2010       93      0.92     12  1038  7484      4    4    0      4     0     0
       - Iterative stratification produces the lowest FZ & FLZ in all datasets
       - All schemes fail in Genbase, Medical, Enron and Corel5k due to label rarity
       - All schemes do well in Scene and Emotions, where examples per label abound
       - Only iterative stratification does well in Bibtex and ImageCLEF2010
  • 30. Variance of 10-fold CV Estimates
       Algorithms
         Binary Relevance (one-versus-rest)
         Calibrated Label Ranking (Fürnkranz et al., 2008)
           Combination of pairwise and one-versus-rest models
           Considers label dependencies
       Measures
         Measure                   Required type of output
         Hamming Loss              Bipartition
         Subset Accuracy           Bipartition
         Coverage                  Ranking
         Ranking Loss              Ranking
         Mean Average Precision    Probabilities
         Micro-averaged AUC        Probabilities
  • 31. Average Ranking for BR (1/3)
       On all 9 datasets
       [Chart: average rank for the standard deviation of the estimate, using BR, for R, L and I per measure; some measures could not be computed on several datasets (fail counts shown in the chart); the MAP comparison is only based on Scene and Emotions]
  • 32. Average Ranking for BR (2/3)
       On the 5 datasets where #labelsets/#examples ≤ 0.1
       [Chart: average rank for the standard deviation of the estimate, using BR, for R, L and I per measure; the MAP comparison is only based on Scene and Emotions]
  • 33. Average Ranking for BR (3/3)
       On the 4 datasets where #labelsets/#examples ≥ 0.39
       [Chart: average rank for the standard deviation of the estimate, using BR, for R, L and I per measure]
       Fails in MAP – R: 4, L: 4, I: 2
  • 34. Average Ranking for CLR
       On the 5 datasets with #labels < 50, for complexity reasons (those where #labelsets/#examples ≤ 0.1)
       [Chart: average rank for the standard deviation of the estimate, using CLR, for R, L and I per measure; the MAP comparison is only based on Scene and Emotions]
  • 35. BR vs CLR
       On the 5 datasets where #labelsets/#examples ≤ 0.1
       [Chart: average rank for the standard deviation of the estimate per measure, comparing BR and CLR under each of R, L and I]
       Iterative stratification suits BR; labelsets-based suits CLR

Editor's Notes

  1. What about multi-label data?
  2. Could also say this together with the next slide.
  3. Why do we study ED? Because our iterative stratification approach does not strictly respect the desired number of examples.
  4. Add animations.
  5. Only fails in the 4 datasets where the minimum number of examples per label is 1.
  6. We assumed that CV offers a good estimate of the test error and focused on its variance. Not real CV in iterative.