UNIVERSITÀ degli STUDI di NAPOLI FEDERICO II

State-of-the-art Clustering Techniques
Support Vector Methods and Minimum Bregman Information Principle

by
VINCENZO RUSSO

SUPERVISOR: prof. Anna CORAZZA
CO-SUPERVISOR: prof. Ezio CATANZARITI
Introduction


What is clustering?

Unsupervised learning: it groups a set of objects into subsets called clusters.
The objects are represented as points in a subspace of R^d, where d is the number of point components, also called attributes or features.

[Figure: non-structured data, mapped by CLUSTERING onto a 3-cluster structure]

Several application domains: information retrieval, bioinformatics, cheminformatics, image retrieval, astrophysics, market segmentation, etc.
Goals


Two state-of-the-art approaches:
- Support Vector Clustering (SVC)
- Bregman Co-clustering

Goal                                        Application domain
Robustness w.r.t. missing-valued data       Astrophysics
Robustness w.r.t. sparse data               Textual documents
Robustness w.r.t. high "dimensionality"     Textual documents
Robustness w.r.t. noise/outliers            Synthetic data

Other desirable properties:
- Handling of nonlinearly separable problems
- Automatic detection of the number of clusters
- Application domain independence
Support Vector Clustering: the idea

Let X = {x_1, x_2, ..., x_n} be a dataset of n points, with X ⊆ R^d, the data space.

A nonlinear transformation φ : X → F maps the input space X to some high-dimensional feature space F, wherein we look for the Minimum Enclosing Ball (MEB), i.e. the smallest enclosing sphere, of radius R, of the feature-space images of all points. (A minimal kernel-trick sketch follows below.)

The MEB was originally proposed in the context of the Vapnik-Chervonenkis (VC) dimension (Vapnik, 1995). Later it was used for estimating the support of a high-dimensional distribution (Schölkopf et al.) and for the Support Vector Domain Description (SVDD), an SVM formulation for one-class classification (Tax, 2001; Tax and Duin, 1999a,b, 2004). The SVDD is the basic step of SVC and allows describing the boundaries of clusters: mapped back to the data space through φ^{-1} : F → X, the sphere splits into contours which enclose the clusters.

A second step, called Cluster Labeling, determines the membership of points, i.e. it does the cluster assignment. The SVC algorithm was first presented by Ben-Hur et al. (2001).

Notes:
(5) The name "cluster labeling" probably descends from the originally proposed algorithm, which is based on finding the connected components of a graph: the algorithms for finding the connected components usually assign the "component labels" to the vertices.
(6) An alternative SVM formulation for the same task, called One Class SVM, can be found in Schölkopf et al. (2000b) (see Appendix A).
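Since φ is usually only implicit, all feature-space geometry must be expressed through the kernel. As a minimal runnable sketch (not the thesis implementation; the helper names are illustrative), the squared feature-space distance between two images φ(x), φ(y) follows from expanding the inner products:

```python
import numpy as np

def gaussian_kernel(x, y, q):
    # K(x, y) = exp(-q * ||x - y||^2): the kernel originally used for SVC.
    return np.exp(-q * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def feature_space_sq_distance(x, y, kernel):
    # ||phi(x) - phi(y)||^2 = K(x,x) - 2 K(x,y) + K(y,y):
    # the mapping phi never has to be computed explicitly.
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)

# With a normalized kernel (K(x,x) = 1) all images lie on the unit sphere
# in F, so squared feature-space distances are bounded by 2.
x, y = np.array([0.0, 0.0]), np.array([1.0, 2.0])
k = lambda a, b: gaussian_kernel(a, b, q=0.5)
print(feature_space_sq_distance(x, y, k))
```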
Phase I: Cluster description

- Finding the Minimum Enclosing Ball (MEB)
- Nonlinear Support Vector Domain Description (SVDD)
- QP problem; computational complexity O(n^3) worst-case running time; the QP problem can be solved by Sequential Minimal Optimization-like methods

min_{R,a,ξ} R^2 + C Σ_{k=1}^{n} ξ_k
subject to ||φ(x_k) − a||^2 ≤ R^2 + ξ_k,  ξ_k ≥ 0,  k = 1, 2, ..., n

where a is the center of the sphere. Soft constraints are incorporated by adding the slack variables ξ_k; the real constant C (soft margin) provides a way to control outliers. To solve this problem we introduce the Lagrangian

L(R, a, ξ; β, μ) = R^2 − Σ_k (R^2 + ξ_k − ||φ(x_k) − a||^2) β_k − Σ_k ξ_k μ_k + C Σ_k ξ_k

with Lagrange multipliers β_k ≥ 0 and μ_k ≥ 0, k = 1, 2, ..., n.

The kernel function K(·,·) defines an explicit mapping if φ is known; otherwise the mapping is said to be implicit. In the majority of cases φ is unknown, but we can implicitly perform the inner product in the feature space F by the kernel K. Using nonlinear kernel transformations, we have a chance to transform a nonlinearly separable problem in the data space into a separable one in the feature space.

[Figure: a nonlinearly separable problem in the data space X becomes linearly separable in the feature space F.]

Squared feature-space distance: let x be a data point; the distance of its image φ(x) from the center a of the sphere is

d_R^2(x) = ||φ(x) − a||^2 = K(x, x) − 2 Σ_{k=1}^{n} β_k K(x_k, x) + Σ_{k=1}^{n} Σ_{l=1}^{n} β_k β_l K(x_k, x_l)

Since the solution vector β is sparse, i.e. only the Lagrange multipliers associated to the support vectors are non-zero, we can rewrite the above equation using only the n_sv support vectors:

d_R^2(x) = K(x, x) − 2 Σ_{k=1}^{n_sv} β_k K(x_k, x) + Σ_{k=1}^{n_sv} Σ_{l=1}^{n_sv} β_k β_l K(x_k, x_l)

Valid Mercer kernels: several kernel functions are known to satisfy Mercer's conditions; some of them are
- Linear kernel: K(x, y) = ⟨x, y⟩
- Polynomial kernel: K(x, y) = (⟨x, y⟩ + r)^k, r ≥ 0
- Gaussian kernel: K(x, y) = e^{−q ||x−y||^2}, q > 0
- Exponential kernel: K(x, y) = e^{−q ||x−y||}, q > 0

Kernel width (q) is a general term to indicate the scale at which the data is analyzed; it has a different mathematical meaning depending on the kernel: in the Gaussian kernel it is a function of the variance (q = 1/(2σ^2)), while in polynomial kernels the relevant parameter k is the degree. The kernel needs to be normalized. (A runnable sketch of this phase follows below.)
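A minimal runnable sketch of Phase I under stated assumptions: Gaussian kernel, and scipy's general-purpose SLSQP solver in place of the SMO-like solvers mentioned above (fine only for small n; feasibility requires C ≥ 1/n). The function names are illustrative, not from the thesis software.

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(X, q, C):
    """Solve the SVDD dual: max sum_k beta_k K(x_k,x_k) - sum_kl beta_k beta_l K(x_k,x_l)
    s.t. sum_k beta_k = 1 and 0 <= beta_k <= C.  X is an (n, d) array."""
    n = len(X)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-q * sq)                      # Gram matrix of the Gaussian kernel
    obj = lambda b: -(b @ np.diag(K) - b @ K @ b)   # minimize the negated dual
    cons = [{"type": "eq", "fun": lambda b: np.sum(b) - 1.0}]
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n, constraints=cons)
    beta = res.x
    bKb = beta @ K @ beta

    def d2(x):
        # d_R^2(x) = K(x,x) - 2 sum_k beta_k K(x_k,x) + sum_kl beta_k beta_l K(x_k,x_l);
        # K(x,x) = 1 for the Gaussian kernel.
        kx = np.exp(-q * np.sum((X - x) ** 2, axis=1))
        return 1.0 - 2.0 * (beta @ kx) + bKb

    # R^2 equals d_R^2 of any unbounded support vector (0 < beta_k < C);
    # for this sketch we assume at least one exists and average over them.
    sv = (beta > 1e-6) & (beta < C - 1e-6)
    R2 = float(np.mean([d2(x) for x in X[sv]]))
    return beta, R2, d2
```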
Phase II: Cluster labeling

Phase I only describes the clusters' boundaries. Phase II finds the connected components of the graph induced by the adjacency matrix A; each component is a cluster.

Given a pair of data points that belong to different clusters, any path that connects them must exit from the sphere in the feature space. Therefore, such a path contains a segment of points y such that d_R(y) > R. This leads to the definition of an adjacency matrix A between all pairs of points whose images lie in or on the sphere in feature space.

Let S_ij be the line segment connecting x_i and x_j, for i, j = 1, 2, ..., n. Then

A_ij = 1 if d_R(y) ≤ R for all y ∈ S_ij,
A_ij = 0 otherwise.

Clusters are now defined as the connected components of the graph induced by the matrix A. Checking the line segment is implemented by sampling a number m of points between the starting point and the ending point; the exactness of the check depends on the number m.

The BSVs (bounded support vectors) are unclassified by this procedure, since their images lie outside the enclosing sphere. One may decide either to leave them unclassified or to assign them to the cluster that they are closest to; generally, the latter is the most appropriate choice.

The original Phase II is a bottleneck in the worst case. Alternatives:
- Cone Cluster Labeling: best performance/accuracy ratio
- Gradient Descent
(A sketch of the original labeling step follows below.)
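A minimal sketch of the original labeling step, composing with the Phase I sketch above (`d2` and `R2` come from `svdd_dual`; the function name is illustrative). It samples m points per segment and reads components off with a simple depth-first search:

```python
import numpy as np

def cluster_labels(X, d2, R2, m=10):
    """SVC Phase II sketch: A_ij = 1 iff every sampled point y on the
    segment x_i--x_j satisfies d_R^2(y) <= R^2; clusters are the
    connected components of the induced graph."""
    n = len(X)
    inside = np.array([d2(x) <= R2 + 1e-9 for x in X])  # BSVs stay outside

    def connected(i, j):
        ts = np.linspace(0.0, 1.0, m + 2)[1:-1]          # m interior samples
        return all(d2(X[i] + t * (X[j] - X[i])) <= R2 + 1e-9 for t in ts)

    labels = np.full(n, -1)        # -1: unclassified (e.g. BSVs/outliers)
    current = 0
    for s in range(n):
        if labels[s] != -1 or not inside[s]:
            continue
        labels[s] = current
        stack = [s]                # DFS over the implicit graph A
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and inside[j] and connected(i, j):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

As the slide notes, the exactness depends on m, and BSVs are left unclassified here (label -1); assigning them to the closest cluster would be a small post-processing step.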


Pseudo-hierarchical execution

Parameters exploration:
- The greater the kernel width q, the greater the number of support vectors (and so the number of clusters)
- C rules the number of outliers and allows dealing with strongly overlapping clusters

A brute-force approach is unfeasible. Approaches proposed in the literature:
- Secant-like algorithm for q exploration (a sketch of the exploration schedule follows below)
- No theoretically rooted method for C exploration

Data analysis is performed at different levels of detail:
- Pseudo-hierarchical: a strict hierarchy is not guaranteed when C < 1, due to the Bounded Support Vectors

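A small sketch of how the exploration can be scheduled. Ben-Hur et al. (2001) start from q_0 = 1 / max_{i,j} ||x_i − x_j||^2, at which a single cluster occurs, and increase q so that clusters split as the level of detail grows. The geometric schedule below is an illustrative choice, not the secant-like algorithm mentioned above:

```python
import numpy as np

def q_schedule(X, steps=10, factor=2.0):
    """Return an increasing list of kernel widths for the
    pseudo-hierarchical SVC execution (illustrative geometric schedule)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    q0 = 1.0 / sq.max()                 # single-cluster starting point
    return [q0 * factor ** i for i in range(steps)]
```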


Proposed improvements

Soft margin C parameter selection:
- Heuristics: successfully applied in 90% of cases
- Only 10 tests out of 100 needed further tuning; those 10 datasets had a high percentage of missing values

New robust stop criterion:
- Based upon relative evaluation criteria (C-index, Dunn index, ad hoc)

Kernel width (q) selection:
- SVC integration: complexity reduced from O(Qn^3) to O(n_sv^2)
- Softening strategy heuristics
- For all normalized kernels

More kernels:
- Exponential (K(x, y) = e^{−q ||x−y||}), Laplace (K(x, y) = e^{−q |x−y|})


Improvements - Stop criterion

Dataset   Detected clusters   Actual clusters   Validity index
Iris              1                  3              1.00E-06
Iris              3                  3              0.13
Iris              4                  3              0.05
Breast            1                  2              1.00E-05
Breast            2                  2              0.80
Breast            4                  2              0.27

The bigger the validity index, the better the clustering found.
The stop criterion halts the process when the index value starts to decrease.
The idea: the SVC outputs quality-increasing clusterings before reaching the optimal clustering; after that, it provides quality-decreasing partitionings. (A sketch of this criterion follows below.)
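A minimal sketch of this stop criterion, assuming the pseudo-hierarchical execution yields a stream of (labels, validity) pairs; the function name is illustrative:

```python
def run_until_quality_drops(clusterings):
    """Consume (labels, validity_index) pairs and halt as soon as the
    index starts to decrease, returning the best clustering seen."""
    best_labels, best_score = None, float("-inf")
    for labels, score in clusterings:
        if score < best_score:
            break                      # index started decreasing: stop
        best_labels, best_score = labels, score
    return best_labels, best_score

# Toy usage with the Iris-like pattern from the table above:
print(run_until_quality_drops([("1 cl", 1e-6), ("3 cl", 0.13), ("4 cl", 0.05)]))
# -> ('3 cl', 0.13): the 3-cluster solution, matching the actual structure
```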


Improvements - Kernel width selection

Dataset     Algorithm      Accuracy   Macroaveraging   # iter   # potential "q"
Iris        SVC             88.00%        87.69%          2            9
            + softening     94.00%        93.99%          1           13
            K-means         85.33%        85.11%            not applicable
Wine        SVC             87.07%        87.55%          3            7
            + softening     93.26%        93.91%          2            6
            K-means         50.00%        51.78%            not applicable
Syn02       SVC             88.80%       100.00%          8           18
            + softening     88.00%       100.00%          4           15
            K-means         68.40%        63.84%            not applicable
Syn03       SVC             87.30%       100.00%         17           36
            + softening     87.30%       100.00%          6           31
            K-means         39.47%        39.90%            not applicable

B. Cancer (the second column reports the Benign Contamination instead of Macroaveraging):
B. Cancer   SVC             91.85%        11.00%          3           11
            + softening     96.71%         2.82%          3           13
            K-means         60.23%        32.00%            not applicable


Improvements - non-Gaussian kernels

Exponential kernel: improves the cluster separation in several cases.

Dataset   Algorithm          Accuracy   Macroaveraging   # iter   # potential "q"
Iris      SVC + softening     94.00%        93.99%          1           13
          + Exp kernel        97.33%        97.33%          1           15
          K-means             85.33%        85.11%            not applicable
CLA3      SVC + softening     Failed - only one class out of 3 separated
          + Exp kernel        94.00%        93.99%          1           11
          K-means             85.33%        85.11%            not applicable

Laplace kernel: improves/allows the cluster separation with normalized data (both kernels are sketched in code below).

Dataset   Algorithm          Accuracy   # iter   # potential "q"
Quad      SVC + softening     Failed - no class separated
          + Laplace kernel    99.94%       1           17
          K-means             83.00%         not applicable
SG03      SVC + softening     73.15%       3           19
          + Laplace kernel    91.04%       1           16
          K-means             50.24%         not applicable
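A minimal sketch of the two kernels as written on the slide. Reading ‖·‖ as the Euclidean norm and |·| as the component-wise (L1) norm is an assumption; both are normalized (K(x, x) = 1), which is what makes the softening strategy for q applicable to them:

```python
import numpy as np

def exponential_kernel(x, y, q):
    # K(x, y) = exp(-q ||x - y||): Euclidean norm, not squared.
    return np.exp(-q * np.linalg.norm(np.asarray(x) - np.asarray(y)))

def laplace_kernel(x, y, q):
    # K(x, y) = exp(-q |x - y|), with |.| taken as the L1 norm (assumption).
    return np.exp(-q * np.sum(np.abs(np.asarray(x) - np.asarray(y))))
```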
Minimum Bregman Information Principle

Bregman Co-clustering (BCC)
- Co-clustering: simultaneous clustering of both rows and columns of a data matrix
- Bregman framework:
  - Generalizes the K-means strategy
  - Large class of divergences: Bregman divergences
  - Minimum Bregman Information (MBI) principle
  - Meta-algorithm

Bregman divergence: let φ be a real-valued strictly convex function of Legendre type defined on the convex set S ≡ dom(φ) ⊆ R^d. The Bregman divergence d_φ : S × ri(S) → [0, ∞) is defined as

d_φ(x_1, x_2) = φ(x_1) − φ(x_2) − ⟨x_1 − x_2, ∇φ(x_2)⟩

where ∇φ is the gradient of φ, ⟨·,·⟩ is the dot product, and ri(S) is the relative interior of S (the points of S that are intuitively not on the "edge"). (This definition is sketched in code below.)

Example (squared Euclidean distance): the squared Euclidean distance is perhaps the simplest and most widely used Bregman divergence. The underlying function φ(x) = ⟨x, x⟩ is strictly convex, differentiable on R^d, and

d_φ(x_1, x_2) = ⟨x_1, x_1⟩ − ⟨x_2, x_2⟩ − ⟨x_1 − x_2, 2x_2⟩ = ⟨x_1 − x_2, x_1 − x_2⟩ = ||x_1 − x_2||^2

Key proposition: given a Bregman divergence d_φ and a random variable X following a positive probability measure ν, the problem min_{s ∈ ri(S)} E_ν[d_φ(X, s)] has a unique solution, s* = μ = E_ν[X], i.e. the expectation is the unique minimizer for every Bregman divergence.

Bregman Information: let X be a random variable that takes values in X = {x_i}_{i=1}^n ⊂ S ⊆ R^d, following a positive probability measure ν, and let μ = E_ν[X] = Σ_{i=1}^n ν_i x_i ∈ ri(S). The Bregman Information of X in terms of d_φ is defined as

I_φ(X) = E_ν[d_φ(X, μ)] = Σ_{i=1}^n ν_i d_φ(x_i, μ)

Example (variance): let X = {x_i}_{i=1}^n be a set in R^d, and consider the uniform measure over X, i.e. ν_i = 1/n. The Bregman Information of X with the squared Euclidean distance as Bregman divergence is

I_φ(X) = Σ_{i=1}^n ν_i d_φ(x_i, μ) = (1/n) Σ_{i=1}^n ||x_i − μ||^2

which is exactly the variance.

Divergence           Bregman Information    MBI Algorithm
Euclidean distance   Variance               Least Squares (i.e. K-means)
Relative Entropy     Mutual Information     Maximum Entropy
Itakura-Saito        unnamed                Lindo-Buzo-Gray
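A minimal runnable sketch of the two definitions above (helper names are illustrative): a generic Bregman divergence built from φ and ∇φ, and the Bregman Information as the ν-weighted divergence to the mean. The squared Euclidean case numerically reproduces the variance example:

```python
import numpy as np

def bregman_divergence(phi, grad_phi):
    # d_phi(x1, x2) = phi(x1) - phi(x2) - <x1 - x2, grad phi(x2)>
    return lambda x1, x2: phi(x1) - phi(x2) - (x1 - x2) @ grad_phi(x2)

def bregman_information(d_phi, X, nu=None):
    # I_phi(X) = E_nu[d_phi(X, mu)] with mu = E_nu[X]; uniform nu by default.
    nu = np.full(len(X), 1.0 / len(X)) if nu is None else nu
    mu = nu @ X
    return sum(w * d_phi(x, mu) for w, x in zip(nu, X))

# Squared Euclidean case: phi(x) = <x, x> gives d_phi(x1, x2) = ||x1 - x2||^2,
# and the Bregman Information under uniform nu is exactly the variance.
d_sq = bregman_divergence(lambda x: x @ x, lambda x: 2 * x)
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])
print(bregman_information(d_sq, X))               # Bregman Information
print(np.mean(np.sum((X - X.mean(0)) ** 2, 1)))   # variance: same value
```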
Other experiments


Sparse data and missing-valued data

Star/Galaxy data with missing values:

Dataset           SVC      BCC      K-means   # attr. affected   % obj. affected
MV5000 (25D)     99.02%   94.00%    71.08%          10               27.0%
MV10000 (25D)    96.10%   95.60%    75.12%          10               29.0%
AMV5000 (15D)    91.76%   79.46%    74.90%           6               30.0%
AMV10000 (15D)   90.31%   83.51%    68.20%           6               30.0%

Textual document data: sparsity and high "dimensionality":

Dataset             SVC      BCC       K-means
CLASSIC3 (3303D)   99.80%   100.00%    49.80%
SCI3 (9456D)       failed    89.39%    39.15%
PORE (13821D)      failed    82.68%    45.91%



Outliers

Dataset       SVC       Best BCC   K-means   # objects   # outliers
SynDECA 02   100.00%     94.18%     68.04%      1000         112
SynDECA 03   100.00%     49.00%     39.47%     10000        1270

[Figure: clustering results on SynDECA 02 and SynDECA 03]
Conclusions and future works


Conclusions

Support Vector Clustering achieves the goals:

Goal                                        Application domain
Robustness w.r.t. missing-valued data       Astrophysics
Robustness w.r.t. sparse data               Textual documents
Robustness w.r.t. high "dimensionality"     Textual documents
Robustness w.r.t. noise/outliers            Synthetic data

Other properties (verified over the whole experimental stage):
- Automatic discovery of the number of clusters
- Application domain independence
- Handling of nonlinearly separable problems
- Arbitrary-shaped clusters handling

Bregman Co-clustering achieves the same goals, but the following problems still hold:
- estimating the number of clusters
- handling outliers


Contribution

SVC was made applicable in practice:
- Complexity reduction for the kernel width selection
- Soft margin C parameter estimation
- New effective stop criterion

non-Gaussian kernels:
- The kernel width selection was shown to be applicable to all normalized kernels
- Exponential and Laplace kernels successfully used

Improved accuracy:
- Softening strategy for the kernel width selection


Future works

Minimum Enclosing Bregman Ball (MEBB):
- Generalization of the Minimum Enclosing Ball (MEB) problem and of the Bâdoiu-Clarkson (BC) algorithm to Bregman divergences (Nock and Nielsen, 2005); a Euclidean sketch of the BC iteration follows below
- Since a Bregman divergence D_F is usually not symmetric, any c ∈ S and any r ≥ 0 define two dual Bregman balls: B_{c,r} = {x ∈ X : D_F(c, x) ≤ r} and B'_{c,r} = {x ∈ X : D_F(x, c) ≤ r}; D_F(c, x) is always convex in c, while D_F(x, c) is not always

[Figure 10.1: Examples of Bregman balls, for d = 2. The two on the left are obtained by means of the Itakura-Saito distance, the middle one is a classic (squared) Euclidean L2 ball, and the other two are obtained by employing the Kullback-Leibler divergence; blue dots are the centers of the balls (Nock and Nielsen, 2005, fig. 2).]

Core Vector Machines (CVM):
- The SVM reformulated as a MEB problem
- They make use of the BC algorithm

MEBB + CVM = Bregman Vector Machines:
- Since the BC algorithm has been generalized to Bregman divergences, research about vector machines (and therefore about the SVC) could have very interesting implications, which we definitely intend to explore
- New implications for vector machines
- New implications for SVC
- Adapting cluster labeling algorithms to the Bregman divergences
- We already expressed our will of testing such machines for the cluster description stage of the SVC (see section 6.12)

Improve and extend the SVC software:
- For the sake of accuracy, and in order to perform more robust comparisons with other clustering algorithms, an improved and extended software for Support Vector Clustering (SVC) is needed; more stability and reliability are necessary

The End
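For reference, a minimal sketch of the classical (Euclidean) Bâdoiu-Clarkson iteration named above: the MEBB work generalizes exactly this loop to Bregman divergences, which is not reproduced here.

```python
import numpy as np

def badoiu_clarkson_meb(X, eps=0.1):
    """Approximate the Euclidean Minimum Enclosing Ball of the rows of X:
    O(1/eps^2) iterations of "move the center toward the farthest point"
    yield a (1 + eps)-approximation."""
    c = X[0].astype(float)                    # start from any data point
    for t in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
        far = X[np.argmax(np.sum((X - c) ** 2, axis=1))]  # farthest point
        c += (far - c) / (t + 1)              # shrinking 1/(t+1) step
    r = np.sqrt(np.max(np.sum((X - c) ** 2, axis=1)))
    return c, r

# Toy usage: points on a unit square; the MEB center is near (0.5, 0.5).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(badoiu_clarkson_meb(X))
```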

 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionJordan McBain
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machinesnextlib
 
Pami meanshift
Pami meanshiftPami meanshift
Pami meanshiftirisshicat
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methodsKrish_ver2
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine LearningPavithra Thippanaik
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector MachinesSilicon Mentor
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms ijcseit
 
Speech Processing with deep learning
Speech Processing  with deep learningSpeech Processing  with deep learning
Speech Processing with deep learningMohamed Essam
 
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...IJNSA Journal
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowOswald Campesato
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yuzukun
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksSang Jun Lee
 

Similaire à State-of-the-art Clustering Techniques: Support Vector Methods and Minimum Bregman Information Principle (20)

(MS word document)
(MS word document)(MS word document)
(MS word document)
 
IGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.pptIGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.ppt
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Principal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty DetectionPrincipal Component Analysis For Novelty Detection
Principal Component Analysis For Novelty Detection
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
My8clst
My8clstMy8clst
My8clst
 
Pami meanshift
Pami meanshiftPami meanshift
Pami meanshift
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
Instance Based Learning in Machine Learning
Instance Based Learning in Machine LearningInstance Based Learning in Machine Learning
Instance Based Learning in Machine Learning
 
Introduction to Support Vector Machines
Introduction to Support Vector MachinesIntroduction to Support Vector Machines
Introduction to Support Vector Machines
 
dm_clustering2.ppt
dm_clustering2.pptdm_clustering2.ppt
dm_clustering2.ppt
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
Speech Processing with deep learning
Speech Processing  with deep learningSpeech Processing  with deep learning
Speech Processing with deep learning
 
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlow
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 

Dernier

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 

Dernier (20)

Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

State-of-the-art Clustering Techniques: Support Vector Methods and Minimum Bregman Information Principle

  • 1. State-of-the-art Clustering Techniques: Support Vector Methods and Minimum Bregman Information Principle, by VINCENZO RUSSO. UNIVERSITÀ degli STUDI di NAPOLI FEDERICO II. SUPERVISOR: prof. Anna CORAZZA. CO-SUPERVISOR: prof. Ezio CATANZARITI.
  • 2. Introduction. What is the clustering? Clustering is unsupervised learning: it groups a set of objects into subsets called clusters. The objects are represented as points in a subspace of R^d, where d is the number of point components, also called attributes or features. There are several application domains: information retrieval, bioinformatics, cheminformatics, image retrieval, astrophysics, market segmentation, etc. (Figures: non-structured data turned by CLUSTERING into a 3-clusters structure.)
  • 3. Goals. Two state-of-the-art approaches: Support Vector Clustering (SVC) and Bregman Co-clustering.

    Goals                                   | Application domain
    Robustness w.r.t. missing-valued data   | Astrophysics
    Robustness w.r.t. sparse data           | Textual documents
    Robustness w.r.t. high "dimensionality" | Textual documents
    Robustness w.r.t. noise/outliers        | Synthetic data

  Other desirable properties: nonlinearly separable problems handling; automatic detection of the number of clusters; application domain independence.
  • 4. Support Vector Clustering: the basic idea. The Minimum Enclosing Ball (MEB) was first exploited for computing the Vapnik-Chervonenkis (VC) dimension (Vapnik, 1995); later it was used for estimating the support of a high-dimensional distribution (Schölkopf et al.); finally, the MEB was used for the Support Vector Domain Description (SVDD), an SVM formulation for the one-class classification (Tax, 2001; Tax and Duin, 1999a,b, 2004). The SVDD is the basic step of the SVC and allows describing the boundaries of clusters. Let X = {x_1, x_2, ..., x_n} be a dataset of n points, with X ⊆ R^d the data space. We use a nonlinear transformation φ : X → F from the input space X to some high-dimensional feature space F, wherein we look for the smallest enclosing sphere, i.e. the MEB, having minimum radius R: this is the cluster description step. Mapping the sphere back to the data space (φ^{-1} : F → X) yields the contours that enclose the points and describe the boundaries of clusters. The second step, called Cluster Labeling, determines the membership of points to clusters, i.e. it does the cluster assignment; the name probably descends from the originally proposed algorithm, which is based on finding the connected components of a graph (the algorithms for finding the connected components usually assign the "component labels" to the vertices). An alternative SVM formulation for the same task, called One Class SVM, can be found in Schölkopf et al. (2000b) (see Appendix A).
  • 5. Support Vector Clustering, Phase I: Cluster description. The cluster description step finds the Minimum Enclosing Ball (MEB) by means of the nonlinear Support Vector Domain Description (SVDD):

    min_{R,a,ξ}  R^2 + C Σ_k ξ_k
    subject to   ||φ(x_k) - a||^2 <= R^2 + ξ_k,   ξ_k >= 0,   k = 1, 2, ..., n,

  where a is the center of the sphere and the soft constraints are incorporated by adding the slack variables ξ_k; the real constant C provides a way to control the outliers percentage. Using nonlinear kernel transformations, we have a chance to transform a nonlinearly separable problem in the data space X into a separable one in the feature space F. The kernel function K(·,·) defines an explicit mapping if φ is known; otherwise the mapping is said to be implicit: by means of a kernel K we can implicitly perform an inner product in the feature space F. Introducing the Lagrangian

    L(R, a, ξ; β, µ) = R^2 + C Σ_k ξ_k - Σ_k β_k (R^2 + ξ_k - ||φ(x_k) - a||^2) - Σ_k µ_k ξ_k,   β_k >= 0, µ_k >= 0,

  and kernelizing, the squared feature-space distance of the image of a point x from the center a of the sphere becomes

    d_R^2(x) = ||φ(x) - a||^2 = K(x, x) - 2 Σ_k β_k K(x_k, x) + Σ_{k,l} β_k β_l K(x_k, x_l).

  Since the solution vector β is sparse, i.e. only the Lagrange multipliers associated with the support vectors are non-zero, the sums above effectively run over the support vectors only, and R^2 = d_R^2(s), where s is any support vector. Valid Mercer kernels include: Linear, K(x, y) = <x, y>; Polynomial, K(x, y) = (<x, y> + r)^k, r >= 0; Gaussian, K(x, y) = e^{-q ||x - y||^2}, q > 0; Exponential, K(x, y) = e^{-q ||x - y||}, q > 0. In polynomial kernels the parameter k is the degree; in the exponential-family kernels the parameter q is called the kernel width, and it has a different mathematical meaning depending on the kernel (in the Gaussian kernel it is tied to the variance, q = 1/(2σ^2)); kernel width is a general term to indicate the scale at which the data is probed. Complexity: the cluster description is a QP problem with an O(n^3) worst-case running time; in practice the QP problem can be solved by Sequential Minimal Optimization (SMO) methods, which considerably reduce the running time and whose memory requirements are reduced to O(1) (Ben-Hur et al.). A minimal solver sketch follows.
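As a concrete illustration of the description step, the following minimal Python/NumPy sketch solves the kernelized SVDD dual for the hard-margin case (C = 1, so the feasible set is the probability simplex) with a plain projected-gradient loop, then evaluates d_R^2(x). This is not the thesis software: the solver, the step size and the iteration count are illustrative assumptions, and a production implementation would rather use an SMO-like method, as noted above.

```python
import numpy as np

def gaussian_gram(X, q):
    # K[i, j] = exp(-q * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-q * d2)

def project_simplex(v):
    # Euclidean projection onto {b : b >= 0, sum(b) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def svdd_dual(K, n_iter=2000):
    # Hard-margin SVDD dual for a normalized kernel (K(x, x) = 1):
    #   maximize 1 - b^T K b  <=>  minimize b^T K b  on the simplex.
    n = K.shape[0]
    b = np.full(n, 1.0 / n)
    lr = 1.0 / (2.0 * np.linalg.norm(K, 2))   # safe step for gradient 2Kb
    for _ in range(n_iter):
        b = project_simplex(b - lr * 2.0 * K @ b)
    return b

def dr2(x, X, beta, q, bKb):
    # d_R^2(x) = K(x,x) - 2 sum_k beta_k K(x_k, x) + sum_kl beta_k beta_l K(x_k, x_l)
    kx = np.exp(-q * np.sum((X - x) ** 2, axis=1))
    return 1.0 - 2.0 * beta @ kx + bKb

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 6.0])
q = 0.5
K = gaussian_gram(X, q)
beta = svdd_dual(K)
bKb = beta @ K @ beta
svs = np.nonzero(beta > 1e-6)[0]          # support vectors (sparse beta)
R2 = dr2(X[svs[0]], X, beta, q, bKb)      # squared sphere radius
print(len(svs), R2)
```

The radius is read off at any support vector; points whose squared feature-space distance exceeds R2 lie outside the contours in the data space.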
  • 6. Support Vector Clustering, Phase II: Cluster labeling. Phase I only describes the clusters' boundaries; Phase II finds the connected components of the graph induced by the adjacency matrix A. Given a pair of data points that belong to different clusters, any path that connects them must exit from the sphere in the feature space; therefore, such a path contains a segment of points y such that d_R(y) > R. This leads to the definition of the adjacency matrix A between all pairs of points whose images lie in or on the sphere in feature space. Let S_ij be the line segment connecting x_i and x_j, sampled as S_ij = {x_{i+1}, x_{i+2}, ..., x_{j-2}, x_{j-1}}; then, for all i, j = 1, 2, ..., n,

    A_ij = 1 if d_R(y) <= R for every y in S_ij, and A_ij = 0 otherwise.

  Clusters are now defined as the connected components of the graph induced by the matrix A; each component is a cluster. Checking the line segment is implemented by sampling a number m of points between the starting point and the ending point, so the exactness of the check depends on the number m. Clearly, the BSVs are unclassified by this procedure, since their feature-space images lie outside the enclosing sphere: one may decide either to leave them unclassified or to assign them to the cluster they are closest to; generally, the latter is the most appropriate choice. The original Phase II is a bottleneck (the worst case of the whole algorithm); alternatives with a better performance/accuracy trade-off are Cone Cluster Labeling and Gradient Descent. A sketch of the original procedure follows.
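To make the labeling rule concrete, here is a hedged NumPy sketch of the original procedure: sample m points on the segment between every pair, mark the pair adjacent when every sample stays inside the sphere, then label the components with a breadth-first search. The inside predicate and the radius are assumed to come from the description step (e.g. the sketch above); m = 10 is an arbitrary choice.

```python
import numpy as np
from collections import deque

def adjacency(X, inside, m=10):
    """inside(y) -> True when d_R(y) <= R; checks m samples per segment."""
    n = len(X)
    A = np.eye(n, dtype=bool)
    ts = np.linspace(0.0, 1.0, m + 2)[1:-1]      # interior sample positions
    for i in range(n):
        for j in range(i + 1, n):
            samples = ((1.0 - t) * X[i] + t * X[j] for t in ts)
            A[i, j] = A[j, i] = all(inside(y) for y in samples)
    return A

def connected_components(A):
    """Label the connected components of the graph induced by A."""
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    cluster = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        queue = deque([start])
        labels[start] = cluster
        while queue:
            v = queue.popleft()
            for w in np.nonzero(A[v] & (labels < 0))[0]:
                labels[w] = cluster
                queue.append(w)
        cluster += 1
    return labels

# usage, reusing dr2/beta/R2 from the description-step sketch:
# inside = lambda y: dr2(y, X, beta, q, bKb) <= R2 + 1e-9
# labels = connected_components(adjacency(X, inside))
```

The two nested loops expose the quadratic number of segment checks that makes the original Phase II a bottleneck, which is exactly why Cone Cluster Labeling and Gradient Descent are listed as alternatives.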
  • 7. Support Vector Clustering: Pseudo-hierarchical execution and parameters exploration. The greater the kernel width q, the greater the number of support vectors (and so of clusters); C rules the number of outliers and allows dealing with strongly overlapping clusters. A brute-force exploration of the parameter space is unfeasible. Approaches proposed in the literature: a secant-like algorithm for the q exploration; no theoretically rooted method exists for the C exploration. Data analysis is performed at different levels of detail, but the execution is only pseudo-hierarchical: a strict hierarchy is not guaranteed when C < 1, due to the Bounded Support Vectors. A toy exploration schedule is sketched below.
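The following toy schedule illustrates the exploration idea only; it is not the secant-like algorithm from the literature, whose exact update rule is not reproduced here. The starting width q_0 = 1 / max_ij ||x_i - x_j||^2 (the smallest scale at which the Gaussian kernel still sees the whole dataset as one cluster) is a common choice in the SVC literature but should be checked against the thesis; run_svc is a hypothetical helper, not a real API.

```python
import numpy as np

def q_schedule(X, steps=10, growth=2.0):
    # start from q0 = 1 / max_ij ||x_i - x_j||^2, then grow q geometrically
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    q = 1.0 / d2.max()
    for _ in range(steps):
        yield q
        q *= growth

# hypothetical driver: more support vectors (and clusters) appear as q grows
# for q in q_schedule(X):
#     labels = run_svc(X, q=q, C=1.0)
#     print(q, labels.max() + 1)
```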
  • 8. Support Vector Clustering: Proposed improvements. Soft margin C parameter selection: heuristics successfully applied in 90% of the cases; only 10 tests out of 100 needed further tuning, and those 10 datasets had a high percentage of missing values. New robust stop criterion: based upon relative evaluation criteria (C-index, Dunn index, ad hoc indices). Kernel width (q) selection: integration with the SVC, reducing the complexity from O(Qn^3) to O(n_sv^2); softening strategy heuristics, valid for all normalized kernels. More kernels: Exponential (K(x, y) = e^{-q ||x - y||}) and Laplace (K(x, y) = e^{-q |x - y|}); a sketch of these kernels follows.
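For reference, a small sketch of the kernels named above; all three are normalized, i.e. K(x, x) = 1, which is the property the softening heuristics rely on. The slide leaves the norms implicit: here the Exponential kernel is taken over the Euclidean norm and the Laplace kernel over the L1 norm, which is one common convention and is an assumption, not a statement about the thesis code.

```python
import numpy as np

def gaussian(x, y, q):     # K(x, y) = exp(-q ||x - y||^2)
    return np.exp(-q * np.sum((x - y) ** 2))

def exponential(x, y, q):  # K(x, y) = exp(-q ||x - y||), Euclidean norm assumed
    return np.exp(-q * np.sqrt(np.sum((x - y) ** 2)))

def laplace(x, y, q):      # K(x, y) = exp(-q |x - y|), L1 norm assumed
    return np.exp(-q * np.sum(np.abs(x - y)))

x = np.zeros(3)
for k in (gaussian, exponential, laplace):
    assert k(x, x, q=0.7) == 1.0           # all three kernels are normalized
```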
  • 9. Support Vector Clustering, Improvements: Stop criterion.

    Dataset | Detected clusters | Actual clusters | Validity index
    Iris    | 1                 | 3               | 1,00E-06
    Iris    | 3                 | 3               | 0,13
    Iris    | 4                 | 3               | 0,05
    Breast  | 1                 | 2               | 1,00E-05
    Breast  | 2                 | 2               | 0,80
    Breast  | 4                 | 2               | 0,27

  The bigger the validity index, the better the clustering found; the stop criterion halts the process when the index value starts to decrease. The idea: the SVC outputs quality-increasing clusterings before reaching the optimal clustering; after that, it provides quality-decreasing partitionings. A sketch of the criterion follows.
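In code, the criterion is just a peak detector over the validity index. A minimal sketch, assuming hypothetical run_svc and validity_index helpers (the latter standing for the C-index, the Dunn index or the ad hoc criterion): keep the clustering found at the last quality-increasing step.

```python
def explore_with_stop(X, qs, run_svc, validity_index):
    """Stop as soon as the relative validity index starts to decrease."""
    best_labels, best_index = None, float("-inf")
    for q in qs:
        labels = run_svc(X, q=q)
        index = validity_index(X, labels)   # e.g. C-index or Dunn index
        if index < best_index:
            break                           # quality started decreasing: stop
        best_labels, best_index = labels, index
    return best_labels, best_index
```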
  • 10. Support Vector Clustering, Improvements: Kernel width selection.

    Dataset   | Algorithm   | Accuracy | Macroaveraging | # iter | # potential "q"
    Iris      | SVC         | 88,00%   | 87,69%         | 2      | 9
    Iris      | + softening | 94,00%   | 93,99%         | 1      | 13
    Iris      | K-means     | 85,33%   | 85,11%         | not applicable
    Wine      | SVC         | 87,07%   | 87,55%         | 3      | 7
    Wine      | + softening | 93,26%   | 93,91%         | 2      | 6
    Wine      | K-means     | 50,00%   | 51,78%         | not applicable
    Syn02     | SVC         | 88,80%   | 100,00%        | 8      | 18
    Syn02     | + softening | 88,00%   | 100,00%        | 4      | 15
    Syn02     | K-means     | 68,40%   | 63,84%         | not applicable
    Syn03     | SVC         | 87,30%   | 100,00%        | 17     | 36
    Syn03     | + softening | 87,30%   | 100,00%        | 6      | 31
    Syn03     | K-means     | 39,47%   | 39,90%         | not applicable
    B. Cancer | SVC         | 91,85%   | 11,00%*        | 3      | 11
    B. Cancer | + softening | 96,71%   | 2,82%*         | 3      | 13
    B. Cancer | K-means     | 60,23%   | 32,00%*        | not applicable

    * For B. Cancer the second measure is the benign contamination (the lower, the better).
  • 11. Support Vector Clustering, Improvements: non-Gaussian kernels. The Exponential kernel improves the cluster separation in several cases:

    Dataset | Algorithm       | Accuracy | Macroaveraging | # iter | # potential "q"
    Iris    | SVC + softening | 94,00%   | 93,99%         | 1      | 13
    Iris    | + Exp kernel    | 97,33%   | 97,33%         | 1      | 15
    Iris    | K-means         | 85,33%   | 85,11%         | not applicable
    CLA3    | SVC + softening | failed (only one class out of 3 separated)
    CLA3    | + Exp kernel    | 94,00%   | 93,99%         | 1      | 11
    CLA3    | K-means         | 85,33%   | 85,11%         | not applicable

  The Laplace kernel improves/allows the cluster separation with normalized data:

    Dataset | Algorithm        | Accuracy | # iter | # potential "q"
    SG03    | SVC + softening  | failed (no class separated)
    SG03    | + Laplace kernel | 99,94%   | 1      | 17
    SG03    | K-means          | 83,00%   | not applicable
    Quad    | SVC + softening  | 73,15%   | 3      | 19
    Quad    | + Laplace kernel | 91,04%   | 1      | 16
    Quad    | K-means          | 50,24%   | not applicable
  • 12. Minimum Bregman Information Principle: Bregman Co-clustering (BCC). Co-clustering is the simultaneous clustering of both rows and columns of a data matrix; it generalizes K-means. The framework relies on the Bregman divergences (Bregman, 1967), which form a large class of distortion/loss functions with a number of desirable properties (Banerjee et al., 2005c).

  Definition (Bregman divergence). Let φ be a real-valued strictly convex function of Legendre type, defined on the convex set S ≡ dom(φ) ⊆ R^d; φ is said to be of Legendre type if (i) int(dom(φ)) is non-empty, (ii) φ is strictly convex and differentiable on int(dom(φ)), and (iii) for every z_b ∈ bd(dom(φ)), lim_{z → z_b} ||∇φ(z)|| → ∞, where int(dom(φ)) and bd(dom(φ)) are the interior and the boundary of the domain of φ. The Bregman divergence d_φ : S × ri(S) → [0, ∞) is defined as

    d_φ(x_1, x_2) = φ(x_1) - φ(x_2) - <x_1 - x_2, ∇φ(x_2)>,

  where ∇φ is the gradient of φ, <·,·> is the dot product and ri(S) is the relative interior of S; the relative interior ri(C) = {x ∈ C : B(x, r) ∩ aff(C) ⊆ C for some r > 0} intuitively consists of all the points of C that are not on the "edge" of C (Boyd and Vandenberghe, 2004, app. A).

  Example (Squared Euclidean distance). The squared Euclidean distance is perhaps the simplest and most widely used Bregman divergence. The underlying function φ(x) = <x, x> is strictly convex and differentiable on R^d, and

    d_φ(x_1, x_2) = <x_1, x_1> - <x_2, x_2> - <x_1 - x_2, 2x_2> = ||x_1 - x_2||^2.

  Proposition. Let X be a random variable that takes values in X = {x_i}_{i=1}^n ⊆ S following a positive probability measure ν. Then the problem min_{s ∈ ri(S)} E_ν[d_φ(X, s)] has a unique solution, s* = µ = E_ν[X].

  Definition (Bregman Information). The Bregman Information of X in terms of d_φ is

    I_φ(X) = E_ν[d_φ(X, µ)] = Σ_{i=1}^n ν_i d_φ(x_i, µ).

  Example (Variance). Let X = {x_i}_{i=1}^n be a set in R^d with the uniform measure ν_i = 1/n over X. With the squared Euclidean distance, I_φ(X) = (1/n) Σ_{i=1}^n ||x_i - µ||^2, i.e. the variance.

  The Minimum Bregman Information (MBI) principle turns co-clustering into a meta-algorithm: each Bregman divergence instantiates a known (or new) algorithm.

    Bregman divergence  | Bregman Information | Algorithm(s)
    Squared Euclidean   | Variance            | K-means, Least Squares
    Relative Entropy    | Mutual Information  | Maximum Entropy, unnamed
    Itakura-Saito       | unnamed             | Linde-Buzo-Gray, unnamed

  A sketch of these definitions in code follows.
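The definitions above translate directly into code. A minimal sketch (illustrative, not the thesis software): a generic Bregman divergence plus two instances, checking that the squared-Euclidean Bregman Information with uniform weights is exactly the variance, and that the negative-entropy instance reduces to the KL divergence on probability vectors.

```python
import numpy as np

def bregman(phi, grad, x1, x2):
    # d_phi(x1, x2) = phi(x1) - phi(x2) - <x1 - x2, grad phi(x2)>
    return phi(x1) - phi(x2) - np.dot(x1 - x2, grad(x2))

def bregman_information(X, nu, phi, grad):
    # I_phi(X) = sum_i nu_i d_phi(x_i, mu),  mu = E_nu[X]
    mu = np.average(X, axis=0, weights=nu)
    return sum(w * bregman(phi, grad, x, mu) for x, w in zip(X, nu))

# squared Euclidean: phi(x) = <x, x>  ->  d_phi(x1, x2) = ||x1 - x2||^2
sq_phi, sq_grad = (lambda x: x @ x), (lambda x: 2.0 * x)
# negative entropy: phi(p) = sum p log p  ->  d_phi = KL(p || q) on the simplex
kl_phi, kl_grad = (lambda p: np.sum(p * np.log(p))), (lambda p: np.log(p) + 1.0)

X = np.random.randn(100, 4)
nu = np.full(100, 1.0 / 100)
assert np.isclose(bregman_information(X, nu, sq_phi, sq_grad),
                  np.sum(np.var(X, axis=0)))        # the variance, as in the example

p, q = np.array([0.2, 0.3, 0.5]), np.array([0.3, 0.3, 0.4])
assert np.isclose(bregman(kl_phi, kl_grad, p, q), np.sum(p * np.log(p / q)))
```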
  • 13. Other experiments: sparse data and missing-valued data.

  Star/Galaxy data with missing values:

    Dataset        | SVC    | BCC    | K-means | # attr. affected | % obj. affected
    MV5000 (25D)   | 99,02% | 94,00% | 71,08%  | 10               | 27,0%
    MV10000 (25D)  | 96,10% | 95,60% | 75,12%  | 10               | 29,0%
    AMV5000 (15D)  | 91,76% | 79,46% | 74,90%  | 6                | 30,0%
    AMV10000 (15D) | 90,31% | 83,51% | 68,20%  | 6                | 30,0%

  Textual document data, i.e. sparsity and high "dimensionality":

    Dataset           | SVC    | BCC     | K-means
    CLASSIC3 (3303D)  | 99,80% | 100,00% | 49,80%
    SCI3 (9456D)      | failed | 89,39%  | 39,15%
    PORE (13821D)     | failed | 82,68%  | 45,91%
  • 14. Other experiments: outliers.

    Dataset    | SVC     | Best BCC | K-means | # objects | # outliers
    SynDECA 02 | 100,00% | 94,18%   | 68,04%  | 1000      | 112
    SynDECA 03 | 100,00% | 49,00%   | 39,47%  | 10000     | 1.270

  (Figures: scatter plots of the SynDECA 02 and SynDECA 03 datasets.)
  • 15. Conclusions and future works: Conclusions. Support Vector Clustering achieves the goals:

    Goals                                   | Application domain
    Robustness w.r.t. missing-valued data   | Astrophysics
    Robustness w.r.t. sparse data           | Textual documents
    Robustness w.r.t. high "dimensionality" | Textual documents
    Robustness w.r.t. noise/outliers        | Synthetic data

  Other properties, verified over the whole experimental stage: automatic discovery of the number of clusters; application domain independence; nonlinearly separable problems handling; arbitrarily shaped clusters handling. Bregman Co-clustering achieves the same goals, but the following problems still hold: the estimation of the number of clusters and the handling of outliers.
  • 16. Conclusions and future works: Contribution. SVC was made applicable in practice: complexity reduction for the kernel width selection; soft margin C parameter estimation; a new effective stop criterion. Non-Gaussian kernels: the kernel width selection was shown to be applicable to all normalized kernels; the Exponential and the Laplace kernels were successfully used. Improved accuracy: softening strategy for the kernel width selection.
  • 17. Conclusion and future works: Future works.

  Minimum Enclosing Bregman Ball (MEBB): a generalization of the Minimum Enclosing Ball (MEB) problem and of the Bâdoiu-Clarkson (BC) algorithm to the Bregman divergences (Nock and Nielsen, 2005). A Bregman divergence

    D_F(x', x) = F(x') - F(x) - <x' - x, ∇F(x)>,

  where ∇F is the gradient operator of F, is convex in x', always non-negative, and zero iff x = x'. Whenever F(x) = Σ_i x_i^2 = ||x||_2^2, the corresponding divergence is the squared Euclidean distance (L_2^2), D_F(x', x) = ||x - x'||_2^2, with which is associated the common definition of a ball in a Euclidean metric space:

    B_{c,r} = {x ∈ X : ||x - c||_2^2 <= r},

  with c ∈ S the center of the ball and r >= 0 its (squared) radius. This suggests a natural generalization to the definition of balls for arbitrary Bregman divergences. However, since a Bregman divergence is usually not symmetric, any c ∈ S and any r >= 0 actually define two dual Bregman balls:

    B_{c,r} = {x ∈ X : D_F(c, x) <= r}   and   B'_{c,r} = {x ∈ X : D_F(x, c) <= r}.

  Remark that D_F(c, x) is always convex in c, while D_F(x, c) is not always convex (it depends on x, given c); hence the boundary ∂B'_{c,r} is not always convex, while ∂B_{c,r} is. Let S ⊆ X be a set of m points sampled from X: the smallest enclosing Bregman ball (SEBB) for S is a Bregman ball B_{c*,r*} with r* the minimal real such that S ⊆ B_{c*,r*}; with a slight abuse of language, we refer to r* as the radius of the ball. The objective is to approximate as best as possible the SEBB of S, which amounts to minimizing the radius of the enclosing ball; as a matter of fact, the SEBB is unique (Lemma 1). (Figure 10.1: examples of Bregman balls for d = 2, obtained with the Itakura-Saito distance, the squared Euclidean distance L_2^2 and the Kullback-Leibler divergence; the blue dots are the centers of the balls. From Nock and Nielsen, 2005, fig. 2.)

  Core Vector Machines (CVM): the CVMs reformulate the SVMs as a MEB problem and make use of the BC algorithm. Since the BC algorithm has been generalized to the Bregman divergences, this could have very interesting implications for the vector machines (MEBB + CVM = Bregman Vector Machines) and therefore for the SVC: we intend to explore this way and to test such machines for the cluster description stage of the SVC (see section 6.12). A minimal sketch of the BC iteration follows this slide.

  Improve and extend the SVC software: for the sake of accuracy, and in order to perform more robust comparisons with the other clustering algorithms, an improved and extended software for the Support Vector Clustering (SVC) is needed; more stability and reliability are necessary. It is also important to adapt the cluster labeling algorithms and to implement all the key contributions to this promising technique proposed all around the world: in fact, all the tests have been currently performed by exploiting only some of the characteristic and/or special contributions at a time.
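Since the BC algorithm is the computational core of both the CVMs and the MEBB generalization, here is a minimal sketch of its Euclidean version; the Bregman version of Nock and Nielsen replaces these Euclidean operations with divergence-aware updates and is not reproduced here. Starting from any point of S, the center repeatedly takes a shrinking step toward the current farthest point, giving a (1 + ε)-approximation of the MEB.

```python
import numpy as np

def badoiu_clarkson_meb(S, eps=0.01):
    """(1 + eps)-approximation of the Euclidean minimum enclosing ball of S."""
    c = S[0].copy()
    for t in range(1, int(np.ceil(1.0 / eps ** 2)) + 1):
        far = S[np.argmax(np.sum((S - c) ** 2, axis=1))]  # current farthest point
        c += (far - c) / (t + 1.0)                        # shrinking step toward it
    r2 = np.max(np.sum((S - c) ** 2, axis=1))             # squared radius
    return c, r2

S = np.random.randn(500, 2)
c, r2 = badoiu_clarkson_meb(S)
print(c, np.sqrt(r2))
```

Run in feature space through kernel evaluations, this same loop is what makes CVM training cheap; swapping the squared Euclidean distance for a Bregman divergence is the MEBB direction discussed above.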
  The End