UNIVERSITA’ POLITECNICA DELLE MARCHE
                       DIIGA – Dipartimento di Ingegneria Informatica,
                               Gestionale e dell’Automazione
                                       Ancona, Italy




              Ontology-Driven
          KDD Process Composition

            Claudia Diamantini, Domenico Potena, Emanuele Storti
                  {diamantini, potena, storti}@diiga.univpm.it
                              www.diiga.univpm.it




IDA'09, Lyon, Aug 31
Introduction

   Knowledge Discovery in Databases is the non-trivial
    process of identifying valid, novel, potentially useful, and
    ultimately understandable patterns in data. [Fayyad et al., 1996]
   Many sources of complexity:
            iterative/interactive process
            many tasks and phases
            several algorithms available for each
             phase, with specific:
                characteristics, interfaces
                preconditions/postconditions
                performances

Need for systems that support users in composing algorithms to produce valid
and useful KDD processes

IDA'09, Lyon, Aug 31                 Emanuele Storti
Aim of the work

   Idea: adding semantics to KDD algorithms for
    supporting an automatic KDD process
    composition procedure
    Formalizing knowledge of KDD experts into an
     ontology for describing algorithms, their interfaces
     and their relations

    Defining techniques for matching algorithms with
     compatible interfaces

    Defining a goal-oriented composition procedure
     which starts from user requests (goal, dataset, constraints) and produces
     a list of valid processes ranked according to some criteria



Framework
   KDDVM project: service-oriented system for
    sharing, discovering, accessing, executing Data
    Mining and KDD tools

   Separation of information into 3 logical layers:

    KDD Algorithm       abstract algorithm

       KDD Tool         specific implementation of an algorithm

     KDD Service        tool running on a specific machine

Algorithm level → output = prototype KDD processes
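
The three layers can be pictured as nested records. This is only a minimal sketch and not the KDDVM data model: the class names, the tool name, the endpoint URL and the "Normalization" algorithm are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KDDAlgorithm:
    """Abstract algorithm, independent of any implementation."""
    name: str


@dataclass(frozen=True)
class KDDTool:
    """Specific implementation of an algorithm."""
    name: str
    implements: KDDAlgorithm


@dataclass(frozen=True)
class KDDService:
    """A tool running on a specific machine."""
    tool: KDDTool
    endpoint: str  # hypothetical address, for illustration only


lvq = KDDAlgorithm("LVQ")
tool = KDDTool("lvq-impl", implements=lvq)                     # hypothetical tool
service = KDDService(tool, endpoint="http://example.org/lvq")  # hypothetical URL

# A prototype process lives at the algorithm level only:
# no tool or service has been chosen yet.
prototype_process = [KDDAlgorithm("Normalization"), lvq]
print([a.name for a in prototype_process])
```

Keeping the process at the algorithm level means the same prototype could later be bound to different tools and services.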


KDD Ontology (1)

   KDDONTO is an ontology formalizing the
    domain of KDD algorithms:
       developed following a formal methodology [Noy, 2002]
    (concept definition → logic modeling → translation into OWL → evaluation)

       taking into account quality requirements [Gruber, 1995]

    Main classes and relations:
       Algorithm, Method
       Task, Phase
       Data, DataFeature
       Performance
       has_input/has_output
       ...


KDD Ontology (2)

   KDDONTO is conceived for supporting process
    composition
       Properties useful for representing algorithm's interfaces:
           has_condition        pre/postcondition for some input/output data
           in_module/out_module suggestions about composable algorithms
           not_with/not_before  explicit incompatibilities between methods

       Properties useful for representing relations among data:
           part_of/has_part        relations between a compound datum and
                                     its subcomponents
           in_contrast             explicit incompatibilities between conditions
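
As a rough illustration of how these properties could be used, the sketch below stores them in plain Python structures. It is not the OWL encoding of KDDONTO: the class AlgorithmInterface, the algorithm "SomeClassifier" and the condition "no_missing_values" are hypothetical, and only the fact that VQ is part_of LVQ comes from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmInterface:
    name: str
    has_input: list = field(default_factory=list)      # input data
    has_output: list = field(default_factory=list)     # output data
    has_condition: dict = field(default_factory=dict)  # datum -> required conditions
    not_with: set = field(default_factory=set)         # incompatible methods


# part_of: component datum -> compound datum (has_part is its inverse).
PART_OF = {"VQ": "LVQ"}

# in_contrast: pairs of mutually exclusive conditions (hypothetical pair).
IN_CONTRAST = {frozenset({"missing_values", "no_missing_values"})}


def has_part(whole):
    """Inverse of part_of: the components of a compound datum."""
    return {part for part, w in PART_OF.items() if w == whole}


def violates(required, actual):
    """True if a required condition is in contrast with an actual one."""
    return any(frozenset({r, a}) in IN_CONTRAST
               for r in required for a in actual)


classifier = AlgorithmInterface(
    name="SomeClassifier",  # hypothetical algorithm
    has_input=["LabeledDataset"],
    has_output=["Model"],
    has_condition={"LabeledDataset": {"no_missing_values"}},
)

dataset_features = {"float", "normalized", "missing_values"}
print(has_part("LVQ"))
print(violates(classifier.has_condition["LabeledDataset"], dataset_features))
```

Here the precondition check fails because the dataset carries a feature that is in contrast with what the algorithm requires.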




Algorithm Matchmaking
       Linking algorithms with compatible interfaces
Exact Match: interfaces share the same data
 - equivalence only
 - matchE({A1, A2}, B):
     $in_B^1 \equiv_o out_{A_1}^1$,  $in_B^2 \equiv_o out_{A_1}^2$,  $in_B^3 \equiv_o out_{A_2}^1$

Approximate Match: interfaces share similar data
 - is-a and part-of relations
 - inferential reasoning on KDDONTO
 - matchA({A1, A2}, B):
     $VQ_B$ part_of $LVQ_{A_1}$,  $DATASET_B \equiv_o DATASET_{A_2}$
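
The difference between the two kinds of match can be sketched with two small dictionaries in place of the ontology. This is a simplification under stated assumptions, not the matching functions defined by the authors: the is-a entry and the direction chosen for is-a (an output that is-a the required input is accepted) are assumptions, and real reasoning on KDDONTO would be done by an ontology reasoner.

```python
# Toy relations: child -> parent.
IS_A = {"LabeledDataset": "DATASET"}  # hypothetical is-a assertion
PART_OF = {"VQ": "LVQ"}               # VQ part_of LVQ, as in the slide


def ancestors(datum, relation):
    """Everything reachable from datum by following the relation upwards."""
    found = []
    while datum in relation and relation[datum] not in found:
        datum = relation[datum]
        found.append(datum)
    return found


def exact(in_b, out_a):
    """Exact match: equivalence only."""
    return in_b == out_a


def approximate(in_b, out_a):
    """Approximate match: equivalence, is-a, or part-of."""
    return (exact(in_b, out_a)
            or in_b in ancestors(out_a, IS_A)      # out_a is-a in_b
            or out_a in ancestors(in_b, PART_OF))  # in_b part_of out_a


def match(producers, b_inputs, test):
    """Link each input of B to an output of some producer, or return None."""
    linked = {}
    for in_b in b_inputs:
        for name, outputs in producers.items():
            hit = next((o for o in outputs if test(in_b, o)), None)
            if hit is not None:
                linked[in_b] = (name, hit)
                break
        else:
            return None  # this input of B is not covered
    return linked


producers = {"A1": ["LVQ"], "A2": ["DATASET"]}
b_inputs = ["VQ", "DATASET"]
print(match(producers, b_inputs, exact))
print(match(producers, b_inputs, approximate))
```

With exact matching B cannot be linked, because no algorithm outputs a VQ; with approximate matching the VQ is taken from the LVQ produced by A1.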


Composition Procedure (1)
   Goal-driven procedure for composing KDD processes,
    exploiting KDDONTO and matching functionalities
     produces a subset of all possible valid processes


Three phases:
I. Definition of dataset, goal and user constraints

Dataset: a Dataset type and a set of instances of the DataFeature class
   e.g.: LabeledDataset {float, balanced, normalized, missing_values}
Goal: an instance of the Task class
   e.g.: CLASSIFICATION
Constraints: pruning criteria
   • max number of algorithms in a process;
   • max cost of a process;
   • max computational complexity
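
The first phase amounts to collecting a request and a way to test the pruning criteria. The sketch below is one possible encoding, not the one used by the authors: the Request class, its field names and the numeric limits are hypothetical, and the computational-complexity criterion is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    dataset_type: str      # a Dataset type
    features: frozenset    # instances of DataFeature
    task: str              # an instance of Task
    max_algorithms: int    # pruning: max number of algorithms in a process
    max_cost: float        # pruning: max cost of a process


def within_constraints(process, cost, request):
    """True if a candidate process respects the pruning criteria."""
    return len(process) <= request.max_algorithms and cost <= request.max_cost


request = Request(
    dataset_type="LabeledDataset",
    features=frozenset({"float", "balanced", "normalized", "missing_values"}),
    task="CLASSIFICATION",
    max_algorithms=5,   # hypothetical limit
    max_cost=10.0,      # hypothetical limit
)

print(within_constraints(["a1", "a2"], 3.0, request))
print(within_constraints(["a"] * 6, 3.0, request))
```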

Composition Procedure (2)

II. Process building
Starts from the task and goes backwards iteratively
At each iteration, algorithms are added to processes
by exploiting matching functionalities
[Figure: process graph linking the dataset (ds) to the task]

Stop conditions: - no process can be further expanded
                 - some process constraints are violated
Output: only valid processes: - satisfying the user goal (task)
                              - compatible with the given dataset

III. Process ranking
Cost function takes into account: kind of match (exact / approximate),
precondition relaxation, algorithm performances, ...
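
Phases II and III can be sketched as a backward search followed by a sort. This is a simplified illustration, not the procedure implemented in KDDComposer: every algorithm is assumed to have a single input and a single output, the catalogue and the approximate pair are hypothetical, and the cost counts only the kind of match (0 for exact, 1 for approximate).

```python
# Hypothetical catalogue: name -> (input datum, output datum, task or None).
CATALOGUE = {
    "Normalize": ("LabeledDataset", "NormalizedDataset", None),
    "TrainLVQ": ("NormalizedDataset", "LVQ", "CLASSIFICATION"),
}

# Hypothetical approximate matches: (needed input, available datum).
APPROX = {("NormalizedDataset", "LabeledDataset")}


def link_cost(needed, produced):
    """0 for an exact match, 1 for an approximate one, None for no match."""
    if needed == produced:
        return 0
    if (needed, produced) in APPROX:
        return 1
    return None


def compose(task, dataset, max_algorithms):
    """Build processes backwards from the task, then rank them by cost."""
    # Start from the algorithms that achieve the task.
    open_list = [([name], 0) for name, (_, _, t) in CATALOGUE.items() if t == task]
    valid = []
    while open_list:
        process, cost = open_list.pop()
        needed = CATALOGUE[process[0]][0]
        start = link_cost(needed, dataset)
        if start is not None:  # the head of the process accepts the dataset
            valid.append((cost + start, process))
        if len(process) >= max_algorithms:
            continue  # pruning: the process cannot grow any further
        for name, (_, produced, _) in CATALOGUE.items():
            step = link_cost(needed, produced)
            if step is not None and name not in process:
                open_list.append(([name] + process, cost + step))
    return sorted(valid)  # lowest cost first


for cost, process in compose("CLASSIFICATION", "LabeledDataset", max_algorithms=5):
    print(cost, process)
```

The process that normalizes first ranks ahead of the one that feeds the labeled dataset to the classifier directly, because the latter relies on an approximate match.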

KDDComposer
   A prototype implementing the composition
    procedure
Example scenario:
Task:   CLASSIFICATION
Dataset: LabeledDataset
Dataset features:
    {float, normalized,
    missing_values,...}
Constraints: max 5 algorithms, etc.

Results
   a ranked list of many valid processes
   Compared to a non-ontological approach:
      → more valid processes (inference)
      → fewer invalid processes (ontological and non-ontological pruning)

Conclusion
   Procedure for composing valid KDD processes
       semantic representation of algorithms and data

Advantages
   KDDONTO → resulting processes are valid
           → supports complex pruning strategies
   Approximate Match → more valid results (novel w.r.t. other works in the literature)
   Ranking according to both ontological and non-ontological criteria
   Prototype processes can themselves be considered as valid, unknown and useful
    knowledge, valuable for both novice and expert users



Future work
   translating each prototype process into a concrete workflow of KDD Web Services



Project website




                         Project website: http://boole.diiga.univpm.it



Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Ontology-driven KDD Process Composition

  • 1. Ontology-Driven KDD Process Composition
      Claudia Diamantini, Domenico Potena, Emanuele Storti ({diamantini, potena, storti}@diiga.univpm.it)
      UNIVERSITA’ POLITECNICA DELLE MARCHE, DIIGA – Dipartimento di Ingegneria Informatica, Gestionale e dell’Automazione, Ancona, Italy (www.diiga.univpm.it)
      IDA'09, Lyon, Aug 31
  • 3. Introduction
      - Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [Fayyad et al., 1996].
      - Many sources of complexity:
          - iterative and interactive process
          - many tasks and phases
          - several algorithms available for each phase, each with specific characteristics, interfaces, preconditions/postconditions, and performance
      - Hence the need for systems that support users in composing algorithms into valid and useful KDD processes.
      IDA'09, Lyon, Aug 31, Emanuele Storti
  • 9. Aim of the work
      - Idea: add semantics to KDD algorithms in order to support an automatic KDD process composition procedure.
      - Formalize the knowledge of KDD experts in an ontology that describes algorithms, their interfaces, and their relations.
      - Define techniques for matching algorithms with compatible interfaces.
      - Define a goal-oriented composition procedure that starts from user requests and produces a list of valid processes ranked according to some criteria.
      [Diagram: inputs goal, dataset, and constraints; output processes]
  • 11. Framework
      - KDDVM project: a service-oriented system for sharing, discovering, accessing, and executing Data Mining and KDD tools.
      - Information is separated into three logical layers:
          - KDD Algorithm: abstract algorithm
          - KDD Tool: specific implementation of an algorithm
          - KDD Service: tool running on a specific machine
      - Algorithm level: output = prototype KDD processes.
  • 12. KDD Ontology (1)
      - KDDONTO is an ontology formalizing the domain of KDD algorithms:
          - developed following a formal methodology [Noy, 2002]: concept definition → logic modeling → translation into OWL → evaluation
          - taking into account quality requirements [Gruber, 1995]
      - Main classes and relations:
          - Algorithm, Method
          - Task, Phase
          - Data, DataFeature
          - Performance
          - has_input/has_output
          - ...
  • 13. KDD Ontology (2)
      - KDDONTO is conceived to support process composition.
      - Properties useful for representing algorithm interfaces:
          - has_condition: pre/postcondition for some input/output data
          - in_module/out_module: suggestions about composable algorithms
          - not_with/not_before: explicit incompatibilities between methods
      - Properties useful for representing relations among data:
          - part_of/has_part: relations between a compound datum and its subcomponents
          - in_contrast: explicit incompatibilities between conditions
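As a rough illustration of how the interface properties on this slide could be consulted by a composer, the Python sketch below holds a few of them in a plain data class. KDDONTO itself is an OWL ontology; the class, the `can_precede` helper, the algorithm names, the `no_missing_values` feature, and the reading of `not_before` as "must not be placed before" are all assumptions made for this example. The slide states these incompatibilities between methods; the sketch simplifies them to algorithm names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmDescription:
    """Hypothetical in-memory view of one algorithm described in KDDONTO."""
    name: str
    has_input: tuple = ()                # data consumed by the algorithm
    has_output: tuple = ()               # data produced by the algorithm
    has_condition: tuple = ()            # (datum, required feature) pairs
    not_with: frozenset = frozenset()    # names it cannot share a process with
    not_before: frozenset = frozenset()  # names it must not be placed before


def can_precede(first, second):
    """Check the explicit incompatibilities before chaining first -> second."""
    if second.name in first.not_with or first.name in second.not_with:
        return False
    return second.name not in first.not_before


# Illustrative instances; the names are invented for this sketch.
imputer = AlgorithmDescription(
    name="Imputer",
    has_input=("LabeledDataset",),
    has_output=("LabeledDataset",),
)
classifier = AlgorithmDescription(
    name="Classifier",
    has_input=("LabeledDataset",),
    has_output=("Model",),
    has_condition=(("LabeledDataset", "no_missing_values"),),
    not_before=frozenset({"Imputer"}),
)

print(can_precede(imputer, classifier))
print(can_precede(classifier, imputer))
```

Keeping the incompatibilities as explicit sets mirrors the idea that they are declared in the ontology rather than inferred from the interfaces.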
  • 19. Algorithm Matchmaking
      - Matchmaking links algorithms with compatible interfaces.
      - Exact match: interfaces share the same data (equivalence only).
          - Example, $match_E(\{A_1, A_2\}, B)$: $in_B^1 \equiv_o out_{A_1}^1$, $in_B^2 \equiv_o out_{A_1}^2$, $in_B^3 \equiv_o out_{A_2}^1$
      - Approximate match: interfaces share similar data (is-a and part-of relations, inferential reasoning on KDDONTO).
          - Example, $match_A(\{A_1, A_2\}, B)$: $VQ_B \ \mathrm{part\_of} \ LVQ_{A_1}$, $DATASET_B \equiv_o DATASET_{A_2}$
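The difference between the two kinds of match can be sketched in a few lines of Python. This is a minimal illustration, not the authors' matchmaker: it assumes that a match holds when every input of B is covered by some output of the candidate algorithms, and that an offered datum is "similar" to a required one if it is equal to it, a specialization of it (is-a), or a compound containing it (part-of). Only the relation `VQ part_of LVQ` comes from the slide; the is-a entry and all function names are hypothetical.

```python
# Hypothetical fragment of the data taxonomy.
IS_A = {"LabeledDataset": "Dataset"}   # child -> parent
PART_OF = {"VQ": "LVQ"}                # component -> compound (from the slide)


def equivalent(required, offered):
    """Exact match criterion: the two data are the same concept."""
    return required == offered


def similar(required, offered):
    """Approximate match criterion: equality, is-a, or part-of."""
    if required == offered:
        return True
    node = offered
    while node in IS_A:                # walk up the is-a chain of the offer
        node = IS_A[node]
        if node == required:
            return True
    return PART_OF.get(required) == offered


def match(inputs_of_b, outputs_of_as, criterion):
    """True if each input of B is covered by at least one offered output."""
    return all(
        any(criterion(required, offered) for offered in outputs_of_as)
        for required in inputs_of_b
    )


inputs_of_b = ["VQ", "Dataset"]
outputs_of_as = ["LVQ", "Dataset"]     # outputs of A1 and A2 pooled together
print(match(inputs_of_b, outputs_of_as, equivalent))
print(match(inputs_of_b, outputs_of_as, similar))
```

Passing the criterion as a function keeps the covering logic identical for both kinds of match, so only the notion of compatibility between two data changes.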
  • 23. Composition Procedure (1)
      - Goal-driven procedure for composing KDD processes, exploiting KDDONTO and the matching functionalities.
      - Produces a subset of all possible valid processes.
      - Three phases. Phase I: definition of dataset, goal, and user constraints.
          - Dataset: a Dataset type and a set of instances of the DataFeature class, e.g. LabeledDataset {float, balanced, normalized, missing_values}
          - Goal: an instance of the Task class, e.g. CLASSIFICATION
          - Pruning criteria: max number of algorithms in a process; max cost of a process; max computational complexity
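The user request defined in Phase I could be captured as a small record, sketched below in Python. The class name, field names, the `within_limits` helper, the limit of 5 algorithms, and the representation of computational complexity as an integer rank are assumptions for illustration; only the dataset type, the feature names, and the task value are taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompositionRequest:
    """Hypothetical container for the dataset, goal, and pruning criteria."""
    dataset_type: str
    dataset_features: frozenset
    task: str
    max_algorithms: Optional[int] = None
    max_cost: Optional[float] = None
    max_complexity: Optional[int] = None   # assumed to be an ordinal rank


def within_limits(request, n_algorithms, cost, complexity):
    """Return True if a candidate process respects every limit that is set."""
    limits = (
        (request.max_algorithms, n_algorithms),
        (request.max_cost, cost),
        (request.max_complexity, complexity),
    )
    return all(limit is None or value <= limit for limit, value in limits)


request = CompositionRequest(
    dataset_type="LabeledDataset",
    dataset_features=frozenset(
        {"float", "balanced", "normalized", "missing_values"}
    ),
    task="CLASSIFICATION",
    max_algorithms=5,                      # arbitrary example value
)

print(within_limits(request, n_algorithms=4, cost=2.5, complexity=3))
```

Unset limits are treated as "no constraint", so a request only prunes on the criteria the user actually specified.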
  • 30. Composition Procedure (2)
      - Phase II: process building
          - Starts from the task and goes backwards iteratively.
          - At each iteration, algorithms are added to processes by exploiting the matching functionalities.
          - Stop conditions: no process can be further expanded; some process constraints are violated.
          - Output: only valid processes, i.e. those satisfying the user goal (task) and compatible with the given dataset.
      - Phase III: process ranking
          - The cost function takes into account: kind of match (exact / approximate), precondition relaxation, algorithm performance, ...
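The backward construction and the final ranking can be sketched as follows. This Python example is a deliberately reduced illustration under stated assumptions, not the procedure implemented by the authors: it handles only linear chains, each algorithm has a single input and a single output, matches are exact, and the ranking cost is a plain sum of per-algorithm costs. The catalogue, its algorithm names, data types, and cost values are all invented.

```python
from collections import namedtuple

Algo = namedtuple("Algo", "name input output task cost")

# Hypothetical catalogue; names, data types, and costs are illustrative.
CATALOGUE = [
    Algo("Classifier", "NormalizedDataset", "Model", "CLASSIFICATION", 1.0),
    Algo("Normalizer", "LabeledDataset", "NormalizedDataset", None, 0.5),
    Algo("Rescaler", "CleanDataset", "NormalizedDataset", None, 0.2),
    Algo("Cleaner", "LabeledDataset", "CleanDataset", None, 0.6),
]


def compose(task, dataset_type, catalogue, max_algorithms):
    """Build chains backwards from the task, then rank them by total cost."""
    # Each partial process lists its algorithms in execution order.
    frontier = [[a] for a in catalogue if a.task == task]
    valid = []
    while frontier:
        process = frontier.pop()
        head = process[0]
        if head.input == dataset_type:
            valid.append(process)          # compatible with the dataset
            continue
        if len(process) >= max_algorithms:
            continue                       # pruned: constraint would be violated
        for candidate in catalogue:
            # Exact match: the candidate's output feeds the current head.
            if candidate.output == head.input and candidate not in process:
                frontier.append([candidate] + process)
    return sorted(valid, key=lambda p: sum(a.cost for a in p))


ranked = compose("CLASSIFICATION", "LabeledDataset", CATALOGUE, max_algorithms=5)
for process in ranked:
    print(" -> ".join(a.name for a in process))
```

A partial process leaves the frontier when it reaches the dataset, when it hits the length limit, or when no candidate can extend it, which corresponds to the stop conditions listed above.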
  • 36. KDDComposer
      - A prototype implementing the composition procedure.
      - Example scenario:
          - Task: CLASSIFICATION
          - Dataset: LabeledDataset
          - Dataset features: {float, normalized, missing_values,...}
          - Constraints: max 5 algorithms, etc.
      - Results: a ranked list of many valid processes.
      - Compared to a non-ontological approach:
          - more valid processes (inference)
          - fewer invalid processes (ontological and non-ontological pruning)
  • 37. Conclusion
      - Procedure for composing valid KDD processes, based on a semantic representation of algorithms and data.
      - Advantages:
          - KDDONTO: resulting processes are valid; supports complex pruning strategies
          - Approximate match: more valid results (novel w.r.t. other works in the literature)
          - Ranking according to both ontological and non-ontological criteria
          - Prototype processes can themselves be considered valid, unknown, and useful knowledge, valuable for both novice and expert users
      - Future work: translating each prototype process into a concrete workflow of KDD Web Services.
  • 38. Project website: http://boole.diiga.univpm.it