Full paper: http://boole.diiga.univpm.it/paper/ida09.pdf
One of the most interesting challenges in the Knowledge Discovery in Databases (KDD) field is supporting users in composing tools into a valid and useful KDD process. This activity requires users both to choose tools suited to their knowledge discovery problem and to compose them when designing the KDD process. To this end, they need expertise and knowledge about the functionalities and properties of all KDD algorithms implemented in the available tools. To support users in this demanding activity, in this paper we introduce a goal-driven procedure for automatically composing algorithms. The proposed procedure exploits KDDONTO, an ontology formalizing the domain of KDD algorithms, which allows us to generate valid and non-trivial processes.
Ontology-driven KDD Process Composition
1. UNIVERSITA’ POLITECNICA DELLE MARCHE
DIIGA – Dipartimento di Ingegneria Informatica,
Gestionale e dell’Automazione
Ancona, Italy
Ontology-Driven KDD Process Composition
Claudia Diamantini, Domenico Potena, Emanuele Storti
{diamantini, potena, storti}@diiga.univpm.it
www.diiga.univpm.it
IDA'09, Lyon, Aug 31
2. Introduction
Knowledge Discovery in Databases is the non-trivial
process of identifying valid, novel, potentially useful, and
ultimately understandable patterns in data. [Fayyad et al., 1996]
Many sources of complexity:
- iterative/interactive process
- many tasks and phases
- several algorithms available for each phase, each with specific characteristics, interfaces, preconditions/postconditions and performance
IDA'09, Lyon, Aug 31 Emanuele Storti
3. Introduction
Need for systems that support users in composing algorithms to produce valid and useful KDD processes
9. Aim of the work
Idea: adding semantics to KDD algorithms for
supporting an automatic KDD process
composition procedure
Formalizing knowledge of KDD experts into an
ontology for describing algorithms, their interfaces
and their relations
Defining techniques for matching algorithms with
compatible interfaces
Defining a goal-oriented composition procedure which starts from user requests and produces a list of valid processes ranked according to some criteria
[Diagram: goal, dataset, constraints → processes]
10. Framework
KDDVM project: service-oriented system for
sharing, discovering, accessing, executing Data
Mining and KDD tools
Separation of information into 3 logical layers:
- KDD Algorithm: abstract algorithm
- KDD Tool: specific implementation of an algorithm
- KDD Service: tool running on a specific machine
At the algorithm level, the output is a set of prototype KDD processes
12. KDD Ontology (1)
KDDONTO is an ontology formalizing the
domain of KDD algorithms:
developed following a formal methodology [Noy, 2002]
(concept definition → logic modeling → translation into OWL → evaluation)
taking into account quality requirements [Gruber, 1995]
Main classes and relations:
Algorithm, Method
Task, Phase
Data, DataFeature
Performance
has_input/has_output
...
13. KDD Ontology (2)
KDDONTO is conceived for supporting process composition
Properties useful for representing algorithm interfaces:
- has_condition: pre/postcondition for some input/output data
- in_module/out_module: suggestions about composable algorithms
- not_with/not_before: explicit incompatibilities between methods
Properties useful for representing relations among data:
- part_of/has_part: relations between a compound datum and its subcomponents
- in_contrast: explicit incompatibilities between conditions
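How explicit incompatibilities between methods might be used to reject a candidate process can be sketched in Python. This is only an illustration under assumptions: the method names M1 to M5, the sets `NOT_WITH` and `NOT_BEFORE`, and the function `is_admissible` are hypothetical, and the reading of not_before as "must not precede" is an assumption rather than the KDDONTO definition.

```python
from itertools import combinations

# Hypothetical incompatibilities between methods (invented for this sketch).
NOT_WITH = {frozenset({"M1", "M2"})}   # M1 and M2 never in the same process
NOT_BEFORE = {("M3", "M4")}            # assumed meaning: M3 must not precede M4


def is_admissible(process):
    """Check an ordered list of method names against both incompatibility sets."""
    # combinations() keeps list order, so `earlier` always precedes `later`.
    for earlier, later in combinations(process, 2):
        if frozenset({earlier, later}) in NOT_WITH:
            return False
        if (earlier, later) in NOT_BEFORE:
            return False
    return True
```

For instance, `is_admissible(["M4", "M3"])` holds while `is_admissible(["M3", "M5", "M4"])` does not, because only the second places M3 before M4.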
19. Algorithm Matchmaking
Linking algorithms with compatible interfaces
Exact Match: interfaces share the same data
- equivalence only
matchE({A1, A2}, B): $in_B^1 \equiv_o out_{A_1}^1$, $in_B^2 \equiv_o out_{A_1}^2$, $in_B^3 \equiv_o out_{A_2}^1$
Approximate Match: interfaces share similar data
- is-a and part-of relations
- inferential reasoning on KDDONTO
matchA({A1, A2}, B): $VQ_B$ part_of $LVQ_{A_1}$, $DATASET_B \equiv_o DATASET_{A_2}$
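The difference between the two kinds of match can be sketched in Python. This is a minimal illustration, not the KDDONTO reasoner: the dictionaries `IS_A` and `PART_OF`, all function names, and the `LabeledDataset` is-a `DATASET` link are assumptions made for the example. Only the VQ part_of LVQ case comes from the slide, and the direction chosen for is-a is an assumption.

```python
# Toy stand-ins for ontology relations (assumed for illustration only).
IS_A = {"LabeledDataset": "DATASET"}   # child -> parent (assumed link)
PART_OF = {"VQ": "LVQ"}                # part -> whole, as in "VQ part_of LVQ"


def ancestors(concept, relation):
    """Follow a child -> parent (or part -> whole) relation transitively."""
    found = set()
    while concept in relation:
        concept = relation[concept]
        found.add(concept)
    return found


def compatible_exact(needed, produced):
    """Exact match: the two data are the same concept."""
    return needed == produced


def compatible_approx(needed, produced):
    """Approximate match: equal, or produced is a kind of needed (is-a),
    or needed is a part of produced (part-of)."""
    return (
        needed == produced
        or needed in ancestors(produced, IS_A)
        or produced in ancestors(needed, PART_OF)
    )


def match(inputs_of_b, outputs_of_as, compatible):
    """True if every input of B is covered by some output of the A algorithms."""
    return all(
        any(compatible(needed, produced) for produced in outputs_of_as)
        for needed in inputs_of_b
    )
```

With these toy relations, `match(["VQ", "DATASET"], ["LVQ", "DATASET"], compatible_exact)` fails because VQ and LVQ are different concepts, while the same call with `compatible_approx` succeeds through the part-of relation.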
23. Composition Procedure (1)
Goal-driven procedure for composing KDD processes,
exploiting KDDONTO and matching functionalities
produces a subset of all possible valid processes
Three phases:
I. Definition of dataset, goal and user constraints
Dataset: a Dataset type and a set of instances of the DataFeature class
e.g.: LabeledDataset {float, balanced, normalized, missing_values}
Goal: an instance of the Task class
e.g.: CLASSIFICATION
Pruning Criteria:
• max number of algorithms in a process;
• max cost of a process;
• max computational complexity
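The three ingredients of a request (dataset, goal, constraints) can be pictured as a small Python record with a pruning check. This is a sketch under assumptions: the class `Request`, its field names, the function `within_limits` and the numeric limits are hypothetical and do not come from the prototype; computational complexity is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    dataset_type: str            # a Dataset type, e.g. "LabeledDataset"
    dataset_features: frozenset  # instances of the DataFeature class
    task: str                    # an instance of the Task class
    max_algorithms: int          # pruning: max number of algorithms in a process
    max_cost: float              # pruning: max cost of a process


def within_limits(request, algorithms, cost):
    """Keep a candidate process only if it respects both limits."""
    return len(algorithms) <= request.max_algorithms and cost <= request.max_cost


# Example values; the limits 5 and 10.0 are illustrative assumptions.
request = Request(
    dataset_type="LabeledDataset",
    dataset_features=frozenset({"float", "balanced", "normalized", "missing_values"}),
    task="CLASSIFICATION",
    max_algorithms=5,
    max_cost=10.0,
)
```

A candidate with two algorithms and cost 3.0 passes `within_limits`, while one with six algorithms or with cost 11.0 is pruned.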
30. Composition Procedure (2)
II. Process building
Starts from the task and goes backwards iteratively
At each iteration, algorithms are added to processes by exploiting matching functionalities
[Diagram labels: ds, task]
Stop conditions:
- no process can be further expanded
- some process constraints are violated
Output: only valid processes:
- satisfying the user goal (task)
- compatible with the given dataset
III. Process ranking
Cost function takes into account: kind of match (exact / approximate), precondition relaxation, algorithm performance, ...
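Backward process building followed by ranking can be sketched in Python on a toy catalogue. This is a simplification under stated assumptions, not the procedure implemented in the prototype: processes are linear chains, every algorithm has one input and one output, matching is plain equality, and the cost is a simple sum. The catalogue entries, their names and their costs are invented for illustration.

```python
from collections import namedtuple

# Hypothetical catalogue; names, inputs, outputs and costs are invented.
Algorithm = namedtuple("Algorithm", "name needs gives cost")

CATALOGUE = [
    Algorithm("Classifier", "CleanDataset", "CLASSIFICATION", 1.0),
    Algorithm("FillMissing", "LabeledDataset", "CleanDataset", 1.0),
    Algorithm("SlowFill", "LabeledDataset", "CleanDataset", 3.0),
    Algorithm("Unrelated", "Text", "Tokens", 1.0),
]


def compose(task, dataset, max_algorithms):
    """Build linear processes backwards from the task to the dataset."""
    # Each open process is a list of algorithms in execution order.
    open_processes = [[a] for a in CATALOGUE if a.gives == task]
    valid = []
    while open_processes:                  # stops when nothing can be expanded
        process = open_processes.pop()
        head = process[0]
        if head.needs == dataset:
            valid.append(process)          # compatible with the given dataset
            continue
        if len(process) >= max_algorithms:
            continue                       # constraint violated: prune
        for candidate in CATALOGUE:
            if candidate.gives == head.needs and candidate not in process:
                open_processes.append([candidate] + process)
    # Ranking: cheapest process first (a stand-in for the cost function).
    return sorted(valid, key=lambda p: sum(a.cost for a in p))
```

Calling `compose("CLASSIFICATION", "LabeledDataset", 5)` on this catalogue yields two processes, with the FillMissing chain ranked ahead of the SlowFill chain because its total cost is lower; with a limit of one algorithm no valid process is found.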
36. KDDComposer
A prototype implementing the composition
procedure
Example scenario:
Task: CLASSIFICATION
Dataset: LabeledDataset
Dataset features:
{float, normalized,
missing_values,...}
Constraints: max 5 algorithms, etc.
Results:
- a ranked list of many valid processes
- compared to a non-ontological approach: more valid processes (inference), fewer invalid processes (ontological and non-ontological pruning)
37. Conclusion
Procedure for composing valid KDD processes
semantic representation of algorithms and data
Advantages
- KDDONTO: resulting processes are valid; supports complex pruning strategies
- Approximate Match: more valid results (novel w.r.t. other works in the literature)
- Ranking: according to both ontological and non-ontological criteria
- Prototype processes: can themselves be considered valid, unknown and useful knowledge, valuable for both novice and expert users
Future work
- translating each prototype process into a concrete workflow of KDD Web Services