By
Dr. Zakaria Suliman Zubi
Assistant Professor,
Computer Science Department,
Faculty of Science,
Al-Tahadi University,
P.O. Box 727,
Sirt, Libya
I-Extended Databases
Abstract
Introduction to Inductive Databases
I-Extended Databases (IE) Motivation
I-Extended Databases (IE) and KDD Processes
Example
Conclusions and Remarks
Questions
Abstract (1)
How can we handle generalizations in a very large database using Association Rules (AR), inclusion dependencies and Functional Dependencies (FD)?
The answer is the inductive database.
An I-Extended database has properties similar to those of inductive databases.
An I-Extended database contains intensionally defined generalizations about the data.
Abstract (2)
It can be used in the process of Data Mining.
It was proposed in the ODBC_KDD(2) model.
Queries use normal database terminology.
The main aim of the I-Extended database is to interact with a special Data Mining query language called Knowledge Discovery Query Language (KDQL), described in [22].
KDQL was introduced and demonstrated as a query language in the ODBC_KDD(2) model in [22].
Introduction to Inductive Databases
The KDD process contains several steps: understanding the domain, preparing the data set, discovering patterns (i.e., computing a theory), post-processing the discovered patterns, and putting the results into use.
To support KDD, we need a query language that not only enables the user to select subsets of the data, but also to specify DM tasks and to select patterns from the corresponding theories.
We consider the KDQL rules operator, described in [21], as a possible query language for mining association rules in an i-extended database.
The result of a query should be an object of the same type as its arguments.
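The closure requirement can be illustrated with a minimal Python sketch. It assumes a hypothetical `IExtendedDB` type holding an instance (r, s), where r is a tuple of 0/1 rows and s is a set of itemset patterns; `select_rows`, `select_patterns` and `compose` are illustrative names, not part of KDQL or ODBC_KDD(2). Because every query returns an `IExtendedDB`, queries on data and queries on patterns can be chained freely.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Row = Tuple[int, ...]       # one 0/1 row of the data part r
Pattern = FrozenSet[int]    # hypothetical pattern: a set of column indices

@dataclass(frozen=True)
class IExtendedDB:
    """Hypothetical i-extended database instance (r, s)."""
    r: Tuple[Row, ...]
    s: FrozenSet[Pattern]

Query = Callable[[IExtendedDB], IExtendedDB]   # closure: input and output have the same type

def select_rows(pred: Callable[[Row], bool]) -> Query:
    """Select on the data part r; the pattern part s is passed through unchanged."""
    return lambda db: IExtendedDB(tuple(row for row in db.r if pred(row)), db.s)

def select_patterns(pred: Callable[[Pattern], bool]) -> Query:
    """Select on the pattern part s; the data part r is passed through unchanged."""
    return lambda db: IExtendedDB(db.r, frozenset(p for p in db.s if pred(p)))

def compose(*queries: Query) -> Query:
    """Chain queries left to right; this works only because each returns an IExtendedDB."""
    def run(db: IExtendedDB) -> IExtendedDB:
        for q in queries:
            db = q(db)
        return db
    return run

db = IExtendedDB(
    r=((1, 1, 0), (1, 0, 1), (0, 1, 1)),
    s=frozenset({frozenset({0}), frozenset({0, 1}), frozenset({1, 2})}),
)
query = compose(select_rows(lambda row: row[0] == 1), select_patterns(lambda p: 0 in p))
result = query(db)
```

Here `result` keeps the two rows with a 1 in the first column and the two patterns that mention column 0, and it can itself be passed to further queries.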
I-Extended Databases Motivation
The model was introduced at the Institute of Mathematics and Informatics at Debrecen University, Debrecen, Hungary, in 2002.
[Figure: Gateway]
I-Extended Databases Motivation (continued)
An I-Extended database schema is a pair ℛ = (R, (P_R, e, V)), where:
– R is a database schema.
– P_R is a collection of patterns.
– V is a set of result values.
– e is the evaluation function that defines pattern semantics. This function maps each pair (r, θi) to an element of V, where r is a database over R and θi ∈ P_R is a pattern.
An instance of the schema ℛ, the i-extended database (r, s), consists of a database r over the schema R and a subset s ⊆ P_R.
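The schema (R, (P_R, e, V)) and an instance (r, s) can be sketched in Python under simplifying assumptions: patterns are taken to be nonempty attribute sets, e is assumed to be the support of a pattern, and V is the interval [0, 1] represented with `Fraction`. `IExtendedSchema` and `support` are hypothetical names chosen for illustration. P_R is generated on demand rather than stored, which mirrors the idea of an intensionally defined pattern collection.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterator, List, Tuple

Row = Dict[str, int]          # a tuple of database r over schema R
Pattern = FrozenSet[str]      # assumed pattern class P_R: nonempty attribute sets

@dataclass(frozen=True)
class IExtendedSchema:
    """Hypothetical model of the schema (R, (P_R, e, V)) with V = [0, 1] as Fractions."""
    R: Tuple[str, ...]
    e: Callable[[List[Row], Pattern], Fraction]

    def patterns(self) -> Iterator[Pattern]:
        """P_R defined intensionally: patterns are generated on demand, not stored."""
        for k in range(1, len(self.R) + 1):
            for combo in combinations(self.R, k):
                yield frozenset(combo)

def support(r: List[Row], theta: Pattern) -> Fraction:
    """Assumed evaluation function e: fraction of rows with a 1 on every attribute of theta."""
    if not r:
        return Fraction(0)
    return Fraction(sum(1 for row in r if all(row[a] == 1 for a in theta)), len(r))

schema = IExtendedSchema(R=("A", "B", "C"), e=support)
r = [{"A": 1, "B": 1, "C": 0}, {"A": 1, "B": 0, "C": 1}, {"A": 1, "B": 1, "C": 1}]
# An instance (r, s) of the schema: s is a subset of P_R, here the patterns with e >= 2/3.
s = {theta for theta in schema.patterns() if schema.e(r, theta) >= Fraction(2, 3)}
instance = (r, s)
```

With this toy relation, s contains {A}, {B}, {C}, {A, B} and {A, C}; the patterns {B, C} and {A, B, C} have support 1/3 and are left out.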
I-Extended Databases Motivation (continued)
Example:
If the patterns are Boolean formulae about the database, V is {true, false}, and the evaluation function e(r, θ) has the value true iff the formula θ is true about r.
In practice, a user might be interested in selecting, from the intensionally defined collection of all Boolean formulas, the formulas which are true or the formulas which are false.
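The Boolean-formula case can be sketched in Python, assuming (for illustration only) that each formula is represented as a named predicate over the whole relation r. The evaluation function `e` then has V = {True, False}, and selecting true or false formulas is an ordinary filter on its value. The three formulas below are hypothetical examples, not taken from the original model.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Relation = List[Row]
# Hypothetical representation: a Boolean formula about r is a named predicate on r.
Formula = Tuple[str, Callable[[Relation], bool]]

def e(r: Relation, theta: Formula) -> bool:
    """Evaluation function with V = {True, False}: True iff theta holds on r."""
    _, predicate = theta
    return predicate(r)

formulas: List[Formula] = [
    ("forall row: A = 1", lambda r: all(row["A"] == 1 for row in r)),
    ("exists row: B = 1 and C = 1", lambda r: any(row["B"] == 1 and row["C"] == 1 for row in r)),
    ("forall row: B = 1 implies C = 1", lambda r: all(row["B"] == 0 or row["C"] == 1 for row in r)),
]

r: Relation = [{"A": 1, "B": 1, "C": 0}, {"A": 1, "B": 0, "C": 1}]

# Select the formulas that are true about r, and separately those that are false.
true_formulas = [name for name, p in formulas if e(r, (name, p))]
false_formulas = [name for name, p in formulas if not e(r, (name, p))]
```

On this relation only the first formula holds; the first row has B = 1 and C = 0, which falsifies the other two.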
I-Extended Databases Motivation (continued)
I-Extended Database: a database that, in addition to data, also contains intensionally defined generalizations about the data. We first illustrate the approach with association rules, then generalize it and point out key issues for query evaluation in general.
An I-Extended database has properties similar to those of an inductive database; thanks to the closure property of the framework, it can be used throughout the whole DM process.
I-Extended Databases Motivation (continued)
The aims of the I-Extended Database are as follows:
– An i-extended database consists of a normal database associated with a subset of patterns from a class of patterns, and an evaluation function that tells how the patterns occur in the data.
– An i-extended database can (in principle) be queried using just normal relational algebra or SQL, with the added ability to refer to the values of the evaluation function on the patterns.
– Modeling KDD processes as a sequence of queries on i-extended databases creates opportunities for reasoning about and optimizing these processes.
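The idea of querying with plain SQL while referring to evaluation-function values can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden illustration: the table names `r` and `rules`, the restriction to single-attribute antecedents, and the materialization of frequency and confidence as columns are choices made for the sketch, not the ODBC_KDD(2) design.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (A INTEGER, B INTEGER, C INTEGER)")
conn.executemany("INSERT INTO r VALUES (?, ?, ?)",
                 [(1, 1, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1)])

# Hypothetical pattern table: rules X => B with one attribute in X, plus their evaluation values.
conn.execute("CREATE TABLE rules (lhs TEXT, rhs TEXT, frequency REAL, confidence REAL)")
for x, b in permutations(("A", "B", "C"), 2):
    # Column names come from the fixed tuple above, so formatting them into SQL is safe here.
    both, lhs_count, total = conn.execute(
        f"SELECT SUM({x} = 1 AND {b} = 1), SUM({x} = 1), COUNT(*) FROM r"
    ).fetchone()
    conn.execute("INSERT INTO rules VALUES (?, ?, ?, ?)",
                 (x, b, both / total, both / lhs_count if lhs_count else 0.0))

# Plain SQL that refers to the evaluation-function values of the patterns.
selected = conn.execute(
    "SELECT lhs, rhs FROM rules WHERE frequency >= 0.5 AND confidence >= 0.7 ORDER BY lhs, rhs"
).fetchall()
```

Only A ⇒ B and B ⇒ A pass both conditions on this toy data (frequency 0.75, confidence 1.0); the other four rules have confidence 2/3.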
I-Extended Databases (IE) and KDD Processes
KDD consists of several steps; one of these steps is Data Mining.
In the Data Mining step, we are concerned with a specific class of patterns, within real-life mining processes that follow a dynamic knowledge acquisition scenario.
These interesting patterns are presented in I-Extended Databases based on their frequency, confidence and support values.
Knowledge gathered often affects the search process, giving rise to new goals in addition to the original ones.
I-Extended Databases (IE) and KDD Processes (continued)
KDD processes can be described by sequences of operations, i.e., queries over the relevant i-extended databases.
Sequences of queries are abstract and concise descriptions of DM processes.
These descriptions can even be annotated with statistical information, such as the size of the selected dataset and the size of intermediate collections of patterns.
Such annotations provide knowledge for further use of these sequences.
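A KDD process written as an annotated sequence of queries can be sketched in Python as follows. The (r, s) state, the three steps, and the helper names `support` and `run_process` are hypothetical; the point is only that each step maps an (r, s) pair to a new (r, s) pair, and that the runner records the size of the selected dataset and of the intermediate pattern collection after every step.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Row = Tuple[int, ...]                        # 0/1 row; columns 0, 1, 2 stand for A, B, C
Pattern = FrozenSet[int]
State = Tuple[List[Row], List[Pattern]]      # hypothetical (r, s) pair
Step = Tuple[str, Callable[[State], State]]

def support(rows: List[Row], p: Pattern) -> float:
    """Fraction of rows with a 1 in every column of p."""
    return sum(all(row[i] == 1 for i in p) for row in rows) / len(rows) if rows else 0.0

def run_process(state: State, steps: List[Step]) -> Tuple[State, List[Tuple[str, int, int]]]:
    """Run a KDD process as a sequence of queries, annotating each step with sizes."""
    log: List[Tuple[str, int, int]] = []
    for name, query in steps:
        state = query(state)
        rows, patterns = state
        log.append((name, len(rows), len(patterns)))
    return state, log

data = [(1, 1, 0), (1, 1, 1), (0, 1, 1), (1, 0, 1)]
steps: List[Step] = [
    ("select rows where B = 1",
     lambda st: ([row for row in st[0] if row[1] == 1], st[1])),
    ("generate all 1- and 2-itemsets",
     lambda st: (st[0], [frozenset(c) for k in (1, 2) for c in combinations(range(3), k)])),
    ("keep itemsets with support >= 2/3",
     lambda st: (st[0], [p for p in st[1] if support(st[0], p) >= 2 / 3])),
]
final_state, log = run_process((data, []), steps)
```

The resulting `log` is a concise, annotated description of the process: 3 selected rows, then 6 candidate itemsets, then 5 retained itemsets.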
Example: Patterns in Three Instances of I-Extended Databases
Consider a schema R = {A1, …, An} of attributes with domain {0, 1}.
Given a relation r over R, an association rule about r is an expression of the form X ⇒ B, where X ⊆ R and B ∈ R \ X.
The intuitive meaning of the rule is that if a row of the matrix r has a 1 in each column of X, then the row tends to have a 1 also in column B.
This semantics is captured by frequency and confidence values. Given W ⊆ R, support(W, r) denotes the fraction of rows of r that have a 1 in each column of W.
The frequency of X ⇒ B in r is defined to be support(X ∪ {B}, r), while its confidence is support(X ∪ {B}, r) / support(X, r). Typically, we are interested in association rules for which the frequency and the confidence are greater than given thresholds.
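The definitions of support, frequency and confidence translate directly into a brute-force Python sketch. Columns 0..n-1 stand for A1..An, exact `Fraction` values are used so comparisons are not affected by rounding, and the function names and threshold values are illustrative assumptions. The enumeration is exponential in n, so it only shows the semantics and is not a practical mining algorithm.

```python
from fractions import Fraction
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]   # rows of a 0/1 relation r; column i stands for attribute A(i+1)
Rule = Tuple[Tuple[int, ...], int, Fraction, Fraction]  # (X, B, frequency, confidence)

def support(W: Iterable[int], r: Matrix) -> Fraction:
    """support(W, r): fraction of rows of r that have a 1 in every column of W."""
    cols = tuple(W)
    return Fraction(sum(1 for row in r if all(row[a] == 1 for a in cols)), len(r))

def association_rules(r: Matrix, n: int, min_freq: Fraction, min_conf: Fraction) -> List[Rule]:
    """All rules X => B (X a subset of R, B in R but not in X) whose frequency and
    confidence are greater than the given thresholds; brute force, small n only."""
    rules: List[Rule] = []
    for k in range(n):                       # |X| = 0 .. n-1 leaves room for B
        for X in combinations(range(n), k):
            sx = support(X, r)
            if sx == 0:
                continue                      # confidence undefined when support(X, r) = 0
            for B in range(n):
                if B in X:
                    continue
                freq = support(X + (B,), r)   # support(X ∪ {B}, r)
                conf = freq / sx              # support(X ∪ {B}, r) / support(X, r)
                if freq > min_freq and conf > min_conf:
                    rules.append((X, B, freq, conf))
    return rules

r = [(1, 1, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1)]
rules = association_rules(r, 3, Fraction(1, 4), Fraction(3, 4))
```

With thresholds 1/4 and 3/4, four rules survive: A1 ⇒ A2 and A2 ⇒ A1 (frequency 3/4), and {A1, A3} ⇒ A2 and {A2, A3} ⇒ A1 (frequency 1/2), all with confidence 1.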
Conclusions and Remarks
I-Extended Databases enable the definition of a mining process as a sequence of queries, by using the closure property.
I-Extended Databases are a necessary step towards a general-purpose query language for KDD applications.
I-Extended Databases support pattern generation, pattern filtering and pattern combining operations.
I-Extended Databases can use standard database terminology to retrieve significant patterns without introducing any additional concepts.
Important References
[20] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39:58-64, 1996.
[21] Zakaria S. Zubi. Knowledge Discovery in Remote Access Database. PhD dissertation, Ch. 9, Debrecen University, Hungary, 2002.
[22] Zakaria S. Zubi and Gábor Fazekas. On ODBC_KDD models. 5th International Conference on Applied Informatics, Eger, Hungary, 28 January - 3 February 2001.