M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 5, September-October 2012, pp.1342-1351
Attribute Ontological Relational Weights (AORW) For Coherent
Association Rules: A Post Mining Process For Association Rules
M.Saritha 1, T.Venkata Ramana 2, Kareemunnisa 3, Y.Sowjanya 4, M.Nishantha 5
1 (Asst. Professor, Department of Computer Science, GuruNanak Institutions of Technical Campus, India)
2 (Professor, Head of the Department of Computer Science, SLC's IET, India)
3, 5 (Asst. Professors, Department of Information Technology, GuruNanak Institutions of Technical Campus, India)
4 (M.Tech student, Department of Computer Science and Engg, SLC's IET, India)
ABSTRACT
Association rules present one of the most impressive techniques for the analysis of attribute associations in a given dataset, with applications in retail, bioinformatics, and sociology. In the area of data mining, the importance of rule management in association rule mining is growing rapidly. Usually, if datasets are large, the induced rule sets are large in volume, and the density of the rule volume makes the obtained knowledge hard to understand and analyse. One good way of minimising the rule set size is eliminating redundant rules from the rule base. Many efforts have been made and various competent algorithms have been proposed, but all of these models rely either on closed itemset mining or on an expert's evaluation, and none of them is proven best in all dataset contexts. The closed itemset model lacks adaptability, and the expert evaluation process yields different significance for the same rule under different experts' views. To overcome these limits, we propose a post-mining process called AORW as an extension to closed itemset mining algorithms.

Keywords: post mining, association rule mining, closed itemset, PEPP, inference analysis, rule pruning.

1. INTRODUCTION
In general, association rules deliver an efficient method of analysing binary or discretised data sets that are large in volume. One common practice is to determine relationships between binary variables in transaction databases, which is known as 'market basket analysis'. In the case of non-binary data, the data is first coded as binary and association rules are then used for the analysis. Association rules have made their impact on analysing large binary datasets and are considered a versatile approach for modern applications, from the detection of bio-terrorist attacks [1] and the analysis of gene expression data [2] to the analysis of Irish third-level education applications [4]. The steps involved in a typical association rule analysis are "coding of data as binary if data is not binary" -> "rule generation" -> "post-mining". This survey focuses on post-mining. Although associations were first discussed over a century ago (as early as 1902), the absence of items from transactions is still often ignored in analyses.

A. Association rule mining
Given a non-empty set I, an association rule is a statement of the form X ⇒ Y, where X, Y ⊂ I such that X ≠ ∅, Y ≠ ∅, and X ∩ Y = ∅. The set X is called the antecedent of the rule, the set Y is called the consequent of the rule, and we shall call I the master itemset. Association rules are generated over a large set of transactions, denoted by T. An association rule can be deemed interesting if the items involved occur together often and there are suggestions that one of the sets might in some cases lead to the presence of the other set. Association rules are characterised as interesting, or not, based on mathematical notions called 'support', 'confidence' and 'lift'. Although there are now a multitude of interestingness measures available to the analyst, many of them are still based on these three functions.

In many applications, it is not only the presence of items in the antecedent and the consequent parts of an association rule that may be of interest. Consideration, in many cases, should be given to the relationship between the absence of items from the antecedent part and the presence or absence of items from the consequent part. Further, the presence of items in the antecedent part can be related to the absence of items from the consequent part; for example, a rule such as {margarine} ⇒ {not butter} might be referred to as a 'replacement rule'. One way to incorporate the absence of items into the association rule mining paradigm is to consider rules of the form X ⇒/ Y [20]. Another is to think in terms of negations. Suppose X ⊂ I; then write ¬X to denote the absence or negation of the item, or items, in X from a transaction. Considering X as a binary {0, 1} variable, the presence of the items in X is equivalent
to X = 1, while the negation ¬X is equivalent to X = 0. The concept of considering association rules involving negations, or "negative implications", is due to Silverstein [20].

B. Post mining
Pruning rules and detection of rule interestingness are employed in the post-mining stage of the association rule mining paradigm. However, there are a host of other techniques used in the post-mining stage that do not naturally fall under either of these headings; some such techniques are in the area of redundancy removal. There are often a huge number of association rules to contend with in the post-mining stage, and it can be very difficult for the user to identify which ones are interesting. Therefore, it is important to remove insignificant rules, prune redundancy and do further post-mining on the discovered rules [18, 11]. Liu [11] proposed a technique for pruning and summarising discovered association rules by first removing insignificant associations and then forming direction-setting rules, which provide a summary of the discovered association rules. Lent [19] proposed clustering the discovered association rules.

C. Influences of input formats
The input formats that influence post-mining methodologies are binary data, text data and streaming data. In association rule mining, much recent research has focused on the difficult problem of mining data streams, as in click-stream analysis, intrusion detection, and web-purchase recommendation systems. In the case of streaming data, it is not possible to perform mining on cached and fixed data records: caching the data leads to memory usage issues, while mining static data leads to poor time complexity, since continuous dataset updates force continuous passes through the dataset. From the data mining point of view, texts are complex data giving rise to interesting challenges. First, texts may be considered weakly structured, compared with databases that rely on a predefined schema. Moreover, texts are written in natural language, carrying implicit knowledge and ambiguities; hence, the representation of the content of a text is often only partial and possibly noisy. One solution for handling a text, or a collection of texts, in a satisfying way is to take advantage of a knowledge model of the domain of the texts, for guiding the extraction of knowledge units from the texts. One of the obvious hot topics of data mining research in the last few years has been rule discovery from binary data. It concerns the discovery of sets of attributes from large binary records such that these attributes are true within the same line often enough. It is then easy to derive rules that describe the data, the popular association rules, though the interest of frequent sets goes further.

Based on the proposals [14, 12, 10, 3] recently cited in the literature and their motivations, it is observable that a rule pruning process opts for one of two models:
(1) Rule pruning under a post-mining process that demands domain expert observation.
(2) Rule pruning under a post-mining process that aims to avoid the domain expert's role in the pruning process.
In the first case, rule pruning accuracy depends on the domain expert's awareness of attribute relations, so it is always advisable to prune the rules under a reliable domain expert's observation. In the second case, the models prune the rules based on dynamically determined attribute relations; this limits the solution to specific data models and hence is not adaptable to all contexts.

The rest of the paper is organized as follows: in Section II we discuss the most frequently cited post-mining models for improving rule accuracy. Section III briefs the post-mining process that we opted for. Section IV briefs the approach of closed itemset mining and the inference approach for itemset pruning. Section V explores the process of attribute relational weights analysis for rule pruning. Results discussion and a comparative study are in Section VI, followed by the conclusion and references.

2. RELATED WORK
Huawen Liu [14] proposed a post-processing approach for rule reduction using closed sets, filtering superfluous rules from the knowledge base in a post-processing manner; such rules can be well discovered by a closed set mining technique. Most of these methods are based on rule structure analysis, where the relations between the rules are analysed via a corresponding problem. This procedure claims to eliminate noise and redundant rules in order to provide users with compact and precise knowledge derived from databases by the data mining method. Further, a fast rule reduction algorithm using closed sets is introduced. Other endeavours have attempted to prune the rule bases directly; the typical cases have been elaborated and illustrated widely.

A modest number of proposals address pre-pruning and post-pruning. In short, with pre-pruning the pruning operation occurs during the generation of significant rules, while post-pruning emphasises that the pruning operation occurs after rule generation, among which the rule cover is a representative case. To extract interesting rules, a priori knowledge has also been taken into account in the literature, where a template denotes which attributes should occur in the antecedent or the consequent part of the rule.
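The three classical measures named in Section 1.A can be sketched in a few lines; the following is an illustrative toy example (item names and transactions are invented, not taken from the paper):

```python
# Support, confidence and lift for a rule X => Y over a toy
# transaction set T (each transaction is a set of items).

def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """support(X u Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    """confidence(X => Y) / support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

T = [
    {"bread", "butter"},
    {"bread", "margarine"},
    {"bread", "butter", "milk"},
    {"margarine", "milk"},
]
X, Y = {"bread"}, {"butter"}
print(support(X | Y, T))    # 2 of 4 transactions -> 0.5
print(confidence(X, Y, T))  # 0.5 / 0.75 = 2/3
print(lift(X, Y, T))        # (2/3) / 0.5 = 4/3
```

A lift above 1, as here, suggests the antecedent and consequent co-occur more often than independence would predict, which is the intuition behind the "interesting" rules discussed above.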
An association rule is an implication expression that illustrates a kind of association relation. A rule is said to be interesting, or valid, if its support and confidence meet user-specified minimum support and confidence thresholds respectively. Association rule mining primarily comprises two phases, based on the identification of frequent itemsets from the given data mining contexts. However, the problem of massive real-world transaction data can be rounded off by adopting other alternatives, which in turn benefit from a lossless representation of the data. Theoretically, a transaction database and a relational database are two different, inter-transformable representations of data.

The product of association mining is a rule base with hundreds to millions of rules. In order to highlight the important and key ones, certain other rules are proposed, such as second-order rules, which state that if the cover of the itemset is known, then the corresponding relation can be easily derived. All the technical definitions given henceforth deal with the transaction of the data through itemsets in association mining. The equivalence property states that rules and classes of the same hierarchical database support the power in the content. Traditional data mining techniques are implemented in order to justify the property and its corresponding definition in the specified context. The second-order rules thus identified can be used to filter useless rules out of the priority rule set.

The effectiveness and efficiency of the classical methods in pruning the rules were thoroughly verified on a 2.8 GHz Pentium PC. Two groups of experiments were conducted: one to prune insignificant association rules and one to remove useless association classification rules. Removal of non-predictive rules by virtue of the information gain metric is much like CHARM and CBA, which work along the same track. To generate association and classification rules by the pruning method of the Apriori software, some external tools are essentially required. The effectiveness of the pruning algorithm can be inversely related to the number of rules; this, along with the computational time consumed, determines an efficient criterion for pruning.

Efficient post-processing methods are hence proposed to remove pointless rules from rule bases by eliminating redundancy among rules. The exploitation of the dependent relation makes this method self-manageable knowledge. The pruning procedure is sliced into three stages, from derivation to the pruning operation on the rule set by the use of closed rule sets. It is cost-effective and consumes very little time per transaction; henceforth, it can be applied to exploit sampling techniques and data structures, thereby increasing the efficiency.

Huawen Liu [14] presented a technique for post-processing rule reduction using closed sets, targeting the filtering of otiose rules in the post-processing of rule mining. The empirical study proved that the discovery of dependent relations from closed sets helps to eliminate redundant rules.

Hetal Thakkar [12]: In the case of stream data, the post-mining of associations is more challenging, and continuous post-mining of association rules is an unavoidable requirement, which is discussed by this author. He presented a technique for continuous post-mining of association rules in a data stream management system. He described the architecture and techniques used to achieve this advanced functionality in the Stream Mill Miner (SMM) prototype, an SQL-based DSMS designed to support continuous mining queries, which is impressive.

Hacene Cherfi [10] discussed a post-association-rule-mining approach for text mining that combines data mining and semantic techniques for the post-mining and selection of association rules. To focus on the result analysis and to find new knowledge units, a classification of association rules according to qualitative criteria, using a domain model as background knowledge, has been introduced. The authors carried out an empirical study on a molecular biology dataset that proved the benefits of taking into account a knowledge domain model of the data.

Ronaldo Cristiano Prati [3]: The Receiver Operating Characteristics (ROC) graph is a popular way of assessing the performance of classification rules, but ROC graphs are inappropriate for evaluating the quality of association rules, as there is no class in association rule mining and the consequent parts of different association rules might not have any correlation at all. Chapter VIII presents a novel technique, QROC, a variation of ROC space to analyze itemset costs/benefits in association rules. It can be used to help analysts evaluate the relative interestingness among different association rules in different cost scenarios.

3. ATTRIBUTE ONTOLOGICAL RELATIONAL WEIGHTS FOR COHERENT ASSOCIATION RULES
The approach Attribute Ontological Relational Weights, referred to in short as AORW, is a post-mining process to prune rules based on attribute relational relevancy. The AORW framework can be classified into:
Closed itemset mining
Describing the item class descriptor
The input for the AORW framework is:
1. A set of rules
2. An XML descriptor describing attributes, classes, class properties and class relations.
In this proposal we considered our earlier work to find closed itemsets.
The process steps involved in the AORW framework are:
1. Initially, AORW measures the property support degree for each attribute involved in a given rule.
2. Using the property support degrees of the attributes, the attribute relation support of each attribute pair of the given rule is measured.
3. With the help of the attribute relation supports of all attribute pairs of an itemset that belongs to the given rule, the attribute relation support degree of that itemset is measured.
4. Using the attribute relation support degrees of the left-hand-side and right-hand-side itemsets of the given rule, the relation confidence of the rule is determined.
5. The rules are pruned based on their attribute relation support degree.
A detailed explanation of each step can be found in Section V.

4. CLOSED ITEMSET MINING [34]
A. Dataset adoption and formulation
Item set I: the set of diverse elements from which the sequences are generated:

I = ∪_{k=1..n} i_k, where I is the set of diverse elements.

Sequence set S: a set of sequences, where each sequence contains elements, each element e belongs to I and is true for a function p(e). A sequence can be formulated as

s = ⟨e_1, e_2, ..., e_m⟩ | (p(e_i), e_i ∈ I),

which represents a sequence s of items belonging to the set of distinct items I; m is the number of ordered items, and p(e_i) is a transaction for which the usage of e_i is true. The sequence set is

S = {s_1, s_2, ..., s_t},

where S represents the set of sequences, t is the total number of sequences (its value is volatile), and s_j is a sequence belonging to S.

Subsequence: a sequence s_p of the sequence set S is considered a subsequence of another sequence s_q of S if all items in s_p belong to s_q as an ordered list. This can be formulated as: if s_p ⊆ s_q and s_p ≠ s_q, with n ≤ m and s_p, s_q ∈ S, then ⟨s_p1, ..., s_pn⟩ ⊑ ⟨s_q1, ..., s_qm⟩.

Total support ts: the occurrence count of a sequence, as an ordered list, in all sequences of the sequence set S is adopted as the total support ts of that sequence. The total support of a sequence can be determined by the following formulation:

f_ts(s_t) = |{ s_p : s_t ⊑ s_p, for each p = 1..|DBS| }|,

where DBS is the set of sequences and f_ts(s_t) represents the total support ts of sequence s_t, i.e., the number of super-sequences of s_t.

Qualified support qs: the coefficient resulting from dividing the total support by the size of the sequence database is adopted as the qualified support qs. Qualified support can be found using the following formulation:

f_qs(s_t) = f_ts(s_t) / |DBS|

Sub-sequence and super-sequence: a sequence is a sub-sequence of the sequence projected next from it if both sequences have the same total support; conversely, a sequence is a super-sequence of the sequence from which it was projected if both have the same total support. Sub-sequence and super-sequence can be formulated as:

f_ts(s_t) ≥ rs, where rs is the required support threshold given by the user, and s_t ⊑ s_p for any p with f_ts(s_t) = f_ts(s_p).

B. Closed Itemset Discovery [34]
As the first stage of the proposal, we perform dataset pre-processing and itemset database initialization. We find itemsets with a single element and, in parallel, prune single-element itemsets whose total support is less than the required support.

Forward edge projection:
In this phase, we select all itemsets from the given itemset database as input, in parallel. Then we start projecting edges from each selected itemset to all possible elements. The first iteration includes the pruning process in parallel; from the second iteration onwards this pruning is not required, which we claim makes the process efficient compared to similar techniques such as BIDE. In the first iteration, we project an itemset s_p spawned from a selected itemset s_i from DBS and an element e_i taken from I. If f_ts(s_p) is greater than or equal to rs, then an edge is defined between s_i and e_i. If f_ts(s_i) = f_ts(s_p), then we prune s_i from DBS. This pruning process is required in, and limited to, the first iteration only.

From the second iteration onwards, we project the itemset s_p spawned from s_p' to each element e_i of I. An edge can be defined between s_p' and e_i if f_ts(s_p) is greater than or equal to rs. In this description, s_p' is the itemset projected in the previous iteration and
eligible as a sequence. Then the following validation is applied to find closed sequences.

Edge pruning:
If f_ts(s_p') = f_ts(s_p) for any edge, that edge is pruned, and all disjoint graphs except s_p are considered closed sequences; each is moved into DBS and removed from memory.

The above process continues as long as elements remain in memory that are connected through direct or transitive edges and projecting itemsets, i.e., until the graph becomes empty.

C. Inference Analysis [35]
Inferences:
Pattern positive score: the sum of the number of transactions in which all items of the pattern exist and the number of transactions in which no item of the pattern exists.
Pattern negative score: the number of transactions in which only some items of the pattern exist.
Pattern actual coverage: the pattern positive score minus the pattern negative score.
Interest gain: the actual coverage of the pattern involved in an association rule.
Coherent rule: the actual coverage of the rule's left-side pattern must be greater than or equal to the actual coverage of the right-side pattern.
Inference support ias: refers to the actual coverage of the pattern; f_ia(s_t) represents the inference support of the sequence s_t.
Approach: for each pattern s_p of the pattern dataset, if f_ia(s_t) < ias, then we prune that pattern.

5. ATTRIBUTE ONTOLOGICAL RELATIONAL WEIGHTS FRAMEWORK
The proposed post-mining process, Attribute Ontological Relational Weights, is described in detail here. Table 1 presents the notations used in the AORW framework.
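The support and inference measures defined in Section 4 can be sketched as follows; this is an illustrative toy example (the sequence database, transactions and values are invented, not the paper's implementation):

```python
# Total support f_ts, qualified support f_qs, and pattern actual
# coverage (positive score minus negative score) over toy data.

def is_subsequence(sub, seq):
    """True if `sub` occurs in `seq` as an ordered (not necessarily
    contiguous) list of items."""
    it = iter(seq)
    return all(item in it for item in sub)

def total_support(st, dbs):
    """f_ts(st): number of sequences in DBS that are super-sequences of st."""
    return sum(1 for sp in dbs if is_subsequence(st, sp))

def qualified_support(st, dbs):
    """f_qs(st) = f_ts(st) / |DBS|."""
    return total_support(st, dbs) / len(dbs)

def actual_coverage(pattern, transactions):
    """Positive score (all items present, or none present) minus
    negative score (only some items present)."""
    pattern = set(pattern)
    pos = neg = 0
    for t in transactions:
        hit = len(pattern & t)
        if hit == len(pattern) or hit == 0:
            pos += 1
        else:
            neg += 1
    return pos - neg

DBS = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
print(total_support(["a", "c"], DBS))      # contained in 3 of 4 sequences
print(qualified_support(["a", "c"], DBS))  # 3/4 = 0.75

T = [{"a", "b"}, {"c"}, {"a"}, {"a", "b", "c"}]
print(actual_coverage({"a", "b"}, T))      # positives 3, negatives 1 -> 2
```

Under the inference-analysis approach above, a pattern whose actual coverage falls below the chosen ias threshold would be pruned.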
Table 1: Notations used in Attribute Ontological Relational Weights
1. r_lhs: left-side itemset of the rule r
2. r_rhs: right-side itemset of the rule r
3. RS: rule set
4. cpc_c: class properties count of class c
5. apc_a: attribute property count of attribute a
6. psd: property support degree
7. tp_c: total properties of class c
8. ps_a: property support of attribute a of class c; ps_a = apc_a / cpc_c, where a ∈ c
9. psd_a = ps_a / tp_c
10. rs_c: relation support of class c; its maximum threshold value is 1
11. ARS: attribute relation support
12. ARSD_i: attribute relation support degree of itemset i
13. If attributes a_i and a_j belong to the same class c, then ARS(a_i, a_j) = 1, where {a_i, a_j} ∈ c
14. If attribute a_i belongs to class c_i, attribute a_j belongs to class c_j, and the relation between c_i and c_j is true, then ARS(a_i, a_j) = (psd_ai × psd_aj) / (rs_ci × rs_cj)
15. If the relation between c_i and c_j is false, then ARS(a_i, a_j) = 0
16. ARS(a_i, a_j) = ARS(a_j, a_i); applicable in all cases: both attributes belong to the same class, both belong to different classes that have no relation, or both belong to different classes that have a relation
17. pc: number of attribute pairs in an itemset
18. pc: the total number of pairs in a given itemset, where n is the total number of attributes in the itemset; pc = 0 for n ≤ 1, otherwise pc = Σ_{i=1..n-1} i, or equivalently pc = n(n-1)/2
19. pc_rlhs: pair count of the itemset that is the lhs of rule r
20. pc_rrhs: pair count of the itemset that is the rhs of rule r
21. pc_{rlhs∪rrhs}: pair count of the itemset generated from r_lhs ∪ r_rhs
22. PS_i = {p_1, p_2, ..., p_m}: pair set generated from itemset i
23. ARS(p_i): attribute relation support of pair p_i
24. ARSD_i = (Σ_{k=1..|PS_i|} ARS(p_k)) / |PS_i|: attribute relation support degree of itemset i, where p_k ∈ PS_i is the kth pair of the pair set of itemset i
25. RC_r: relation confidence of rule r
26. ARSD_rlhs = (Σ_{k=1..|PS_rlhs|} ARS(p_k)) / |PS_rlhs|: attribute relation support degree of the itemset that is the lhs of rule r, where p_k is the kth pair of the pair set of the lhs of rule r
27. ARSD_rrhs = (Σ_{k=1..|PS_rrhs|} ARS(p_k)) / |PS_rrhs|: attribute relation support degree of the itemset that is the rhs of rule r, where p_k is the kth pair of the pair set of the rhs of rule r
28. ARSD_{rlhs∪rrhs} = (Σ_{k=1..|PS_{rlhs∪rrhs}|} ARS(p_k)) / |PS_{rlhs∪rrhs}|: attribute relation support degree of the itemset generated from lhs ∪ rhs of rule r, where p_k is the kth pair of its pair set
29. rc_r = ARSD_{rlhs∪rrhs} / ARSD_rlhs: the relation confidence of rule r is the coefficient obtained when the attribute relation support degree of all attributes involved in rule r is divided by the attribute relation support degree of rule r's lhs
Property support: the number of attribute properties matched against the number of class properties of the class to which the attribute belongs [Table 1, row 8].
Property support degree: indicates the ratio of matched properties to class-level properties [Table 1, row 9]. Ex: psd_a = ps_a / cpc_c; here a is an attribute of class c (a ∈ c).
Attribute relation support: indicates the strength of the relation between two attributes of an itemset that are considered as a pair [Table 1, rows 11, 13].
Pair count: the total number of two-attribute sets; these attribute sets must be unique [Table 1, rows 17, 18].
Attribute relation support degree: an itemset-level measurement representing the average relation strength of the attributes belonging to an itemset [Table 1, row 24].
Relation confidence: a rule-level measurement that concludes the relation strength between the left-hand-side itemset and the right-hand-side itemset of a given rule [Table 1, row 29].

AORW algorithm:
Input: rule set RS, class descriptor CD and relation confidence threshold rct
Output: significant rule set RS', which is a subset of RS (RS' ⊆ RS)
Begin:
While RS is not empty
Begin:
Read a rule r from RS
Find the property support ps_a and property support degree psd_a of each attribute a of r_lhs [Table 1, rows 8, 9]
Find the property support ps_a and property support
degree psd_a of each attribute a of r_rhs [Table 1, rows 8, 9]
Find the unique two-attribute pair set PS_rlhs from r_lhs [Table 1, row 22]
Find the attribute relation support ARS_p for each pair p, where p ∈ PS_rlhs [Table 1, rows 13, 14, 15 and 16]
Find the attribute relation support degree ARSD_rlhs of r_lhs [Table 1, rows 24, 26]
Find the unique two-attribute pair set PS_rrhs from r_rhs [Table 1, row 22]
Find the attribute relation support ARS_p for each pair p, where p ∈ PS_rrhs [Table 1, rows 13, 14, 15 and 16]
Find the attribute relation support degree ARSD_rrhs of r_rhs [Table 1, rows 24, 27]
Find the unique two-attribute pair set PS_{rlhs∪rrhs} from r_lhs ∪ r_rhs [Table 1, row 22]
Find the attribute relation support ARS_p for each pair p, where p ∈ PS_{rlhs∪rrhs} [Table 1, rows 13, 14, 15 and 16]
Find the attribute relation support degree ARSD_{rlhs∪rrhs} of r_lhs ∪ r_rhs [Table 1, rows 24, 28]
Find the relation confidence rc_r of rule r [Table 1, row 29]
If rc_r ≥ rct then add rule r to the resultant rule set RS'
End
End
Fig 2: Attribute Ontological Relational Weights algorithm

This section focuses mainly on providing evidence for the claimed assertions that: 1) the post-mining framework AORW is competent enough to surpass other post-mining techniques [14, 10, 12] in results; 2) its memory utilization and computational complexity are lower than those of other post-mining techniques; 3) the memory exploitation rate is probably reduced, with the aid of the projection and edge pruning of PEPP together with inference analysis and AORW. On the basis of the observations made, we conclude that the AORW implementation is far more noteworthy and important in contrast with other notable models [10, 12, 14].

JAVA 1.6 build 20 was employed for the implementation of AORW along with PEPP under inference analysis. A workstation equipped with a Core2Duo processor, 2GB RAM and Windows XP was used for the investigation of the algorithms. The parallel replica was deployed using the thread concept in JAVA.

Dataset characteristics:
Pi is a very dense dataset, which assists in mining an enormous quantity of frequent closed sequences with a profitably high threshold somewhere close to 90%. It has the distinct feature of containing 190 protein sequences and 21 divergent items. This dataset has been used for reviewing the consistency of functional legacies. Fig. 3 portrays the dataset sequence-length status.

In assessment against all the other regularly cited models [14, 12, 10], Post-Processing for Rule Reduction Using Closed Set [14] has made its mark as the most preferable, superior and sealed example of a post-mining model, in view of a detailed study of the factors of expert involvement, memory consumption and runtime.

Fig 3: A comparison report for runtime
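The per-rule computation of the algorithm in Fig 2 can be sketched as follows. This is an illustrative toy example: the class descriptor, attribute-to-class assignments, property support degrees and threshold below are all invented values, not taken from the paper's experiments:

```python
# Sketch of the AORW measures from Table 1: attribute relation support
# (ARS), attribute relation support degree (ARSD) and relation
# confidence rc_r, over an invented class descriptor.
from itertools import combinations

attr_class = {"age": "person", "income": "person", "city": "location"}
related = {("person", "location")}               # symmetric class relation
psd = {"age": 0.8, "income": 0.6, "city": 0.5}   # assumed psd_a values
rs = {"person": 1.0, "location": 1.0}            # relation support, max 1

def ars(a, b):
    ca, cb = attr_class[a], attr_class[b]
    if ca == cb:                                 # Table 1, row 13
        return 1.0
    if (ca, cb) in related or (cb, ca) in related:   # row 14
        return (psd[a] * psd[b]) / (rs[ca] * rs[cb])
    return 0.0                                   # row 15

def arsd(itemset):
    """Average ARS over all unique attribute pairs (rows 18, 24)."""
    pairs = list(combinations(sorted(itemset), 2))
    return sum(ars(a, b) for a, b in pairs) / len(pairs)

def relation_confidence(lhs, rhs):
    """rc_r = ARSD(lhs u rhs) / ARSD(lhs) (row 29)."""
    return arsd(lhs | rhs) / arsd(lhs)

lhs, rhs = {"age", "income"}, {"city"}
rc = relation_confidence(lhs, rhs)
print(round(rc, 4))
rct = 0.5                                        # assumed threshold
print(rc >= rct)   # rule kept when relation confidence meets rct
```

As in the algorithm, a rule is retained only when its relation confidence reaches the user-supplied threshold rct; here the toy rule {age, income} ⇒ {city} passes.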
Fig 4: A comparison report for memory usage
Fig 5: Sequence length and number of sequences at different thresholds in the Pi dataset

To contrast AORW and RRUC [14], the very dense dataset Pi is used, which has few frequent closed sequences of length less than 10, even at a high support of around 90%. Fig 3 shows that the two algorithms execute in a similar fashion for rules generated at support 90% and above; but when the support is 88% or less, AORW surpasses RRUC [14]. The disparity in the memory exploitation of AORW and RRUC [14] can be clearly observed, the consumption level of AORW being lower than that of RRUC. Fig 5 indicates that rules that are contextually irrelevant are pruned by AORW at a high and stable rate. Apart from the benefits observed, the rules identified by AORW are more relevant to transaction consequences. This becomes possible since there is no expert evaluation task in the post-mining process: due to the concept of the attribute class relation descriptor, the relation between the attributes involved in a rule is stable.

Fig 6: Rules pruned in % by AORW and RRUC [14]

6. CONCLUSION
We proposed a post-mining process called the attribute relation analysis framework (AORW) for pruning association rules that are contextually irrelevant. In earlier works [10, 12, 14], contextual irrelevancy was identified in various ways, such as (1) rule evaluation by a domain expert, and (2) rule evaluation by itemset closeness. We argued that neither of these two models is significant in all contexts. In the second case, adaptability to various data contexts is missing. In the first case, rule selection is highly influenced by the expert's view; that is, when the expert changes, rule significance might be rated differently. To address these limits, we proposed a post-mining process as an extension to our earlier proposed closed itemset mining algorithm, PEPP with inference analysis [34, 35]. In the proposed post-mining process AORW, one expert's view does not contradict another's; rather, it extends or refines it. This becomes possible in AORW because of the proposed concept called the attribute class relation descriptor. In this work we consider relation confidence as the benchmark for rule pruning; in future, this work can be extended to prune the rules by opting for a relation confidence threshold under inference analysis.

REFERENCES
[1] Fienberg, S. E. & Shmueli, G.: "Statistical issues and challenges associated with rapid detection of bio-terrorist attacks". Statistics in Medicine, 2005.
[2] Carmona-Saez, P., Chagoyen, M., Rodriguez, A., Trelles, O., Carazo, J. M., & Pascual-Montano, A.: "Integrated analysis of gene expression by association rules discovery". BMC Bioinformatics, 2006.
[3] Ronaldo Cristiano Prati: "QROC: A Variation of ROC Space to Analyze Item Set Costs/Benefits in Association Rules". Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGI Global, 2009, pp. 133-148.
[4] McNicholas, P. D.: "Association rule analysis of CAO data (with discussion)". Journal of the Statistical and Social Inquiry Society of Ireland, 2007.
[5] Prati, R., & Flach, P., "ROCCER: an algorithm for rule learning based on ROC analysis", Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005.
[6] Fawcett, T., "PRIE: A system to generate rulelists to maximize ROC performance", Data Mining and Knowledge Discovery Journal, 17(2), pp. 207-224, 2008.
[7] Kawano, H., & Kawahara, M., "Extended Association Algorithm Based on ROC Analysis for Visual Information Navigator", Lecture Notes in Computer Science, 2281, pp. 640-649, 2002.
[8] Piatetsky-Shapiro, G., Frawley, W. J., Brin, S., Motwani, R., & Ullman, J. D., "Discovery, Analysis, and Presentation of Strong Rules", Proceedings of the 11th International Symposium on Applied Stochastic Models and Data Analysis (ASMDA), 16, pp. 191-200, 2005.
[9] Liu, B., Hsu, W., & Ma, Y., "Pruning and summarizing the discovered associations", Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 125-134, 1999.
[10] Hacene Cherfi, Amedeo Napoli, & Yannick Toussaint, "A Conformity Measure Using Background Knowledge for Association Rules: Application to Text Mining", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGI Global, pp. 100-115, 2009.
[11] Liu, B., Hsu, W., & Ma, Y., "Pruning and summarizing the discovered associations", Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.
[12] Hetal Thakkar, Barzan Mozafari, & Carlo Zaniolo, "Continuous Post-Mining of Association Rules in a Data Stream Management System", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGI Global, pp. 116-132, 2009.
[13] Kuntz, P., Guillet, F., Lehn, R., & Briand, H., "A User-Driven Process for Mining Association Rules", in D. Zighed, H. Komorowski, & J. Zytkow (Eds.), 4th European Conference on Principles of Data Mining and Knowledge Discovery, 2000.
[14] Huawen Liu, Jigui Sun, & Huijie Zhang, "Post-Processing for Rule Reduction Using Closed Set", Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGI Global, pp. 81-99, 2009.
[15] Jaroszewicz, S., & Simovici, D. A., "Interestingness of Frequent Itemsets using Bayesian Networks as Background Knowledge", ACM SIGKDD Conference on Knowledge Discovery in Databases, 2004.
[16] Jaroszewicz, S., & Scheffer, T., "Fast Discovery of Unexpected Patterns in Data, Relative to a Bayesian Network", ACM SIGKDD Conference on Knowledge Discovery in Databases, 2005.
[17] Faure, C., Delprat, D., Boulicaut, J. F., & Mille, A., "Iterative Bayesian Network Implementation by using Annotated Association Rules", 15th Int'l Conf. on Knowledge Engineering and Knowledge Management – Managing Knowledge in a World of Networks, 2006.
[18] Liu, B., & Hsu, W., "Post-Analysis of Learned Rules", AAAI/IAAI, 1996.
[19] Lent, B., Swami, A. N., & Widom, J., "Clustering Association Rules", Proceedings of the Thirteenth International Conference on Data Engineering, Birmingham, U.K., IEEE Computer Society, 1997.
[20] Savasere, A., Omiecinski, E., & Navathe, S. B., "Mining for strong negative associations in a large database of customer transactions", Proceedings of the 14th International Conference on Data Engineering, Washington DC, USA, 1998.
[21] Basu, S., Mooney, R. J., Pasupuleti, K. V., & Ghosh, J., "Evaluating the Novelty of Text-Mined Rules using Lexical Knowledge", 7th ACM SIGKDD International Conference on Knowledge Discovery in Databases, 2001.
[22] Claudia Marinica & Fabrice Guillet, "Knowledge-Based Interactive Postmining of Association Rules Using Ontologies", IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 6, June 2010.
[23] M. J. Zaki & C. J. Hsiao, "CHARM: An Efficient Algorithm for Closed Itemset Mining", Proc. Second SIAM Int'l Conf. Data Mining, pp. 34-43, 2002.
[24] J. Pei, J. Han, & R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, 2000.
[25] M. J. Zaki & M. Ogihara, "Theoretical Foundations of Association Rules", Proc. Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD '98), pp. 1-8, June 1998.
[26] D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, & T. Yiu, "MAFIA: A Maximal Frequent Itemset Algorithm", IEEE Trans. Knowledge and Data Eng., vol. 17, no. 11, pp. 1490-1504, Nov. 2005.
[27] J. Li, "On Optimal Rule Discovery", IEEE Trans. Knowledge and Data Eng., vol. 18, no. 4, pp. 460-471, Apr. 2006.
[28] M. Hahsler, C. Buchta, & K. Hornik, "Selective Association Rule Generation", Computational Statistics, vol. 23, no. 2, pp. 303-315, Kluwer Academic Publishers, 2008.
[29] H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hatonen, & H. Mannila, "Pruning and Grouping of Discovered Association Rules", Proc. ECML-95 Workshop Statistics, Machine Learning, and Knowledge Discovery in Databases, pp. 47-52, 1995.
[30] R. J. Bayardo & R. Agrawal, "Mining the Most Interesting Rules", Proc. ACM SIGKDD, pp. 145-154, 1999.
[31] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, & A. I. Verkamo, "Finding Interesting Rules from Large Sets of Discovered Association Rules", Proc. Int'l Conf. Information and Knowledge Management (CIKM), pp. 401-407, 1994.
[32] R. Srikant & R. Agrawal, "Mining Generalized Association Rules", Proc. 21st Int'l Conf. Very Large Databases, pp. 407-419, http://citeseer.ist.psu.edu/srikant95mining.html, 1995.
[33] Barzan Mozafari, Hetal Thakkar, & Carlo Zaniolo, "Verifying and Mining Frequent Patterns from Large Windows over Data Streams", Proceedings of the 24th International Conference on Data Engineering (ICDE 2008), Cancún, México, April 7-12, 2008.
[34] Kalli Srinivasa Nageswara Prasad & Prof. S. Ramakrishna (Department of Computer Science, Sri Venkateswara University, Tirupati, Andhra Pradesh, India), "Parallel Edge Projection and Pruning (PEPP) Based Sequence Graph Protrude Approach for Closed Itemset Mining", International Journal of Computer Science and Information Security, Vol. 9, No. 9, September 2011.
[35] Kalli Srinivasa Nageswara Prasad & Prof. S. Ramakrishna, "Mining Closed Itemsets for Coherent Rules: An Inference Analysis Approach", Global Journal of Computer Science and Technology, Volume 11, Issue 19, Version 1.0, November 2011, Global Journals Inc. (USA), Online ISSN: 0975-4172, Print ISSN: 0975-4350.

SHORT BIO DATA FOR THE AUTHORS

Ms. M. Saritha is working as an Assistant Professor in the Department of Computer Science and Engineering, Guru Nanak Institutions of Technical Campus, with a teaching experience of 1.5 years. She holds a B.Tech (ECE) and an M.Tech in Computer Science (PC). Her areas of interest include Mobile Computing, Data Mining, and Information Security.

Professor Thaduri Venkata Ramana received the Master's degree in Computer Science and Engineering from Osmania University, Hyderabad, and is pursuing a Ph.D. in Computer Science and Engineering from Jawaharlal Nehru Technological University, Hyderabad (JNTUH). His research interests are Information Retrieval and Data Mining. He is presently working as Professor & Head, Department of Computer Science and Engineering, SLC's Institute of Engineering and Technology, Hyderabad.

Ms. Kareemunnisa is working as an Assistant Professor in the Department of Computer Science and Engineering, Guru Nanak Institutions of Technical Campus, with a teaching experience of 6 years. She holds a B.Tech (IT) and an M.Tech in Computer Science and Engineering (CSE). Her areas of interest include Mobile Computing, Data Mining, and Information Security.

Ms. Y. Sowjanya received the Bachelor's degree in Information Technology from Jawaharlal Nehru Technological University, Hyderabad (JNTUH) and is pursuing the Master's degree in Computer Science and Engineering (CSE) from JNTUH. Her research interests include Computer Networks, Information Retrieval Systems, and Data Mining.

Ms. M. Nishantha is working as an Assistant Professor in the Department of Computer Science and Engineering, Guru Nanak Institutions of Technical Campus, with a teaching experience of 6 years. She holds a B.Tech (IT) and an M.Tech (IT). Her areas of interest include Mobile Computing, Data Mining, and Information Security.