SlideShare une entreprise Scribd logo
1  sur  10
Télécharger pour lire hors ligne
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351
Attribute Ontological Relational Weights (AORW) For Coherent
Association Rules: A Post Mining Process For Association Rules
         M.Saritha1, T.Venkata Ramana2,Kareemunnisa3, Y.Sowjanya4,
                                M.Nishantha5
   1
      (Asst. Professor, Department of Computer Science, GuruNanak Institutions of Technical Campus, India)
                     2
                       (Professor, Head of the Department of Computer Science, SLC‟s IET, India)
  3, 5
       (Asst. Professors, Department of Information Technology, GuruNanak Institutions of Technical Campus,
                                                        India)
                  4
                    (M.tech student, Department of Computer Science and Engg, SLC‟s IET, India)


ABSTRACT                                                  "Coding of data as binary if data is not binary” ->
         Association rules present one of the most        “Rule generation” -> “Post-mining". This survey
impressive techniques for the analysis of                 focused on post mining. It was a century after the
attribute associations in a given dataset related to
applications related to retail, bioinformatics, and       introduction of association rules (associations
sociology. In the area of data mining, the                initially discussed in 1902), it is still continuing that
importance of the rule management in                      the absence of items from transactions is often
associating rule mining is rapidly growing.               ignored in analyses.
Usually, if datasets are large, the induced rules
are large in volume. The density of the rule              A. Association rule mining
volume leads to the obtained knowledge hard to                Given a set I that is non-empty, a rule of
be understood and analyze. One better way of              association is a statement of the form X ⇒ Y, where
minimizing the rule set size is eliminating               X, Y ⊂ I such that X ≠ ∅, Y ≠ ∅, and X ∩ Y = ∅.
redundant rules from rule base. Many efforts              The dataset X is called the antecedent of the rule,
have been made and various competent and                  the set Y is called the consequent of the rule, and
excellent algorithms have been proposed. But all          we shall call I the master itemset. Association rules
of these models relying either on closed itemset          are generated over a large set of transactions,
mining or expert’s evaluation. None of these              denoted by T. Xn association rule can be deemed
models are proven best in all data set contexts.          interesting if the items involved occur together often
Closed itemset model is missing adaptability and          and there are suggestions that one of the sets might
expert’s evaluation process is resulting different        in some cases lead to the presence of the other set. X
significance for same rule under different                association rules are characterised as interesting, or
expert’s view. To overcome these limits here we           not, based on mathematical notions called „support‟,
proposed a post mining process called AORW as             „confidence‟ and „lift‟. Although there are now a
an extension to closed itemset mining algorithms.         multitude of measures of interestingness available to
                                                          the analyst, many of them are still based on these
Keywords: post mining, association rule mining,           three functions.
closed itemset, PEPP, Inference analysis, rule                      In many applications, it is not only the
pruning.                                                  presence of items in the antecedent and the
                                                          consequent parts of an association rule that may be
1. INTRODUCTION                                           of interest. Consideration, in many cases, should be
         In general, association rules tend to deliver    given to the relationship between the absence of
an efficient method of analysing binary or                items from the antecedent part and the presence or
discredited data sets that are large in volume. One       absence of items from the consequent part. Further,
common practice is to determine relationships             the presence of items in the antecedent part can be
between binary variables in transaction databases,        related to the absence of items from the consequent
which is known as „market basket analyses. In the         part; for example, a rule such as {margarine} ⇒
case of non-binary data, initially data being coded as    {not butter}, which might be referred to as a
binary and then association rules will be used to         „replacement rule‟. One way to incorporate the
analyse. Association rules having their impact on         absence of items into the association rule mining
analysing large binary datasets and considered as         paradigm is to consider rules of the form X⇒/ Y
versatile approach for modern applications such as        [20]. Another is to think in terms of negations.
detection of bio-terrorist attacks [1] and the analysis   Suppose X ⊂ I, then write ¬X to denote the absence
of gene expression data [2], to the analysis of Irish     or negation of the item, or items, in X from a
third level education applications [4]. The steps         transaction. Considering X as a binary {0, 1}
involved in a typical association rule analysis are       variable, the presence of the items in X is equivalent

                                                                                                 1342 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351
to X = 1, while the negation ¬X is equivalent to        popular association rules though the interest of
XX`€=X`€0. The concept of considering association       frequent sets goes further.
rules    involving      negations,     or  “negative              Based on the proposals [14, 12, 10, 3]
implications”, is due to Silverstein [20].              recently cited in literature and their motivations, it is
                                                        observable that the process of rule pruning will opt
                                                        to one of the two models.
B. Post mining                                          (1)       Rule pruning under post mining process
          Pruning rules and detection of rule           that demands domain experts observation
interestingness are employed in the post-mining         (2)       Rule pruning under post mining process
stage of the association rule mining paradigm.          that aims to avoid domain expert‟s role in pruning
However, there are a host of other techniques used      process.
in the post-mining stage that do not naturally fall               As in the first case rule pruning accuracy
under either of these headings. Some such               depends on domain expert‟s awareness on attribute
techniques are in the area of redundancy- removal.      relations. In this case it is always obvious to prune
There are often a huge number of association rules      the rules under reliable domain expert‟s observation.
to contend with in the post-mining stage and it can     In second case the models prune the rules based on
be very difficult for the user to identify which ones   dynamically determined attribute relations. This
are interesting. Therefore, it is important to remove   limits the solution to specific data models. Hence it
insignificant rules, prune redundancy and do further    is not adaptable for all contexts.
post-mining on the discovered rules [18, 11]. Liu                 The Rest of the paper organized as; in
[11] proposed a technique for pruning and               section II we discussed the most frequently cited
summarising discovered association rules by first       post mining models to improve rule accuracy.
removing those insignificant associations and then      Section III briefs the post mining process that we
forming direction-setting rules, which provide a sort   opted. Section IV briefs the approach of closed
of summary of the discovered association rules.         itemset mining, and inference approach for itemset
Lent [19] proposed clustering the discovered            pruning. Section V explores the process of Attribute
association rules.                                      Relational weights analysis for rule pruning. Results
                                                        discussion and comparative study will be in Section
C. Influences of Input formats                          VI that fallowed by conclusion and references.
           The input formats that influence the post
mining methodologies are binary data, text data and     2. RELATED WORK
streaming data. In association rule mining, much of               Huawen Liu, [14] proposed post processing
recent research work has focused on the difficult       approach for rule reduction using closed set to filter
problem of mining data streams such as click stream     superfluous rules from knowledge base in a post-
analysis, intrusion detection, and web-purchase         processing manner that can be well discovered by a
recommendation systems. In the case of streaming        closed set mining technique. Most of these methods
data, it is not possible to perform mining on cached    are based on the rule structure analysis where the
and fixed data records. The attempt of caching data     relations between the rules have been analysed using
leads to memory usage issues, and the attempt of        corresponding problem. This procedure claims to
mining static data leads to worst time complexity       eliminate noise and redundant rules in order to
since continuous dataset update leads to continuous     provide users with compact and precise knowledge
passes through dataset. From the data mining point      derived from databases by data mining method.
of view, texts are complex data giving raise to         Further, a fast rule reduction algorithm using closed
interesting challenges. First, texts may be             set is introduced. Other endeavours have been
considered as weakly structured, compared with          attempted to prune the rule bases directly. The
databases that rely on a predefined schema.             typical cases have been elaborated and illustrated by
Moreover, texts are written in natural language,        eminent people from all over.
carrying out implicit knowledge, and ambiguities.                 Modest number of proposals is addressed
Hence, the representation of the content of a text is   on pre-pruning and post-pruning. In a line, the
often only partial and possibly noisy. One solution     pruning operation occurs at the phase of generation
for handling a text or a collection of texts in a       of significant rules. To add to this, the post pruning
satisfying way is to take advantage of a knowledge      technique mainly concerns primarily emphasises
model of the domain of the texts, for guiding the       that pruning operation occurs after the rule
extraction of knowledge units from the texts. One of    generation, among which the rule cover is a
the obvious hot topics of data mining research in the   representative case. To extract interesting rules,
last few years has been rule discovery from binary      aprori knowledge has also been taken into account
data. It concerns the discovery of set of attributes    in literatures and a template denotes which attributes
from large binary records such that these attributes    should occur in the antecedent or the consequent
are true within a same line often enough. It is then    part of the rule.
easy to derive rules that describe the data, the

                                                                                               1343 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351
          An association rule is an implication            processing of rule mining. The empirical study
expression illustrates a kind of association relation.     proved that the discovery of dependent relations
A rule is said to be interesting or valid if its support   from closed set helps to eliminate redundant rules.
and confidence are user specified minimum support          Hetal Thakkar [12]. In the case of stream data, the
and confidence thresholds respectively. The                post-mining of association is more challenging and
association rule primarily comprises two phases            continuous post mining of association rules is an
based on the identification of frequent item sets          unavoidable requirement, which is discussed by this
from the given data mining contexts. However, the          author. He presented a technique for continuous
problem of massive real world data transactions can        post-mining of association rules in a data stream
be rounded off by adopting other alternatives, which       management system. He described the architecture
in turn benefits in lossless representation of data.       and techniques used to achieve this advanced
Theoretically, transaction database and relation           functionality in the Stream Mill Miner (SMM)
database are two different inter-transformable             prototype, an SQL-based DSMS designed to support
representations of data.                                   continuous mining queries, which is impressive.
          The production of association mining is a        Hacene Cherfi [10] discussed a post association rule
rule base with hundreds to millions of rules. In order     mining approach for text mining that combines data
to highlight the important and key ones, certain           mining, semantic techniques for post-mining and
other rules are proposed which are Second –order           selection of association rules. To focus on the result
rule which states that if the cover of the item set is     analysis and to find new knowledge units,
known, then the corresponding relation can be easily       classification of association rules according to
derived. All the technical definitions given hence         qualitative criteria using domain model as
forth deal with the transaction of the data through        background knowledge has been introduced. The
item-sets in association mining. The equivalent            authors carried out an empirical study on molecular
property significantly states that rules and classes of    biology dataset that proved the benefits of taking
the same hierarchical database support the power in        into account a knowledge domain model of the data.
the content. Traditional data mining techniques are        Ronaldo Cristiano Prati [3]. The Receiver Operating
implemented in order to justify the property and its       Characteristics (ROC) graph is a popular way of
corresponding definition in the specified context.         assessing the performance of classification rules, but
The thus identified second order rules can be used to      they are inappropriate to evaluate the quality of
filter out useless rules out of the priority rule-set.     association rules, as there is no class in association
          The effectiveness and efficiency of the          rule mining and the consequent part of different
classical methods in plating the rules is thoroughly       association rules might not have any correlation at
verified under the 2.8 GHZ Pentium PC. Two group           all. Chapter VIII presents a novel technique of
experiments were conducted to prune the                    QROC, a variation of ROC space to analyze itemset
insignificant association rules and to remove useless      costs/benefits in association rules. It can be used to
association classification rules. Removal of non-          help analysts to evaluate the relative interestingness
predictive rules by virtue of information gain metric      among different association rules in different cost
is much similar to CHARM and CBA which also                scenarios.
work on the same track. To generate association and
classification rules by pruning method of Apriori          3. ATTRIBUTE       ONTOLOGICAL
software, some external tools are essentially                 RELATIONAL   WEIGHTS     FOR
required. The effectiveness of the pruning algorithm          COHERENT ASSOCIATION RULES
can be inversely related to the number of rules. This                The approach Attribute Ontological
along with the computational time consumed,                Relational Weights in short can refer as AORW is
determine an efficient criterion of pruning.               post mining process to prune the rules based on
          Efficient post processing methods are            attribute relational relevancy. The process of
hence proposed to remove pointless rules from rule-        AORW Framework can be classified as
ways by eliminating redundancy among rules. The             Closed itemset mining
dependent relation, exploitation, makes this method         Describing item class descriptor
a self manageable knowledge.               The pruning     The input for AORW Framework is
procedure has been sliced into three stages starting       1. A set of rules
from derivation to pruning operation on rule-set by        2. An XML descriptor describes attributes, classes,
the use of close rule-sets. It is cost-effective and          class properties and class relations.
consumes very little time for the transaction. Hence
                                                           Here in this proposal we considered our earlier work
forth, it can be applied to exploit sampling               to find closed itemsets.
techniques and data structures, thereby increasing         The process steps involved in AORW framework
the efficiency.                                            are
          Huawen Liu, [14] presented a technique on        1. Initially AORW measures the property support
post-processing for rule reduction using closed set            degree for each attribute involved in given rule.
that was targeting to filter the otiose rules in a post-

                                                                                                1344 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                     Vol. 2, Issue 5, September- October 2012, pp.1342-1351
2. By using the property support degree of the      fts ( st )|st :s p ( for each p 1..|DBS |)|
    attributes, Attribute Relation support of attribute
    pairs of the given rule will be measured.             DBS Is set of sequences
3. With the help of Attribute Relation supports of
    all attribute pairs of an itemset that belongs to a
                                                          fts ( st ) : Represents the total support „ts‟ of
    given rule, Attribute relation support degree of      sequence st is the number of super sequences of st
    that itemset will be measured.                        Qualified support „qs‟: The resultant coefficient of
4. Using these Attribute relation support degrees of      total support divides by size of sequence database
   Left Hand Side and Right Hand Side itemsets of         adopt as qualified support „qs‟. Qualified support
   the given rule, relation confidence of the rule will   can be found by using fallowing formulation.
   be determined.                                                        f (s )
5. Prunes the rules based on their attribute relation      f qs ( st )  ts t
                                                                         |DBS |
   support degree.
   Detailed explanation of each step can be found in                  Sub-sequence and Super-sequence: A
   Section V.                                             sequence is sub sequence for its next projected
                                                          sequence if both sequences having same total
4. CLOSED ITEMSET MINING [34]                             support.
A. Dataset adoption and formulation                                   Super-sequence: A sequence is a super
        Item Sets I: A set of diverse elements by         sequence for a sequence from which that projected,
which the sequences generate.                             if both having same total support.
                                                          Sub-sequence and super-sequence can be formulated
    n
I   ik                                                  as
   k 1 Note: „I‟ is set of diverse elements              If f ts ( st )  rs    where „rs‟ is required support
Sequence set „S‟: A set of sequences, where each          threshold given by user
sequence contains elements each element „e‟               And st :s p for any p value where fts ( st ) fts ( s p )
belongs to „I‟ and true for a function p(e). Sequence
set can formulate as
                                                          B. Closed Itemset Discovery[34]
      m                                                             As a first stage of the proposal we perform
 s   ei |( p (ei ),eiI )
                                                          dataset pre-processing and itemsets Database
     i 1
                                                          initialization. We find itemsets with single element,
Represents a sequence„s‟ of items those belongs to
                                                          in parallel prunes itemsets with single element those
set of distinct items „I‟.
                                                          contains total support less than required support.
„m‟: total ordered items.
                                                          Forward Edge Projection:
P(ei): a transaction, where ei usage is true for that
                                                                    In this phase, we select all itemsets from
transaction.
                                                          given itemset database as input in parallel. Then we
         t                                                start projecting edges from each selected itemset to
 S  s j
                                                          all possible elements. The first iteration includes the
       j 1
                                                          pruning process in parallel, from second iteration
S: represents set of sequences                            onwards this pruning is not required, which we
„t‟: represents total number of sequences and its         claimed as an efficient process compared to other
value is volatile                                         similar techniques like BIDE. In first iteration, we
sj: is a sequence that belongs to S
                                                          project an itemset s p that spawned from selected
Subsequence: a sequence s p of sequence set „S‟ is
considered as subsequence of another sequence             itemset si from DBS and an element ei considered
sq                                                        from „I‟. If the f ts ( s p ) is greater or equal to rs ,
   of Sequence Set „S‟ if all items in sequence Sp is
belongs to sq as an ordered list. This can be             then an edge will be defined between si and ei . If
formulated as
                                                           f ts ( si )  f ts ( s p ) then we prune si from DBS .
            n
If       (  s pisq )( s p  sq )                       This pruning process required and limited to first
          i 1                                            iteration only.
          n     m            s pS and sqS               From second iteration onwards project the itemset
Then      s :  s
                pi          qj
         i 1        j 1                                  S p that spawned from S p ' to each element ei of „I‟.
                      where
Total Support „ts‟ : occurrence count of a sequence       An edge can be defined between S p ' and ei if
as an ordered list in all sequences in sequence set
„S‟ can adopt as total support „ts‟ of that sequence.      fts ( s p ) is greater or equal to rs . In this description
Total support „ts‟ of a sequence can determine by         S p ' is a projected itemset in previous iteration and
fallowing formulation.


                                                                                                   1345 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
    Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue 5, September- October 2012, pp.1342-1351
eligible as a sequence. Then apply the fallowing                         Pattern actual coverage is pattern positive score-
validation to find closed sequence.                                      pattern negative score
Edge pruning       :                                                                 Interest gain: Actual coverage of the
         If any of f ts ( s p ' )  f ts ( s p ) that edge will          pattern involved in association rule
                                                                         Coherent rule Actual coverage of the rule‟s left side
be pruned and all disjoint graphs except           s p will be           pattern must be greater than or equal to actual
considered as closed sequence and moves it                               coverage of the right side pattern
                                                                         Inference Support ias: refers actual coverage of the
into DBS and remove all disjoint graphs from
                                                                         pattern
memory.                                                                   fia ( st ) : Represents the inference support of the
         The above process continues till the
elements available in memory those are connected                         sequence st
through direct or transitive edges and projecting                        Approach:
itemsets i.e., till graph become empty                                   For each pattern sp of the pattern dataset, If
                                                                          fia ( st )  ias
                                                                                             then we prune that pattern
C. Inference Analysis [35]
Inferences:
         Pattern positive score is sum of no of              5.       ATTRIBUTE             ONTOLOGICAL
transactions in which all items in the pattern exist,      RELATIONAL WEIGHTS FRAMEWORK
no of transactions in which all items in the pattern       The proposed post mining process Attribute
does not exist                                             Ontological Relational Weights described in
         Pattern negative score is no of transactions      detailed here. Table 1 represents the notations used
in which only few items of the pattern exist               in AORW framework.
                                                           .
Table1: Notations used in Attribute Ontological Relational Weights
1          r                                       Left side itemset of the Rule r
               lhs
2             rrhs                                            Right side itemset of the rule r
3             RS                                              Rule set
4             cpcc                                            Class properties count of class c
5             apca                                            Attribute property count of attribute a
6              psd                                            Property support degree
7             tpc                                             Total properties of class c
8             psa                                             Property support of attribute a of class c
                     apca
              psa 
                     cpcc
                      ps                                      where a  c
9             psd a  a
                      tpc
10            rsc                                             Relation support of class c is max threshold value 1.
11            ARS                                             Attribute Relation support
12            ARSDi                                           Attribute Relation support Degree of itemset i.
13           If attributes a i and a j       belongs to           ARS (ai , a j )  1 , where {ai , a j }  c
             same class c
14           If attribute a i belongs to class ci and                                 psd ai  psd a j
                                                                  ARS (ai , a j ) 
             attribute a j belongs to class c j , ci                                      rsci  rsc j
             and c j relation is true then
15           If ci and c j relation is false                      ARS (ai , a j )  0 ,
16            ARS (ai , a j )  ARS (a j , ai )               Applicable in all cases such as both belongs to same class, both
                                                              belongs to different classes that are not having relation and both
                                                              belongs to different classes that are having relation
17           No of attribute pairs in an itemset                  pc

                                                                                                                  1346 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351
18                                                   pc : is total number of pairs in a given itemset.
             pc  0;                                 n : is total number of attributes in the given itemset
             n 1
                                           n 1
              pc  i
             i 1
                            (or ) pc 
                                             2
                                                n
19           pcrlhs                                  Pair-count of itemset, which is lhs of rule r .
20           pcrrhs                                  Pair-count of itemset, which is rhs of rule r .
21           pcrlhs rrhs                            Pair-count of itemset that generated from rlhs rrhs
22           PS  { p , p , ......., p }             Pair set that generated from itemset i
               i        1 2           m
23           ARS ( pi )                              Attribute relation support of pair pi .
24                   | psi |                         Attribute relation support degree of itemset i
                        ARS ( pk )
                      k 1                           And pk  psi , here pk is kth pair of pair-set ps of itemset i
             ARSDi 
                             | psi |
25           RCr                                     Relation confidence of rule r .
                      |PSrlhs |                      Attribute relation support degree of itemset, which is lhs of
26
                          pk                        rule r
             ARSDr  k 1                            And pk  psrlhs , here pk is kth pair of pair-set ps of itemset
                  lhs |PS
                           rlhs |
                                                     lhs of rule r
                      |PSrrhs |                      Attribute relation support degree of itemset, which is rhs of
27
                           pk                       rule r
             ARSDr  k 1                            And pk  psrrhs , here pk is kth pair of pair-set ps of itemset
                  rhs   |PSr |
                            rhs
                                                     rhs of rule r
                               |PS         |         Attribute relation support degree of itemset that generated from
28                              rlhs rrhs
                                            pk      lhs  rhs of rule r
                                  k 1
             ARSDrlhs rrhs 
                              |PSrlhs rrhs |
                                                     And   pk  PS r             , here pk is kth pair of pair-set ps of
                                                                     lhs rrhs

                                                     itemset that generated from lhs  rhs of rule r
                      ARSDrlhs rrhs                 Relation confidence of r is the coefficient emerged as result
29           rcr                                    when 30attribute support degree of all attribute involved in rule
                        ARSDrlhs
                                                      r is divided by attribute support degree of rule r ‟s lhs

Property Support: No of attribute properties are           Relation confidence: is a rule level measurement
matched to number class properties to which that           concludes the relation strength between left hand side
attribute belongs to [Table 1 row: 8].                     itemset and right hand side itemset of a given rule[see
Property Support degree: indicates the ratio of            table 1 row: 29]
properties matched to class level properties [table 1      AORW algorithm:
row: 9].                                                   Input: Rule set RS , Class Descriptor CD and relation
                psa                                        confidence threshold rct
Ex: psd a          ; here a is an attribute of class                                             '
               cpcc                                        Output: Significant Rule set RS which is subset of
c [ a  c]                                                 RS RS  RS'

Attribute Relation support: Indicates the strength of      Begin:
the relation between two attributes of an itemset that     While RS is not empty
are considered as pair for equation [see table 1 row:      Begin:
11, 13].                                                   Read a rule r from RS
Pair Count: Total number of two attributes sets; here
these attribute sets must be unique [see table 1, row      Find property support psa and property support
17, 18]                                                    degree psd a of each attribute a of rlhs [Table 1
Attribute Relation support degree: is an itemset
                                                           row: 8, 9]
level measurement representing average relation
strength of the attributes those belongs to an itemset     Find property support psa and property support
[see table 1 row: 24]

                                                                                                      1347 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351
degree psd a of each attribute a of rrhs [Table 1              mining techniques. 3) There is the involvement of an
                                                               enhanced occurrence and a probability reduction in
row: 8, 9]                                                     the memory exploitation rate with the aid of the trait
Find unique two attribute pair set PS        from              equivalent prognosis and also rim snipping of the
                                                  rlhs
                                                               PEPP with inference analysis and AORW. This is on
rlhs [Table 1 row: 22]                                         the basis of the surveillance done which concludes
Find Attribute relation support ARS p for each pair            that AORW implementation is far more noteworthy
                                                               and important in contrast with the likes of other
p , where p  PS rlhs [Table 1; row: 13, 15, 15 and            notable models [10, 12, 14].
16]                                                            JAVA 1.6_ 20th build was employed for
Find Attribute relation support degree ARSDrlhs of             accomplishment of the AORW along with PEPP
                                                               under inference analysis. A workstation equipped
rlhs [Table 1; row: 24, 26].                                   with core2duo processor, 2GB RAM and Windows
Find unique two attribute pair set PS r    from                XP installation was made use of for investigation of
                                       lhs                     the algorithms. The parallel replica was deployed to
rlhs [Table 1 row: 22]                                         attain the thread concept in JAVA.
Find Attribute relation support ARS p for each pair
                                                               Dataset Characteristics:
p , where   pPSrrhs [Table 1; row: 13, 15, 15 and 16]                  Pi is supposedly found to be a very opaque
Find Attribute relation support degree ARSDrrhs of             dataset, which assists in excavating enormous
                                                               quantity of recurring clogged series with a profitably
rlhs [Table 1; row: 24, 27].                                   high threshold somewhere close to 90%. It also has a
Find unique two attribute pair set PSr r from                 distinct element of being enclosed with 190 protein
                                      lhs rhs                  series and 21 divergent objects. Reviewing of
rlhs  rrhs [Table 1 row: 22]                                  serviceable legacy‟s consistency has been made use of
                                                               by this dataset. Fig. 3 portrays an image depicting
Find Attribute relation support ARS p for each pair            dataset series extent status.
p , where p  PS rlhs  rrhs [Table 1; row: 13, 15, 15                  In assessment with all the other regularly
                                                               quoted forms like [14,12,10], Post-Processing for
and 16]                                                        Rule Reduction Using Closed Set[14] has made its
Find Attribute relation support degree ARSDr                   mark as a most preferable, superior and sealed
                                                  lhs rrhs
                                                               example of post mining copy, taking in view the
of rlhs  rrhs [Table 1; row: 24, 28].                         detailed study of the factors mainly, experts
Find unique two attribute pair set PSr r from                 involvement, memory consumption and runtime.
                                       lhs rhs
rlhs  rrhs [Table 1 row: 22]
Find Attribute relation support ARS p for each pair
p , where   p  PS                [Table 1; row: 13, 15, 15
                     rrhs rlhs
and 16]
Find Attribute relation support degree ARSDr
                                                   rhs rlhs
of rlhs  rrhs [Table 1; row: 24, 28].
Find Relation confidence rcr of rule r
If rcr  rct then add rule r to resultant rule set RS
                                                           '

End
End
Fig 2: Attribute Ontological Relational Weights
algorithm

         This segment focuses mainly on providing
evidence on asserting the claimed assumptions that 1)          Fig 3: A comparison report for Runtime
The post mining framework AORW is competent
enough to momentously surpass results when
evaluated against other post mining techniques [14, 10,
12. 2) Utilization of memory and computational
complexity is less when compared to other post


                                                                                                     1348 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
 Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                    Vol. 2, Issue 5, September- October 2012, pp.1342-1351




                                                           Fig 6: Rules pruned in in % by AORW and RRUC[14]

Fig: 4: A comparison report for memory usage               6. CONCLUSION
                                                                     We proposed a post mining process called
                                                           Attribute relation analysis framework (AORW) for
                                                           pruning association rules that are contextually
                                                           irrelevant. In earlier works [10, 12, 14] the contextual
                                                           irrelevancy was identified in various ways such as (1)
                                                           rule evaluation by domain expert, (2) rule evaluation
                                                           by itemset closeness. We argued that none of these
                                                           two models is significant in all contexts. In second
                                                           case adaptability to various data contexts is missing.
                                                           In the first case, rule selection highly influenced by
                                                           the experts view, that is when expert changes then
                                                           rule significance might be rated differently. To defend
                                                           these limits here we proposed a post mining process
Fig 5: Sequence length and number of sequences at          as an extension to our earlier proposed closed itemset
different thresholds in Pi dataset                         mining algorithm PEPP with inference analysis [34,
                                                           35]. Here in this proposed post mining process
         In contrast to AORW and RRUC [14], a very         AORW, the experts view is not defending one to other,
intense dataset Pi is used which has petite recurrent      rather it extends or refines. This become possible in
closed series whose end to end distance is less than 10,   AORW because of the proposed concept called
even in the instance of high support amounting to          attribute class relation descriptor. In this work we
around 90%. The diagrammatic representation                consider relation confidence as bench mark for rule
displayed in Fig 3 explains that the above mentioned       pruning; in future this work can be extended to prune
two algorithms execute in a similar fashion in case of     the rules by opting relation confidence threshold
rules that are generated at support 90% and above.         under inference analysis.
But in situations when the support case is 88% and
less, then the act of AORW surpasses RRUC [14]‟s           REFERENCES
routine. The disparity in memory exploitation of             [1]    Fienberg, S. E. & Shmueli, G: "Statistical
AORW and RRUC [14] can be clearly observed                          issues and challenges associated with rapid
because of the consumption level of AORW being                      detection of bio-terrorist attacks. Statistics in
low than that of RRUC. Fig 5 indicates that rules that              Medicine, 2005
are contextually irrelevant have been pruned by              [2]    Carmona-Saez,       P.,     Chagoyen,        M.,
AORW in high probable rate and stable. Apart from                   Rodriguez, A., Trelles, O., Carazo, J. M., &
the benefits observed, the rules identified by AORW                 Pascual-Montano, A: :Integrated analysis of
are more relevant to transaction consequences. It                   gene expression by association rules
becomes possible since there is no task of experts                  discovery". BMC Bioinformatics, 2006
evaluation in post mining process. Due to the concept        [3]    Ronaldo Cristiano Prati, "QROC: A
of attribute class relation descriptor, the relation                Variation of ROC Space to Analyze Item Set
between attributes involved in rule is stable.                      Costs/Benefits in Association Rules", Post-
                                                                    Mining of Association Rules: Techniques for
                                                                    Effective Knowledge Extraction, IGIGlobal,
                                                                    2009 Pages: 133-148 pp.
                                                             [4]    McNicholas, P. D: "Association rule analysis
                                                                    of CAO data (with discussion). Journal of the
                                                                    Statistical and Social Inquiry Society of
                                                                    Ireland, 2007


                                                                                                   1349 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 2, Issue 5, September- October 2012, pp.1342-1351
[5]    Prati, R., & Flach, P. "ROCCER: an                      Knowledge". ACM SIGKDD Conference on
       algorithm for rule learning based on ROC                Knowledge Discovery in Databases, 2004
       analysis".    Proceeding of the 19th             [16]   Jaroszewicz, S., & Scheffer, T: "Fast
       International Joint Conference on Artificial            Discovery of Unexpected Patterns in Data,
       Intelligence, 2005                                      Relative to a Bayesian Network". ACM
[6]    Fawcett, T: "PRIE: A system to generate                 SIGKDD Conference on Knowledge
       rulelists to maximize ROC performance".                 Discovery in Databases, 2005
       Data Mining and Knowledge Discovery              [17]   Faure, C., Delprat, D., Boulicaut, J. F., &
       Journal, 2008, 17(2), 207-224.                          Mille, A: "Iterative Bayesian Network
[7]    Kawano, H., & Kawahara, M: "Extended                    Implementation      by using       Annotated
       Association Algorithm Based on ROC                      Association Rules." 15th Int‟l Conf. on
       Analysis for Visual Information Navigator".             Knowledge Engineering and Knowledge
       Lecture Notes In Computer Science, 2002,                Management – Managing Knowledge in a
       2281, 640-649.                                          World of Networks, 2006
[8]    Piatetsky-Shapiro, G., Piatetsky-Shapiro, G.,    [18]   Liu, B., & Hsu, W: "Post-Analysis of
       Frawley, W. J., Brin, S., Motwani, R.,                  Learned Rules". AAAI/IAAI, 1996
       Ullman, J. D., : "Discovery, Analysis, and       [19]   Lent, B., Swami, A. N., & Widom, J:
       Presentation of Strong Rules". Proceedings              "Clustering Association Rules". Proceedings
       of the 11th international symposium on                  of the Thirteenth International Conference on
       Applied Stochastic Models and Data                      Data Engineering, 1997, Birmingham U.K.,
       Analysis ASMDA, 2005, 16, 191-200                       IEEE Computer Society
[9]    Liu, B., Hsu, W., & Ma, Y: "Pruning and          [20]   Savasere, A., Omiecinski, E., & Navathe, S.
       summarizing the discovered associations".               B: "Mining for strong negative associations
       Proceedings of the fifth ACM SIGKDD                     in a large database of customer transactions".
       international conference on Knowledge                   Proceedings of the 14th International
       discovery and data mining 1999, (pp. 125-               Conference       on     Data     Engineering,
       134)                                                    Washington DC, USA, 1998
[10]   Hacene Cherfi, Amedeo Napoli, Yannick            [21]   Basu, S., Mooney, R. J., Pasupuleti, K. V., &
       Toussaint: "A Conformity Measure Using                  Ghosh J: "Evaluating the Novelty of Text-
       Background Knowledge for Association                    Mined Rules using Lexical Knowledge". 7th
       Rules: Application to Text Mining", Post-               ACM SIGKDD International Conference on
       Mining of Association Rules: Techniques for             Knowledge Discovery in Databases, 2001
       Effective Knowledge Extraction, IGIGlobal,       [22]   Claudia Marinica and Fabrice Guillet:
       2009 Pages: 100-115 pp.                                 "Knowledge-Based Interactive Postmining of
[11]   Liu, B., Hsu, W., & Ma, Y.: "Pruning and                Association Rules Using Ontologies", IEEE
       summarizing the discovered associations".               TRANSACTIONS ON KNOWLEDGE
       Proceedings of the fifth ACM SIGKDD                     AND DATA ENGINEERING, VOL. 22,
       international conference on Knowledge                   NO. 6, JUNE 2010
       discovery and data mining, 1999                  [23]   M.J. Zaki and C.J. Hsiao, “Charm: An
[12]   Hetal Thakkar, Barzan Mozafari, Carlo                   Efficient Algorithm for Closed Itemset
       Zaniolo: "Continuous Post-Mining of                     Mining,” Proc. Second SIAM Int‟l Conf.
       Association Rules in a Data Stream                      Data Mining, pp. 34-43, 2002.
       Management System", Post-Mining of               [24]   J. Pei, J. Han, and R. Mao, “Closet: An
       Association Rules: Techniques for Effective             Efficient Algorithm for Mining Frequent
       Knowledge Extraction, IGIGlobal, 2009                   Closed Itemsets,” Proc. ACM SIGMOD
       Pages: 116-132 pp.                                      Workshop Research Issues in Data Mining
[13]   Kuntz, P., Guillet, F., Lehn, R., & Briand, H:          and Knowledge Discovery, pp. 21-30, 2000.
       "A User-Driven Process for Mining                [25]   M.J. Zaki and M. Ogihara, “Theoretical
       Association Rules". In D. Zighed, H.                    Foundations of Association Rules,” Proc.
       Komorowski, & J. Zytkow (Eds.), 4th Eur.                Workshop Research Issues in Data Mining
       Conf. on Principles of Data Mining and                  and Knowledge Discovery (DMKD ‟98), pp.
       Knowledge Discovery 2000                                1-8, June 1998.
[14]   Huawen Liu, Jigui Sun, Huijie Zhang: "Post-      [26]   D. Burdick, M. Calimlim, J. Flannick, J.
       Processing for Rule Reduction Using Closed              Gehrke, and T. Yiu, “Mafia: A Maximal
       Set", Post-Mining of Association Rules:                 Frequent Itemsgorithm,” IEEE Trans.
       Techniques for Effective Knowledge                      Knowledge and Data Eng., vol. 17, no. 11,
       Extraction, IGIGlobal,2009 Pages: 81-99 pp.             pp. 1490-1504, Nov. 2005.
[15]   Jaroszewicz, S., & Simovici, D. A:               [27]   J. Li, “On Optimal Rule Discovery,” IEEE
       "Interestingness of Frequent Itemsets using             Trans. Knowledge and Data Eng., vol. 18,
       Bayesian      networks      as     Background           no. 4, pp. 460-471, Apr. 2006.

                                                                                            1350 | P a g e
M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International
Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                   Vol. 2, Issue 5, September- October 2012, pp.1342-1351
 [28] M. Hahsler, C. Buchta, and K. Hornik, SHORT BIO DATA FOR THE AUTHOR
       “Selective Association Rule Generation,”
       Computational Statistic, vol. 23, no. 2, pp.     Ms. M. Saritha working as Assistant Professor in the
       303-315, Kluwer Academic Publishers,             Department of Computer Science and Engineering.
       2008.                                            Guru Nanak Institutions of Technical Campus with a
[29]   H. Toivonen, M. Klemettinen, P. Ronkainen,       teaching experience of 1.5 years. She is a
       K. Hatonen, and H. Mannila, “Pruning and         B.Tech(ECE), and M.Tech in Computer science (PC)
       Grouping of Discovered Association Rules,”       and Her areas of interest includes Mobile Computing,
       Proc. ECML-95 Workshop Statistics,               DataMining and Information Security.
       Machine       Learning,      and    Knowledge
       Discovery in Databases, pp. 47-52, 1995.         Professor.ThaduriVenkataRamana received the
[30]   J. Bayardo, J. Roberto, and R. Agrawal,          Master Degree in Computer Science and Engineering
       “Mining the Most Interesting Rules,” Proc.       from Osmania University, Hyderabad. Pursuing Ph.D
       ACM SIGKDD, pp. 145-154, 1999.
                                                        in Computer Science and Engineering from
[31]   M. Klemettinen, H. Mannila, P. Ronkainen,
       H. Toivonen, and A.I. Verkamo, “Finding          Jawaharlal     Nehru     Technological     University,
       Interesting Rules from Large Sets of             Hyderabad (JNTUH), Hyderabad. . His area of
       Discovered Association Rules,” Proc. Int‟l       interest in research is Information Retrieval & Data
       Conf.      Information       and    Knowledge    mining. Presently working as a Professor & Head,
       Management (CIKM), pp. 401-407, 1994.            Department of Computer Science and Engineering,
[32]   R. Srikant and R. Agrawal, “Mining               SLC‟s Institute of Engineering and Technology,
       Generalized Association Rules,” Proc. 21st
       Int‟l Conf. Very Large Databases, pp. 407-       Hyderabad.
       419,
       http://citeseer.ist.psu.edu/srikant95mining.ht   Ms. Kareemunnisa working as Assistant Professor in
       ml, 1995.                                        the Department of Computer Science and
[33]   Barzan Mozafari, Hetal Thakkar, Carlo            Engineering. Guru Nanak Institutions of Technical
       Zaniolo. Verifying and Mining Frequent           Campus with a teaching experience of 6 years. She is
       Patterns from Large Windows over Data            a B.Tech(IT), and M.Tech in Computer scienceand
       Streams. In Proceedings of the 24th              Engineering (CSE) and Her areas of interest includes
       International       Conference      on    Data   Mobile Computing, Data Mining and Information
       Engineering (ICDE 2008), Cancún, México,         Security.
       April 7-12, 2008
                                                        Ms. Y. Sowjanya received Bachelor Degree in
[34]   Parallel Edge Projection and Pruning (PEPP)
                                                        Information Technology from Jawaharlal Nehru
       Based Sequence Graph protrude approach for
                                                        Technological University, Hyderabad (JNTUH) and
       Closed Itemset Mining kalli Srinivasa
                                                        pursuing Master Degree in Computer Science and
       Nageswara Prasad, Sri Venkateswara
                                                        Engineering (CSE) form Jawaharlal Nehru
       University, Tirupati, Andhra Pradesh, India.
                                                        Technological University, Hyderabad (JNTUH). Her
       Prof. S. Ramakrishna, Department of
                                                        research interest includes Computer Networks and
       Computer Science, Sri Venkateswara
                                                        Information Retrieval Systems, Data Mining.
       University, Tirupati, Andhra Pradesh, India
       ol. 9 No. 9 September 2011 International
                                                        Ms. M. Nishantha working as Assistant Professor in
       Journal of Computer Science and
                                                        the Department of Computer Science and
       Information Security Publication September
                                                        Engineering. Guru Nanak Institutions of Technical
       2011, Volume 9 No. 9
                                                        Campus with a teaching experience of 6 years. She is
[35]   Mining Closed Itemsets for Coherent Rules:
                                                        a B.Tech(IT), and M.Tech in Computer scienceand
       An Inference Analysis Approach By Kalli
                                                        Engineering (IT) and Her areas of interest includes
       Srinivasa Nageswara Prasad, Prof. S.
                                                        Mobile Computing, DataMining and Information
       Ramakrishna, Global Journal of Computer
                                                        Security.
       Science and Technology Volume 11 Issue 19
       Version 1.0 November 2011 Type: Double
       Blind Peer Reviewed International Research
       Journal Publisher: Global Journals Inc.
       (USA) Online ISSN: 0975-4172 & Print
       ISSN: 0975-4350




                                                                                              1351 | P a g e

Contenu connexe

Tendances

Optimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML DocumentsOptimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML DocumentsIOSR Journals
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacyijsrd.com
 
A novel association rule mining and clustering based hybrid method for music ...
A novel association rule mining and clustering based hybrid method for music ...A novel association rule mining and clustering based hybrid method for music ...
A novel association rule mining and clustering based hybrid method for music ...eSAT Publishing House
 
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...IJCSIS Research Publications
 
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...ijsrd.com
 
Ec3212561262
Ec3212561262Ec3212561262
Ec3212561262IJMER
 
A comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectionA comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectioncsandit
 
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
Linking Behavioral Patterns to Personal Attributes through Data Re-MiningLinking Behavioral Patterns to Personal Attributes through Data Re-Mining
Linking Behavioral Patterns to Personal Attributes through Data Re-Miningertekg
 
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
Re-Mining Item Associations: Methodology and a Case Study in Apparel RetailingRe-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailingertekg
 
A Review of Various Clustering Techniques
A Review of Various Clustering TechniquesA Review of Various Clustering Techniques
A Review of Various Clustering TechniquesIJEACS
 
Data Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association RuleData Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association Ruleijtsrd
 
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningUsage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningIOSR Journals
 
A PROCESS OF LINK MINING
A PROCESS OF LINK MININGA PROCESS OF LINK MINING
A PROCESS OF LINK MININGcsandit
 

Tendances (13)

Optimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML DocumentsOptimization of Mining Association Rule from XML Documents
Optimization of Mining Association Rule from XML Documents
 
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and PrivacyA Rule based Slicing Approach to Achieve Data Publishing and Privacy
A Rule based Slicing Approach to Achieve Data Publishing and Privacy
 
A novel association rule mining and clustering based hybrid method for music ...
A novel association rule mining and clustering based hybrid method for music ...A novel association rule mining and clustering based hybrid method for music ...
A novel association rule mining and clustering based hybrid method for music ...
 
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
Privacy Preserving Distributed Association Rule Mining Algorithm for Vertical...
 
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...
Multiple Minimum Support Implementations with Dynamic Matrix Apriori Algorith...
 
Ec3212561262
Ec3212561262Ec3212561262
Ec3212561262
 
A comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectionA comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detection
 
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
Linking Behavioral Patterns to Personal Attributes through Data Re-MiningLinking Behavioral Patterns to Personal Attributes through Data Re-Mining
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
 
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
Re-Mining Item Associations: Methodology and a Case Study in Apparel RetailingRe-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
Re-Mining Item Associations: Methodology and a Case Study in Apparel Retailing
 
A Review of Various Clustering Techniques
A Review of Various Clustering TechniquesA Review of Various Clustering Techniques
A Review of Various Clustering Techniques
 
Data Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association RuleData Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association Rule
 
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data MiningUsage and Research Challenges in the Area of Frequent Pattern in Data Mining
Usage and Research Challenges in the Area of Frequent Pattern in Data Mining
 
A PROCESS OF LINK MINING
A PROCESS OF LINK MININGA PROCESS OF LINK MINING
A PROCESS OF LINK MINING
 

En vedette (20)

Ho2513601370
Ho2513601370Ho2513601370
Ho2513601370
 
Ew25914917
Ew25914917Ew25914917
Ew25914917
 
Gk2511681173
Gk2511681173Gk2511681173
Gk2511681173
 
Hi2513201329
Hi2513201329Hi2513201329
Hi2513201329
 
Gz2512721277
Gz2512721277Gz2512721277
Gz2512721277
 
Fe25959964
Fe25959964Fe25959964
Fe25959964
 
Hb2512851289
Hb2512851289Hb2512851289
Hb2512851289
 
Gg2511351142
Gg2511351142Gg2511351142
Gg2511351142
 
Fl2510031009
Fl2510031009Fl2510031009
Fl2510031009
 
He2513001307
He2513001307He2513001307
He2513001307
 
Fw2510721076
Fw2510721076Fw2510721076
Fw2510721076
 
Ge2511241129
Ge2511241129Ge2511241129
Ge2511241129
 
Ft2510561062
Ft2510561062Ft2510561062
Ft2510561062
 
Fo2510211029
Fo2510211029Fo2510211029
Fo2510211029
 
Ga2510971106
Ga2510971106Ga2510971106
Ga2510971106
 
Memoria abstracta feudos y comercio
Memoria abstracta feudos y comercioMemoria abstracta feudos y comercio
Memoria abstracta feudos y comercio
 
Social Media 2010-05
Social Media 2010-05Social Media 2010-05
Social Media 2010-05
 
TCE multa Ricardo Teobaldo
TCE multa Ricardo TeobaldoTCE multa Ricardo Teobaldo
TCE multa Ricardo Teobaldo
 
I.E. Suan
I.E. Suan I.E. Suan
I.E. Suan
 
la vida
la vidala vida
la vida
 

Similaire à Hl2513421351

An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in DatabasesIOSR Journals
 
A threshold free implication rule mining
A threshold free implication rule miningA threshold free implication rule mining
A threshold free implication rule miningAjay Desai
 
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...ertekg
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining IJDKP
 
A Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesA Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesIOSR Journals
 
Re-Mining Association Mining Results through Visualization, Data Envelopment ...
Re-Mining Association Mining Results through Visualization, Data Envelopment ...Re-Mining Association Mining Results through Visualization, Data Envelopment ...
Re-Mining Association Mining Results through Visualization, Data Envelopment ...Gurdal Ertek
 
MINING TRIADIC ASSOCIATION RULES
MINING TRIADIC ASSOCIATION RULESMINING TRIADIC ASSOCIATION RULES
MINING TRIADIC ASSOCIATION RULEScsandit
 
A genetic based research framework 3
A genetic based research framework 3A genetic based research framework 3
A genetic based research framework 3prj_publication
 
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...Editor IJMTER
 
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126IJRAT
 

Similaire à Hl2513421351 (20)

An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in  DatabasesAn Effective Heuristic Approach for Hiding Sensitive Patterns in  Databases
An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases
 
I43055257
I43055257I43055257
I43055257
 
Dy33753757
Dy33753757Dy33753757
Dy33753757
 
Dy33753757
Dy33753757Dy33753757
Dy33753757
 
A threshold free implication rule mining
A threshold free implication rule miningA threshold free implication rule mining
A threshold free implication rule mining
 
K355662
K355662K355662
K355662
 
K355662
K355662K355662
K355662
 
I1802055259
I1802055259I1802055259
I1802055259
 
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
Re-Mining Association Mining Results Through Visualization, Data Envelopment ...
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
 
Ae32208215
Ae32208215Ae32208215
Ae32208215
 
A Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesA Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining Methodologies
 
Cr4201623627
Cr4201623627Cr4201623627
Cr4201623627
 
M033059064
M033059064M033059064
M033059064
 
Re-Mining Association Mining Results through Visualization, Data Envelopment ...
Re-Mining Association Mining Results through Visualization, Data Envelopment ...Re-Mining Association Mining Results through Visualization, Data Envelopment ...
Re-Mining Association Mining Results through Visualization, Data Envelopment ...
 
MINING TRIADIC ASSOCIATION RULES
MINING TRIADIC ASSOCIATION RULESMINING TRIADIC ASSOCIATION RULES
MINING TRIADIC ASSOCIATION RULES
 
A genetic based research framework 3
A genetic based research framework 3A genetic based research framework 3
A genetic based research framework 3
 
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
 
Paper id 212014126
Paper id 212014126Paper id 212014126
Paper id 212014126
 

Hl2513421351

  • 1. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 Attribute Ontological Relational Weights (AORW) For Coherent Association Rules: A Post Mining Process For Association Rules M.Saritha1, T.Venkata Ramana2,Kareemunnisa3, Y.Sowjanya4, M.Nishantha5 1 (Asst. Professor, Department of Computer Science, GuruNanak Institutions of Technical Campus, India) 2 (Professor, Head of the Department of Computer Science, SLC‟s IET, India) 3, 5 (Asst. Professors, Department of Information Technology, GuruNanak Institutions of Technical Campus, India) 4 (M.tech student, Department of Computer Science and Engg, SLC‟s IET, India) ABSTRACT "Coding of data as binary if data is not binary” -> Association rules present one of the most “Rule generation” -> “Post-mining". This survey impressive techniques for the analysis of focused on post mining. It was a century after the attribute associations in a given dataset related to applications related to retail, bioinformatics, and introduction of association rules (associations sociology. In the area of data mining, the initially discussed in 1902), it is still continuing that importance of the rule management in the absence of items from transactions is often associating rule mining is rapidly growing. ignored in analyses. Usually, if datasets are large, the induced rules are large in volume. The density of the rule A. Association rule mining volume leads to the obtained knowledge hard to Given a set I that is non-empty, a rule of be understood and analyze. One better way of association is a statement of the form X ⇒ Y, where minimizing the rule set size is eliminating X, Y ⊂ I such that X ≠ ∅, Y ≠ ∅, and X ∩ Y = ∅. redundant rules from rule base. Many efforts The dataset X is called the antecedent of the rule, have been made and various competent and the set Y is called the consequent of the rule, and excellent algorithms have been proposed. But all we shall call I the master itemset. Association rules of these models relying either on closed itemset are generated over a large set of transactions, mining or expert’s evaluation. None of these denoted by T. Xn association rule can be deemed models are proven best in all data set contexts. interesting if the items involved occur together often Closed itemset model is missing adaptability and and there are suggestions that one of the sets might expert’s evaluation process is resulting different in some cases lead to the presence of the other set. X significance for same rule under different association rules are characterised as interesting, or expert’s view. To overcome these limits here we not, based on mathematical notions called „support‟, proposed a post mining process called AORW as „confidence‟ and „lift‟. Although there are now a an extension to closed itemset mining algorithms. multitude of measures of interestingness available to the analyst, many of them are still based on these Keywords: post mining, association rule mining, three functions. closed itemset, PEPP, Inference analysis, rule In many applications, it is not only the pruning. presence of items in the antecedent and the consequent parts of an association rule that may be 1. INTRODUCTION of interest. Consideration, in many cases, should be In general, association rules tend to deliver given to the relationship between the absence of an efficient method of analysing binary or items from the antecedent part and the presence or discredited data sets that are large in volume. One absence of items from the consequent part. Further, common practice is to determine relationships the presence of items in the antecedent part can be between binary variables in transaction databases, related to the absence of items from the consequent which is known as „market basket analyses. In the part; for example, a rule such as {margarine} ⇒ case of non-binary data, initially data being coded as {not butter}, which might be referred to as a binary and then association rules will be used to „replacement rule‟. One way to incorporate the analyse. Association rules having their impact on absence of items into the association rule mining analysing large binary datasets and considered as paradigm is to consider rules of the form X⇒/ Y versatile approach for modern applications such as [20]. Another is to think in terms of negations. detection of bio-terrorist attacks [1] and the analysis Suppose X ⊂ I, then write ¬X to denote the absence of gene expression data [2], to the analysis of Irish or negation of the item, or items, in X from a third level education applications [4]. The steps transaction. Considering X as a binary {0, 1} involved in a typical association rule analysis are variable, the presence of the items in X is equivalent 1342 | P a g e
  • 2. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 to X = 1, while the negation ¬X is equivalent to popular association rules though the interest of XX`€=X`€0. The concept of considering association frequent sets goes further. rules involving negations, or “negative Based on the proposals [14, 12, 10, 3] implications”, is due to Silverstein [20]. recently cited in literature and their motivations, it is observable that the process of rule pruning will opt to one of the two models. B. Post mining (1) Rule pruning under post mining process Pruning rules and detection of rule that demands domain experts observation interestingness are employed in the post-mining (2) Rule pruning under post mining process stage of the association rule mining paradigm. that aims to avoid domain expert‟s role in pruning However, there are a host of other techniques used process. in the post-mining stage that do not naturally fall As in the first case rule pruning accuracy under either of these headings. Some such depends on domain expert‟s awareness on attribute techniques are in the area of redundancy- removal. relations. In this case it is always obvious to prune There are often a huge number of association rules the rules under reliable domain expert‟s observation. to contend with in the post-mining stage and it can In second case the models prune the rules based on be very difficult for the user to identify which ones dynamically determined attribute relations. This are interesting. Therefore, it is important to remove limits the solution to specific data models. Hence it insignificant rules, prune redundancy and do further is not adaptable for all contexts. post-mining on the discovered rules [18, 11]. Liu The Rest of the paper organized as; in [11] proposed a technique for pruning and section II we discussed the most frequently cited summarising discovered association rules by first post mining models to improve rule accuracy. removing those insignificant associations and then Section III briefs the post mining process that we forming direction-setting rules, which provide a sort opted. Section IV briefs the approach of closed of summary of the discovered association rules. itemset mining, and inference approach for itemset Lent [19] proposed clustering the discovered pruning. Section V explores the process of Attribute association rules. Relational weights analysis for rule pruning. Results discussion and comparative study will be in Section C. Influences of Input formats VI that fallowed by conclusion and references. The input formats that influence the post mining methodologies are binary data, text data and 2. RELATED WORK streaming data. In association rule mining, much of Huawen Liu, [14] proposed post processing recent research work has focused on the difficult approach for rule reduction using closed set to filter problem of mining data streams such as click stream superfluous rules from knowledge base in a post- analysis, intrusion detection, and web-purchase processing manner that can be well discovered by a recommendation systems. In the case of streaming closed set mining technique. Most of these methods data, it is not possible to perform mining on cached are based on the rule structure analysis where the and fixed data records. The attempt of caching data relations between the rules have been analysed using leads to memory usage issues, and the attempt of corresponding problem. This procedure claims to mining static data leads to worst time complexity eliminate noise and redundant rules in order to since continuous dataset update leads to continuous provide users with compact and precise knowledge passes through dataset. From the data mining point derived from databases by data mining method. of view, texts are complex data giving raise to Further, a fast rule reduction algorithm using closed interesting challenges. First, texts may be set is introduced. Other endeavours have been considered as weakly structured, compared with attempted to prune the rule bases directly. The databases that rely on a predefined schema. typical cases have been elaborated and illustrated by Moreover, texts are written in natural language, eminent people from all over. carrying out implicit knowledge, and ambiguities. Modest number of proposals is addressed Hence, the representation of the content of a text is on pre-pruning and post-pruning. In a line, the often only partial and possibly noisy. One solution pruning operation occurs at the phase of generation for handling a text or a collection of texts in a of significant rules. To add to this, the post pruning satisfying way is to take advantage of a knowledge technique mainly concerns primarily emphasises model of the domain of the texts, for guiding the that pruning operation occurs after the rule extraction of knowledge units from the texts. One of generation, among which the rule cover is a the obvious hot topics of data mining research in the representative case. To extract interesting rules, last few years has been rule discovery from binary aprori knowledge has also been taken into account data. It concerns the discovery of set of attributes in literatures and a template denotes which attributes from large binary records such that these attributes should occur in the antecedent or the consequent are true within a same line often enough. It is then part of the rule. easy to derive rules that describe the data, the 1343 | P a g e
  • 3. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 An association rule is an implication processing of rule mining. The empirical study expression illustrates a kind of association relation. proved that the discovery of dependent relations A rule is said to be interesting or valid if its support from closed set helps to eliminate redundant rules. and confidence are user specified minimum support Hetal Thakkar [12]. In the case of stream data, the and confidence thresholds respectively. The post-mining of association is more challenging and association rule primarily comprises two phases continuous post mining of association rules is an based on the identification of frequent item sets unavoidable requirement, which is discussed by this from the given data mining contexts. However, the author. He presented a technique for continuous problem of massive real world data transactions can post-mining of association rules in a data stream be rounded off by adopting other alternatives, which management system. He described the architecture in turn benefits in lossless representation of data. and techniques used to achieve this advanced Theoretically, transaction database and relation functionality in the Stream Mill Miner (SMM) database are two different inter-transformable prototype, an SQL-based DSMS designed to support representations of data. continuous mining queries, which is impressive. The production of association mining is a Hacene Cherfi [10] discussed a post association rule rule base with hundreds to millions of rules. In order mining approach for text mining that combines data to highlight the important and key ones, certain mining, semantic techniques for post-mining and other rules are proposed which are Second –order selection of association rules. To focus on the result rule which states that if the cover of the item set is analysis and to find new knowledge units, known, then the corresponding relation can be easily classification of association rules according to derived. All the technical definitions given hence qualitative criteria using domain model as forth deal with the transaction of the data through background knowledge has been introduced. The item-sets in association mining. The equivalent authors carried out an empirical study on molecular property significantly states that rules and classes of biology dataset that proved the benefits of taking the same hierarchical database support the power in into account a knowledge domain model of the data. the content. Traditional data mining techniques are Ronaldo Cristiano Prati [3]. The Receiver Operating implemented in order to justify the property and its Characteristics (ROC) graph is a popular way of corresponding definition in the specified context. assessing the performance of classification rules, but The thus identified second order rules can be used to they are inappropriate to evaluate the quality of filter out useless rules out of the priority rule-set. association rules, as there is no class in association The effectiveness and efficiency of the rule mining and the consequent part of different classical methods in plating the rules is thoroughly association rules might not have any correlation at verified under the 2.8 GHZ Pentium PC. Two group all. Chapter VIII presents a novel technique of experiments were conducted to prune the QROC, a variation of ROC space to analyze itemset insignificant association rules and to remove useless costs/benefits in association rules. It can be used to association classification rules. Removal of non- help analysts to evaluate the relative interestingness predictive rules by virtue of information gain metric among different association rules in different cost is much similar to CHARM and CBA which also scenarios. work on the same track. To generate association and classification rules by pruning method of Apriori 3. ATTRIBUTE ONTOLOGICAL software, some external tools are essentially RELATIONAL WEIGHTS FOR required. The effectiveness of the pruning algorithm COHERENT ASSOCIATION RULES can be inversely related to the number of rules. This The approach Attribute Ontological along with the computational time consumed, Relational Weights in short can refer as AORW is determine an efficient criterion of pruning. post mining process to prune the rules based on Efficient post processing methods are attribute relational relevancy. The process of hence proposed to remove pointless rules from rule- AORW Framework can be classified as ways by eliminating redundancy among rules. The  Closed itemset mining dependent relation, exploitation, makes this method  Describing item class descriptor a self manageable knowledge. The pruning The input for AORW Framework is procedure has been sliced into three stages starting 1. A set of rules from derivation to pruning operation on rule-set by 2. An XML descriptor describes attributes, classes, the use of close rule-sets. It is cost-effective and class properties and class relations. consumes very little time for the transaction. Hence Here in this proposal we considered our earlier work forth, it can be applied to exploit sampling to find closed itemsets. techniques and data structures, thereby increasing The process steps involved in AORW framework the efficiency. are Huawen Liu, [14] presented a technique on 1. Initially AORW measures the property support post-processing for rule reduction using closed set degree for each attribute involved in given rule. that was targeting to filter the otiose rules in a post- 1344 | P a g e
  • 4. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 2. By using the property support degree of the fts ( st )|st :s p ( for each p 1..|DBS |)| attributes, Attribute Relation support of attribute pairs of the given rule will be measured. DBS Is set of sequences 3. With the help of Attribute Relation supports of all attribute pairs of an itemset that belongs to a fts ( st ) : Represents the total support „ts‟ of given rule, Attribute relation support degree of sequence st is the number of super sequences of st that itemset will be measured. Qualified support „qs‟: The resultant coefficient of 4. Using these Attribute relation support degrees of total support divides by size of sequence database Left Hand Side and Right Hand Side itemsets of adopt as qualified support „qs‟. Qualified support the given rule, relation confidence of the rule will can be found by using fallowing formulation. be determined. f (s ) 5. Prunes the rules based on their attribute relation f qs ( st )  ts t |DBS | support degree. Detailed explanation of each step can be found in Sub-sequence and Super-sequence: A Section V. sequence is sub sequence for its next projected sequence if both sequences having same total 4. CLOSED ITEMSET MINING [34] support. A. Dataset adoption and formulation Super-sequence: A sequence is a super Item Sets I: A set of diverse elements by sequence for a sequence from which that projected, which the sequences generate. if both having same total support. Sub-sequence and super-sequence can be formulated n I   ik as k 1 Note: „I‟ is set of diverse elements If f ts ( st )  rs where „rs‟ is required support Sequence set „S‟: A set of sequences, where each threshold given by user sequence contains elements each element „e‟ And st :s p for any p value where fts ( st ) fts ( s p ) belongs to „I‟ and true for a function p(e). Sequence set can formulate as B. Closed Itemset Discovery[34] m As a first stage of the proposal we perform s   ei |( p (ei ),eiI ) dataset pre-processing and itemsets Database i 1 initialization. We find itemsets with single element, Represents a sequence„s‟ of items those belongs to in parallel prunes itemsets with single element those set of distinct items „I‟. contains total support less than required support. „m‟: total ordered items. Forward Edge Projection: P(ei): a transaction, where ei usage is true for that In this phase, we select all itemsets from transaction. given itemset database as input in parallel. Then we t start projecting edges from each selected itemset to S  s j all possible elements. The first iteration includes the j 1 pruning process in parallel, from second iteration S: represents set of sequences onwards this pruning is not required, which we „t‟: represents total number of sequences and its claimed as an efficient process compared to other value is volatile similar techniques like BIDE. In first iteration, we sj: is a sequence that belongs to S project an itemset s p that spawned from selected Subsequence: a sequence s p of sequence set „S‟ is considered as subsequence of another sequence itemset si from DBS and an element ei considered sq from „I‟. If the f ts ( s p ) is greater or equal to rs , of Sequence Set „S‟ if all items in sequence Sp is belongs to sq as an ordered list. This can be then an edge will be defined between si and ei . If formulated as f ts ( si )  f ts ( s p ) then we prune si from DBS . n If (  s pisq )( s p  sq ) This pruning process required and limited to first i 1 iteration only. n m s pS and sqS From second iteration onwards project the itemset Then  s :  s pi qj i 1 j 1 S p that spawned from S p ' to each element ei of „I‟. where Total Support „ts‟ : occurrence count of a sequence An edge can be defined between S p ' and ei if as an ordered list in all sequences in sequence set „S‟ can adopt as total support „ts‟ of that sequence. fts ( s p ) is greater or equal to rs . In this description Total support „ts‟ of a sequence can determine by S p ' is a projected itemset in previous iteration and fallowing formulation. 1345 | P a g e
  • 5. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 eligible as a sequence. Then apply the fallowing Pattern actual coverage is pattern positive score- validation to find closed sequence. pattern negative score Edge pruning : Interest gain: Actual coverage of the If any of f ts ( s p ' )  f ts ( s p ) that edge will pattern involved in association rule Coherent rule Actual coverage of the rule‟s left side be pruned and all disjoint graphs except s p will be pattern must be greater than or equal to actual considered as closed sequence and moves it coverage of the right side pattern Inference Support ias: refers actual coverage of the into DBS and remove all disjoint graphs from pattern memory. fia ( st ) : Represents the inference support of the The above process continues till the elements available in memory those are connected sequence st through direct or transitive edges and projecting Approach: itemsets i.e., till graph become empty For each pattern sp of the pattern dataset, If fia ( st )  ias then we prune that pattern C. Inference Analysis [35] Inferences: Pattern positive score is sum of no of 5. ATTRIBUTE ONTOLOGICAL transactions in which all items in the pattern exist, RELATIONAL WEIGHTS FRAMEWORK no of transactions in which all items in the pattern The proposed post mining process Attribute does not exist Ontological Relational Weights described in Pattern negative score is no of transactions detailed here. Table 1 represents the notations used in which only few items of the pattern exist in AORW framework. . Table1: Notations used in Attribute Ontological Relational Weights 1 r Left side itemset of the Rule r lhs 2 rrhs Right side itemset of the rule r 3 RS Rule set 4 cpcc Class properties count of class c 5 apca Attribute property count of attribute a 6 psd Property support degree 7 tpc Total properties of class c 8 psa Property support of attribute a of class c apca psa  cpcc ps where a  c 9 psd a  a tpc 10 rsc Relation support of class c is max threshold value 1. 11 ARS Attribute Relation support 12 ARSDi Attribute Relation support Degree of itemset i. 13 If attributes a i and a j belongs to ARS (ai , a j )  1 , where {ai , a j }  c same class c 14 If attribute a i belongs to class ci and psd ai  psd a j ARS (ai , a j )  attribute a j belongs to class c j , ci rsci  rsc j and c j relation is true then 15 If ci and c j relation is false ARS (ai , a j )  0 , 16 ARS (ai , a j )  ARS (a j , ai ) Applicable in all cases such as both belongs to same class, both belongs to different classes that are not having relation and both belongs to different classes that are having relation 17 No of attribute pairs in an itemset pc 1346 | P a g e
  • 6. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 18 pc : is total number of pairs in a given itemset. pc  0; n : is total number of attributes in the given itemset n 1 n 1  pc  i i 1 (or ) pc  2 n 19 pcrlhs Pair-count of itemset, which is lhs of rule r . 20 pcrrhs Pair-count of itemset, which is rhs of rule r . 21 pcrlhs rrhs Pair-count of itemset that generated from rlhs rrhs 22 PS  { p , p , ......., p } Pair set that generated from itemset i i 1 2 m 23 ARS ( pi ) Attribute relation support of pair pi . 24 | psi | Attribute relation support degree of itemset i  ARS ( pk ) k 1 And pk  psi , here pk is kth pair of pair-set ps of itemset i ARSDi  | psi | 25 RCr Relation confidence of rule r . |PSrlhs | Attribute relation support degree of itemset, which is lhs of 26  pk rule r ARSDr  k 1 And pk  psrlhs , here pk is kth pair of pair-set ps of itemset lhs |PS rlhs | lhs of rule r |PSrrhs | Attribute relation support degree of itemset, which is rhs of 27  pk rule r ARSDr  k 1 And pk  psrrhs , here pk is kth pair of pair-set ps of itemset rhs |PSr | rhs rhs of rule r |PS | Attribute relation support degree of itemset that generated from 28 rlhs rrhs  pk lhs  rhs of rule r k 1 ARSDrlhs rrhs  |PSrlhs rrhs | And pk  PS r , here pk is kth pair of pair-set ps of lhs rrhs itemset that generated from lhs  rhs of rule r ARSDrlhs rrhs Relation confidence of r is the coefficient emerged as result 29 rcr  when 30attribute support degree of all attribute involved in rule ARSDrlhs r is divided by attribute support degree of rule r ‟s lhs Property Support: No of attribute properties are Relation confidence: is a rule level measurement matched to number class properties to which that concludes the relation strength between left hand side attribute belongs to [Table 1 row: 8]. itemset and right hand side itemset of a given rule[see Property Support degree: indicates the ratio of table 1 row: 29] properties matched to class level properties [table 1 AORW algorithm: row: 9]. Input: Rule set RS , Class Descriptor CD and relation psa confidence threshold rct Ex: psd a  ; here a is an attribute of class ' cpcc Output: Significant Rule set RS which is subset of c [ a  c] RS RS  RS' Attribute Relation support: Indicates the strength of Begin: the relation between two attributes of an itemset that While RS is not empty are considered as pair for equation [see table 1 row: Begin: 11, 13]. Read a rule r from RS Pair Count: Total number of two attributes sets; here these attribute sets must be unique [see table 1, row Find property support psa and property support 17, 18] degree psd a of each attribute a of rlhs [Table 1 Attribute Relation support degree: is an itemset row: 8, 9] level measurement representing average relation strength of the attributes those belongs to an itemset Find property support psa and property support [see table 1 row: 24] 1347 | P a g e
  • 7. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 degree psd a of each attribute a of rrhs [Table 1 mining techniques. 3) There is the involvement of an enhanced occurrence and a probability reduction in row: 8, 9] the memory exploitation rate with the aid of the trait Find unique two attribute pair set PS from equivalent prognosis and also rim snipping of the rlhs PEPP with inference analysis and AORW. This is on rlhs [Table 1 row: 22] the basis of the surveillance done which concludes Find Attribute relation support ARS p for each pair that AORW implementation is far more noteworthy and important in contrast with the likes of other p , where p  PS rlhs [Table 1; row: 13, 15, 15 and notable models [10, 12, 14]. 16] JAVA 1.6_ 20th build was employed for Find Attribute relation support degree ARSDrlhs of accomplishment of the AORW along with PEPP under inference analysis. A workstation equipped rlhs [Table 1; row: 24, 26]. with core2duo processor, 2GB RAM and Windows Find unique two attribute pair set PS r from XP installation was made use of for investigation of lhs the algorithms. The parallel replica was deployed to rlhs [Table 1 row: 22] attain the thread concept in JAVA. Find Attribute relation support ARS p for each pair Dataset Characteristics: p , where pPSrrhs [Table 1; row: 13, 15, 15 and 16] Pi is supposedly found to be a very opaque Find Attribute relation support degree ARSDrrhs of dataset, which assists in excavating enormous quantity of recurring clogged series with a profitably rlhs [Table 1; row: 24, 27]. high threshold somewhere close to 90%. It also has a Find unique two attribute pair set PSr r from distinct element of being enclosed with 190 protein lhs rhs series and 21 divergent objects. Reviewing of rlhs  rrhs [Table 1 row: 22] serviceable legacy‟s consistency has been made use of by this dataset. Fig. 3 portrays an image depicting Find Attribute relation support ARS p for each pair dataset series extent status. p , where p  PS rlhs  rrhs [Table 1; row: 13, 15, 15 In assessment with all the other regularly quoted forms like [14,12,10], Post-Processing for and 16] Rule Reduction Using Closed Set[14] has made its Find Attribute relation support degree ARSDr mark as a most preferable, superior and sealed lhs rrhs example of post mining copy, taking in view the of rlhs  rrhs [Table 1; row: 24, 28]. detailed study of the factors mainly, experts Find unique two attribute pair set PSr r from involvement, memory consumption and runtime. lhs rhs rlhs  rrhs [Table 1 row: 22] Find Attribute relation support ARS p for each pair p , where p  PS [Table 1; row: 13, 15, 15 rrhs rlhs and 16] Find Attribute relation support degree ARSDr rhs rlhs of rlhs  rrhs [Table 1; row: 24, 28]. Find Relation confidence rcr of rule r If rcr  rct then add rule r to resultant rule set RS ' End End Fig 2: Attribute Ontological Relational Weights algorithm This segment focuses mainly on providing evidence on asserting the claimed assumptions that 1) Fig 3: A comparison report for Runtime The post mining framework AORW is competent enough to momentously surpass results when evaluated against other post mining techniques [14, 10, 12. 2) Utilization of memory and computational complexity is less when compared to other post 1348 | P a g e
  • 8. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 Fig 6: Rules pruned in in % by AORW and RRUC[14] Fig: 4: A comparison report for memory usage 6. CONCLUSION We proposed a post mining process called Attribute relation analysis framework (AORW) for pruning association rules that are contextually irrelevant. In earlier works [10, 12, 14] the contextual irrelevancy was identified in various ways such as (1) rule evaluation by domain expert, (2) rule evaluation by itemset closeness. We argued that none of these two models is significant in all contexts. In second case adaptability to various data contexts is missing. In the first case, rule selection highly influenced by the experts view, that is when expert changes then rule significance might be rated differently. To defend these limits here we proposed a post mining process Fig 5: Sequence length and number of sequences at as an extension to our earlier proposed closed itemset different thresholds in Pi dataset mining algorithm PEPP with inference analysis [34, 35]. Here in this proposed post mining process In contrast to AORW and RRUC [14], a very AORW, the experts view is not defending one to other, intense dataset Pi is used which has petite recurrent rather it extends or refines. This become possible in closed series whose end to end distance is less than 10, AORW because of the proposed concept called even in the instance of high support amounting to attribute class relation descriptor. In this work we around 90%. The diagrammatic representation consider relation confidence as bench mark for rule displayed in Fig 3 explains that the above mentioned pruning; in future this work can be extended to prune two algorithms execute in a similar fashion in case of the rules by opting relation confidence threshold rules that are generated at support 90% and above. under inference analysis. But in situations when the support case is 88% and less, then the act of AORW surpasses RRUC [14]‟s REFERENCES routine. The disparity in memory exploitation of [1] Fienberg, S. E. & Shmueli, G: "Statistical AORW and RRUC [14] can be clearly observed issues and challenges associated with rapid because of the consumption level of AORW being detection of bio-terrorist attacks. Statistics in low than that of RRUC. Fig 5 indicates that rules that Medicine, 2005 are contextually irrelevant have been pruned by [2] Carmona-Saez, P., Chagoyen, M., AORW in high probable rate and stable. Apart from Rodriguez, A., Trelles, O., Carazo, J. M., & the benefits observed, the rules identified by AORW Pascual-Montano, A: :Integrated analysis of are more relevant to transaction consequences. It gene expression by association rules becomes possible since there is no task of experts discovery". BMC Bioinformatics, 2006 evaluation in post mining process. Due to the concept [3] Ronaldo Cristiano Prati, "QROC: A of attribute class relation descriptor, the relation Variation of ROC Space to Analyze Item Set between attributes involved in rule is stable. Costs/Benefits in Association Rules", Post- Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGIGlobal, 2009 Pages: 133-148 pp. [4] McNicholas, P. D: "Association rule analysis of CAO data (with discussion). Journal of the Statistical and Social Inquiry Society of Ireland, 2007 1349 | P a g e
  • 9. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 [5] Prati, R., & Flach, P. "ROCCER: an Knowledge". ACM SIGKDD Conference on algorithm for rule learning based on ROC Knowledge Discovery in Databases, 2004 analysis". Proceeding of the 19th [16] Jaroszewicz, S., & Scheffer, T: "Fast International Joint Conference on Artificial Discovery of Unexpected Patterns in Data, Intelligence, 2005 Relative to a Bayesian Network". ACM [6] Fawcett, T: "PRIE: A system to generate SIGKDD Conference on Knowledge rulelists to maximize ROC performance". Discovery in Databases, 2005 Data Mining and Knowledge Discovery [17] Faure, C., Delprat, D., Boulicaut, J. F., & Journal, 2008, 17(2), 207-224. Mille, A: "Iterative Bayesian Network [7] Kawano, H., & Kawahara, M: "Extended Implementation by using Annotated Association Algorithm Based on ROC Association Rules." 15th Int‟l Conf. on Analysis for Visual Information Navigator". Knowledge Engineering and Knowledge Lecture Notes In Computer Science, 2002, Management – Managing Knowledge in a 2281, 640-649. World of Networks, 2006 [8] Piatetsky-Shapiro, G., Piatetsky-Shapiro, G., [18] Liu, B., & Hsu, W: "Post-Analysis of Frawley, W. J., Brin, S., Motwani, R., Learned Rules". AAAI/IAAI, 1996 Ullman, J. D., : "Discovery, Analysis, and [19] Lent, B., Swami, A. N., & Widom, J: Presentation of Strong Rules". Proceedings "Clustering Association Rules". Proceedings of the 11th international symposium on of the Thirteenth International Conference on Applied Stochastic Models and Data Data Engineering, 1997, Birmingham U.K., Analysis ASMDA, 2005, 16, 191-200 IEEE Computer Society [9] Liu, B., Hsu, W., & Ma, Y: "Pruning and [20] Savasere, A., Omiecinski, E., & Navathe, S. summarizing the discovered associations". B: "Mining for strong negative associations Proceedings of the fifth ACM SIGKDD in a large database of customer transactions". international conference on Knowledge Proceedings of the 14th International discovery and data mining 1999, (pp. 125- Conference on Data Engineering, 134) Washington DC, USA, 1998 [10] Hacene Cherfi, Amedeo Napoli, Yannick [21] Basu, S., Mooney, R. J., Pasupuleti, K. V., & Toussaint: "A Conformity Measure Using Ghosh J: "Evaluating the Novelty of Text- Background Knowledge for Association Mined Rules using Lexical Knowledge". 7th Rules: Application to Text Mining", Post- ACM SIGKDD International Conference on Mining of Association Rules: Techniques for Knowledge Discovery in Databases, 2001 Effective Knowledge Extraction, IGIGlobal, [22] Claudia Marinica and Fabrice Guillet: 2009 Pages: 100-115 pp. "Knowledge-Based Interactive Postmining of [11] Liu, B., Hsu, W., & Ma, Y.: "Pruning and Association Rules Using Ontologies", IEEE summarizing the discovered associations". TRANSACTIONS ON KNOWLEDGE Proceedings of the fifth ACM SIGKDD AND DATA ENGINEERING, VOL. 22, international conference on Knowledge NO. 6, JUNE 2010 discovery and data mining, 1999 [23] M.J. Zaki and C.J. Hsiao, “Charm: An [12] Hetal Thakkar, Barzan Mozafari, Carlo Efficient Algorithm for Closed Itemset Zaniolo: "Continuous Post-Mining of Mining,” Proc. Second SIAM Int‟l Conf. Association Rules in a Data Stream Data Mining, pp. 34-43, 2002. Management System", Post-Mining of [24] J. Pei, J. Han, and R. Mao, “Closet: An Association Rules: Techniques for Effective Efficient Algorithm for Mining Frequent Knowledge Extraction, IGIGlobal, 2009 Closed Itemsets,” Proc. ACM SIGMOD Pages: 116-132 pp. Workshop Research Issues in Data Mining [13] Kuntz, P., Guillet, F., Lehn, R., & Briand, H: and Knowledge Discovery, pp. 21-30, 2000. "A User-Driven Process for Mining [25] M.J. Zaki and M. Ogihara, “Theoretical Association Rules". In D. Zighed, H. Foundations of Association Rules,” Proc. Komorowski, & J. Zytkow (Eds.), 4th Eur. Workshop Research Issues in Data Mining Conf. on Principles of Data Mining and and Knowledge Discovery (DMKD ‟98), pp. Knowledge Discovery 2000 1-8, June 1998. [14] Huawen Liu, Jigui Sun, Huijie Zhang: "Post- [26] D. Burdick, M. Calimlim, J. Flannick, J. Processing for Rule Reduction Using Closed Gehrke, and T. Yiu, “Mafia: A Maximal Set", Post-Mining of Association Rules: Frequent Itemsgorithm,” IEEE Trans. Techniques for Effective Knowledge Knowledge and Data Eng., vol. 17, no. 11, Extraction, IGIGlobal,2009 Pages: 81-99 pp. pp. 1490-1504, Nov. 2005. [15] Jaroszewicz, S., & Simovici, D. A: [27] J. Li, “On Optimal Rule Discovery,” IEEE "Interestingness of Frequent Itemsets using Trans. Knowledge and Data Eng., vol. 18, Bayesian networks as Background no. 4, pp. 460-471, Apr. 2006. 1350 | P a g e
  • 10. M.Saritha, T.Venkata Ramana, Kareemunnisa, Y.Sowjanya, M.Nishantha / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 5, September- October 2012, pp.1342-1351 [28] M. Hahsler, C. Buchta, and K. Hornik, SHORT BIO DATA FOR THE AUTHOR “Selective Association Rule Generation,” Computational Statistic, vol. 23, no. 2, pp. Ms. M. Saritha working as Assistant Professor in the 303-315, Kluwer Academic Publishers, Department of Computer Science and Engineering. 2008. Guru Nanak Institutions of Technical Campus with a [29] H. Toivonen, M. Klemettinen, P. Ronkainen, teaching experience of 1.5 years. She is a K. Hatonen, and H. Mannila, “Pruning and B.Tech(ECE), and M.Tech in Computer science (PC) Grouping of Discovered Association Rules,” and Her areas of interest includes Mobile Computing, Proc. ECML-95 Workshop Statistics, DataMining and Information Security. Machine Learning, and Knowledge Discovery in Databases, pp. 47-52, 1995. Professor.ThaduriVenkataRamana received the [30] J. Bayardo, J. Roberto, and R. Agrawal, Master Degree in Computer Science and Engineering “Mining the Most Interesting Rules,” Proc. from Osmania University, Hyderabad. Pursuing Ph.D ACM SIGKDD, pp. 145-154, 1999. in Computer Science and Engineering from [31] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo, “Finding Jawaharlal Nehru Technological University, Interesting Rules from Large Sets of Hyderabad (JNTUH), Hyderabad. . His area of Discovered Association Rules,” Proc. Int‟l interest in research is Information Retrieval & Data Conf. Information and Knowledge mining. Presently working as a Professor & Head, Management (CIKM), pp. 401-407, 1994. Department of Computer Science and Engineering, [32] R. Srikant and R. Agrawal, “Mining SLC‟s Institute of Engineering and Technology, Generalized Association Rules,” Proc. 21st Int‟l Conf. Very Large Databases, pp. 407- Hyderabad. 419, http://citeseer.ist.psu.edu/srikant95mining.ht Ms. Kareemunnisa working as Assistant Professor in ml, 1995. the Department of Computer Science and [33] Barzan Mozafari, Hetal Thakkar, Carlo Engineering. Guru Nanak Institutions of Technical Zaniolo. Verifying and Mining Frequent Campus with a teaching experience of 6 years. She is Patterns from Large Windows over Data a B.Tech(IT), and M.Tech in Computer scienceand Streams. In Proceedings of the 24th Engineering (CSE) and Her areas of interest includes International Conference on Data Mobile Computing, Data Mining and Information Engineering (ICDE 2008), Cancún, México, Security. April 7-12, 2008 Ms. Y. Sowjanya received Bachelor Degree in [34] Parallel Edge Projection and Pruning (PEPP) Information Technology from Jawaharlal Nehru Based Sequence Graph protrude approach for Technological University, Hyderabad (JNTUH) and Closed Itemset Mining kalli Srinivasa pursuing Master Degree in Computer Science and Nageswara Prasad, Sri Venkateswara Engineering (CSE) form Jawaharlal Nehru University, Tirupati, Andhra Pradesh, India. Technological University, Hyderabad (JNTUH). Her Prof. S. Ramakrishna, Department of research interest includes Computer Networks and Computer Science, Sri Venkateswara Information Retrieval Systems, Data Mining. University, Tirupati, Andhra Pradesh, India ol. 9 No. 9 September 2011 International Ms. M. Nishantha working as Assistant Professor in Journal of Computer Science and the Department of Computer Science and Information Security Publication September Engineering. Guru Nanak Institutions of Technical 2011, Volume 9 No. 9 Campus with a teaching experience of 6 years. She is [35] Mining Closed Itemsets for Coherent Rules: a B.Tech(IT), and M.Tech in Computer scienceand An Inference Analysis Approach By Kalli Engineering (IT) and Her areas of interest includes Srinivasa Nageswara Prasad, Prof. S. Mobile Computing, DataMining and Information Ramakrishna, Global Journal of Computer Security. Science and Technology Volume 11 Issue 19 Version 1.0 November 2011 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172 & Print ISSN: 0975-4350 1351 | P a g e