SlideShare une entreprise Scribd logo
1  sur  9
Télécharger pour lire hors ligne
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
DOI : 10.5121/ijdkp.2014.4402 17
MINING FREQUENT ITEMSETS (MFI) OVER
DATA STREAMS : VARIABLE WINDOW SIZE
(VWS) BY CONTEXT VARIATION ANALYSIS
(CVA) OF THE STREAMING TRANSACTIONS
V.Sidda Reddy1
, T.V.Rao2
and A.Govardhan3
1
Department of Computer and Engineering,
Sagar Institute of Technology, Chevella, Hyderabad, T.S, India
2
Departments of Computer and Engineering,
PVP Siddhartha Institute of Technology, Vijayawada, A.P, India
3
School of Information Technology,
Jawaharlal Nehru Technological University, Hyderabad, T.S, India
ABSTRACT
The challenges with respect to mining frequent items over data streaming engaging variable window size
and low memory space are addressed in this research paper. To check the varying point of context change
in streaming transaction we have developed a window structure which will be in two levels and supports in
fixing the window size instantly and controls the heterogeneities and assures homogeneities among
transactions added to the window. To minimize the memory utilization, computational cost and improve the
process scalability, this design will allow fixing the coverage or support at window level. Here in this
document, an incremental mining of frequent item-sets from the window and a context variation analysis
approach are being introduced. The complete technology that we are presenting in this document is named
as Mining Frequent Item-sets using Variable Window Size fixed by Context Variation Analysis (MFI-VWS-
CVA). There are clear boundaries among frequent and infrequent item-sets in specific item-sets. In this
design we have used window size change to represent the conceptual drift in an information stream. As it
were, whenever there is a problem in setting window size effectively the item-set will be infrequent. The
experiments that we have executed and documented proved that the algorithm that we have designed is
much efficient than that of existing.
KEYWORDS
Data Mining, Data Streams, Frequent Itemset, Frequent Itemset Mining, Data Stream Mining, Variable
Window, Sliding Window
1. INTRODUCTION
Association rule is an important and well researched method for finding frequent itemsets
(patterns) among set of objects in large database [1]. Frequent itemsets indicates the closeness
among the items that are available in datasets. There are very good range of applications that
includes creation of association rules for market-basket analysis, in text mining grading and
clustering of documents, text, and pages, web mining, that allows users to disclose hidden
patterns and relationships in huge datasets. The time taken to produce the data is outstripping the
speed of its mining in the present upcoming applications where the information is in the form of
an enormous and continuous stream [2]. To traditional static databases this is in quiet opposite, so
data stream mining is considerably different from traditional data mining with respect good
number of aspects. Primarily, the amount of data embedded in its lifetime in a data stream could
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
18
be overwhelmingly high [3]. Along with the above it is required to create spontaneous responses
by maintaining response time to queries on such information streams because of rigid pitfalls of
resource [4]. The data stream mining is the area where very high level of research is taking place
because of the above mentioned reasons and contemporary research area is the challenge of
getting timely and appropriate association rules. Migrating from conventional data mining
methods to the emerging high efficient methods those will accommodate to function on an open-
ended, high speed stream of data [5]. The challenges that are inevitable to all mining technologies
because of the unique and inherent qualities of a data stream are as follows [3]. Primarily the
conventional technology is not applicable as it is required to create model in this method database
need to be scanned multiple number of times which is not possible because of the continuous
quality of the stream data. The high priority was given to the scalability of the frequent itemsets
mining in approaches like association rules, classification and clustering. This requires high level
of data transformation from the representation to other as the algorithm is designed in such a way,
this in turn will use resources extensively resulting in high CPU overhead. Our proposal in this
paper is a model known as Mining Frequent itemsets using Variable Window Size fixed by
Context Variation Analysis (MFI-VWS-CVA). This model is the resource effective and
measurable, as it functions with limited memory requirements and limited computational
expenses. It portrays the trade-offs between computation, data representation, I/O and heuristics.
Context variation analysis based dynamic window based transaction storage is being used in the
proposed algorithm and also allows TIFIM [15] to regular itemsets from the concluded window.
2. RELATED WORK
CPS-tree a prefix-tree structure was proposed by Syed Khairuzzaman Tanbeer [6]. Dynamic tree
restructuring technique was used in the CPS tree in order to manage the stream data. The basic
pitfall of this model is, it reconstructs the tree for every new arrival of the item. This used to result
in high memory usage and a time consuming process. Weighted Sliding Window (WSW)
algorithm was initiated by Pauray S.M. Tsai [7]. In this method weight of each transaction in each
window will be calculated by the algorithm proposed. Here also same memory space and time are
the major concerns; this method failed using these two economically. An apriori algorithm is
being for candidate generation. Hue-Fu Li [8] introduced an effective bit-sequence dependent
algorithm named as MFI-TransSW (Mining Frequent Item sets with in a Transaction Sensitive
Sliding window). There as three phases in MFI algorithm. As there is any increase in the window
size, there will be a subsequent increase in the memory usage of MFI-TransSW. Similarly
whenever there is an increase in window size, the time consumption in phase 1 and phase 2 of
MFI-TransSW to process will also increase. Yo unghee Kim [9, 16] initiated an effective
algorithm with normalized weight over data stream called WSFI mine (Weighted Support
Frequent Item sets mining). From the database in one scan this WSFI-mine algorithm can mine
all frequent item sets. HUPMS (High Utility Pattern Mining in Stream data) was recommended
by Chowdhury Farhan Ahmed [10]. This is a different algorithm for sliding window based high
utility pattern mining over data stream. For interactive mining only this algorithm is suitable. In
the paper that was presented by Jing Guo [11], they have discussed about how to mine regular
patterns across multiple data streams. Here for analysis they have considered real time news paper
data. In multiple streams it is vital to identify collaborative frequent patterns and comparative
frequent patterns. Prevention of misuse of sensitive data in a stream was addressed by Anushree
Gowtham Ringe [12] in their work. They initiate a new technique for guarding the privacy of data
stream. Fan Guidan [13] proposed a model which is conceptually similar to our model. Matrix in
sliding window over data streams plays major role in this model. In order to store, this algorithms
uses two 0-1 matrices and 2-itemsets, further applies relative operation of the two matrices to
extract frequent itemsets.
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
19
3. MINING FREQUENT ITEMSETS OVER DATA STREAMS WITH CONTEXT
AWARE VARIABLE SIZE WINDOWS
If streaming data is input to mining strategies such as frequent itemsets mining, the traditional
approaches are not suitable, since those are mainly works by the multiple passes through entire
dataset. Henceforth the mining strategies opted for streaming data considers the tuples of the
streaming transactions as windows and these windows are used as input to the mining algorithms.
The significant issue here in this model is fixing the window size. In the case of data streams with
transitional and temporal state transactions, the transitional and temporal state identifications can
be used to fix the window size. In the other cases that are not having any transitional and temporal
state identities for streaming transactions, fixing window size is a big constraint to achieve quality
factors such as results accuracy, process scalability. In this regard here we propose a novel
context variation based dynamic window size fixing approach to mine frequent itemsets over data
streams. The proposed frequent itemsets mining strategy is centric to following qualities targeted.
• The window size should be optimal and dynamic.
• The window size should fix dynamically between minimal and maximal size given as
thresholds.
• The size of the window should be within the range of minimal and maximal size and
should fix based on the context variation observed in input transaction from the data
stream.
In regard to implementing the proposed model, the only significant constraint related to the
streaming data is that the context change of the transactions should be in an order.
The minimal memory utilization and less computational cost are two main quality metrics
expected from this proposal. The exploration of the proposed window fixing strategy is follows:
Let ds be the DataStream, and stream transactions as horizontal partitions of the transactions, let
each partition having one transaction. Let n be the total count of the attributes used to form the
transactions by ds . Let seta be the attributes set that contains attributes
1 2 1{ , ,....... , ,...... }i i na a a a a+ , which are used to form the transactions. Let
1 2 3 1 ( 1){ , , ...., , ,.... , ,......}i i i m i mt t t t t t t+ + + + be the transactions streaming in the same sequence. Let
minws be the minimum window size and maxws be the maximal window size. Let tranw be the
transaction window and ccaw be the context change analysis window. Let ( )ccas w be the size of
ccaw . The initial values to min max,ws ws and ( )ccas w will be set during the pre-processing step.
The transactions of count minws from the given data stream ds will be moved initially to tranw ,
then following transactions of count ( )ccas w will be moved to ccaw . Then context variation
analysis (CVA) process will be initialized. The exploration of the CVA process is follows:
The attributes involved to generate the transactions moved into tranw will be collected as ( )tranal w
, and attributes involved to form the transactions found in ccaw will also be collected as ( )ccaal w .
Then the similarity score of these two attribute lists ( ), ( )tran ccaal w al w will be found as follows
(Eq1), which is derived from jaccard similarity measuring approach.
( )
( ) ( )
( ) ( )trans cca
tran cca
w w
trans cca
al w al w
ss
al w al w
↔ =
I
U
… (Eq1)
If similarity score ( )tran ccaw wss ↔ is greater than the given similarity score threshold ssτ then the
transactions of ccaw will be moved to tranw (see Eq2).
tran tran ccaw w w= U … (Eq2)
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
20
If size of the tranw is greater than or equals to maxws then the tranw will be finalized and initiates
process of mining frequent itemsets from the transactions of tranw , else the further streaming
transactions of size ( )ccas w will moved to ccaw and continues CVA process.
Once the tranw is finalized and mining of frequent itemsets is initiated, then tranw and ccaw will be
cleared and continues the process explored to prepare the window tranw will be continued for
further transactions streaming from data stream ds .
The above said process continues till transactions found from data stream ds . The mining
frequent itemsets from the finalized window will be done by using TIFIM [15], which is a tree
based incremental frequent itemsets mining approach that we devised in our earlier research paper
(see sec 3.3).
3.1 Algorithmic exploration of the Fixing Variable Window Size by Context
Variation Analysis
Inputs:
• Data stream ds
• Minimal transaction window size minws
• Maximal transaction window size maxws
• Similarity score threshold ssτ
• Size of the context change analysis window ( )ccas w
1. Begin
2. For each transaction { }t t ds∀ ∈ Begin
3. If min(| | )tranw ws< tranw t←
4. Else Begin
5. ccaw t←
6. If (| | ( ))cca ccaw s w≥
7. ( , )tran ccass CVA w w←
8. if( )ss ssτ≥ Begin
9. ( )tran tran ccaw w w← U
10. If max(| | )tranw ws≥ Begin
11. Finalize window tranw
12. Initiate ( )tranTIFIM w
13. set tranw φ← // empty tranw
14. set ccaw φ← //empty ccaw
15. End of 10
16. End of 8
17. Else Begin
18. Finalize window tranw
19. Initiate ( )tranTIFIM w
20. set tranw φ← // empty tranw
21. set tran ccaw w← //move transactions of window ccaw to new window tranw
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
21
22. set ccaw φ← //empty ccaw
23. End of 17
24. End of 4
25. End of 2
26. End of 1
3.2 Algorithmic exploration of the Context Variation Analysis
1. ( , )tran ccaCVA w w Begin
2. Set tranfs φ← // initiate field set tranfs of transaction window tranw empty
3. Foreach transaction { }trant t w∀ ∈ Begin
4. tran tranfs fs t← U
5. End of 3
6. Set ccafs φ← // initiate field set ccafs of transaction window ccaw empty
7. Foreach transaction { }ccat t w∀ ∈ Begin
8. cca ccafs fs t← U
9. End of 7
10. tran cca
tran cca
fs fs
ss
fs fs
=
I
U
//measuring similarity score of tranw and ccaw
11. Return ss
12. End of 1
3.3 Tree (Bush) Based Incremental Frequent Itemsets Mining (TIFIM)
3.3.1 Finding Frequent Itemsets
The primary representation of the transactions of data stream ds is as described above. An
asynchronous parallel process runs to find frequent itemsets in incremental manner.
A bush represents itemsets with two attribute pair such that these attributes belongs to tranfs and
transactions contain that pair. The coverage to measure the frequency of the itemsets can be
considered and set in the context of window tranw size. The coverage of two attribute itemsets can
be the count of number of Childs in a bush represented by each pair of attributes.
An asynchronous parallel process called frequent itemsets finder (FIF) performs as follow:
Initially picks the bushes with coverage more than given coverage thresholdcov .
Prepare new bushes from each two bushes by union the roots and intersects the Childs, and
retains it if new bush coverage is greater or equal to cov else discards.
This continues until no new bush formed.
3.3.2 The pruning process
A bush ib said to be sub-bush to bush jb if i jb br r⊆ and ( ) ( )cov covi jb b≤ . Since sub-bush ib
represented by jb , then bush ib can be pruned from the bush-set B .
3.4 Find frequent items
At an event of time, frequent itemsets can be found as follows
The roots of the bushes with coverage more than given cov can be claimed as frequent itemsets.
A bush ‘ ib ’ coverage can be find as follows
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
22
If a bush jb found to be such that i jb b⊆ and coverage value of jb is higher than any other bush
kb such that i kb b⊆ , then the coverage of ib said to be ( ) ( )cov covj ib b+ .
3.4.1 An algorithmic representation of the caching processes
Input: At an event of time a window iw with transaction it received
For each transaction it :
Let set of attributes
1 2 3
1 2 3
1 2 3
{( , , ,......... )
( , , ,......... )
( , , ,......... ) }
i
i i
i set
a a a a
a a a a t
a a a a A
∀
∈ ∧
⊆
For each pair of attributes{( , ) ( , ) }m n m n ia a a a t∀ ∈ , if found a bush { ( , ) }i m nb a a as root∃ then
add transaction it as node to bush ib , else prepare a bush such that
{ ( , ) t }i m n ib a a as roo t as node∃ ∧
3.4.2 An algorithmic approach of FIF
The bush set B prepared by caching process is said to be input to FIF
For each bush { }i ib b B∀ ∈ perform the following:
For each bush ( : 1,2,3....... ) )i c i cb c n b B+ +∃ = ∧ ∈
Forms a bush ( ) ( ){ }i i c i i cb b B+ +∃ ∉U U by Union the roots of the ‘ ib ’and ‘ i cb+ ’ ( ( ) ( )i i cb br r +
U ) and
intersects nodes of ib and i cb+ ( ( ) ( )i i cb bts ts +
I ).
4. EMPIRICAL ANALYSIS OF THE FIXING VARIABLE WINDOW SIZE BY
CONTEXT VARIATION ANALYSIS
4.1 Dataset characteristics
Multiple sets of data streamed to perform the experiments, and the characteristics of these
streaming data are as follows:
• To achieve the sparseness in streaming transactions, the range of fields considered as
75,100, 125 and 150, the max transaction length set in the range of 12 to 18, the min
transaction range set to 5 and the total number of transactions has taken in the range of
1000 to 10000.
• To achieve the denseness in streaming transactions, the range of fields considered as 20,
30, 40 and 50, the max transaction length set in the range of 10 to 15, the min transaction
range set to 5 and the total number of transactions has taken in the range of 1000 to
10000.
4.2 Experimental results
We compare our algorithm with frequent itemsets mining model for data streams devised in [13],
which is a matrix based frequent itemsets mining (MFIM) algorithm for data streams. The
implementation of our MFI-VWS-CVA and model MFIM done by using java 7 and set of flat
files as streaming data sources. The streaming environment is emulated using java RMI and
parallel process involved in proposed MFI-VWS-CVA is achieved by using java multi threading
concept. The three parameters of each synthetic dataset are the total number of transactions, the
average length, and divergence count of items, respectively. Each transaction of a dataset is
scanned only once in our experiments to simulate the environment of data streams. In regard to
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
23
measure the computational cost and scalability, the algorithms run under divergent coverage
values in the range of 10% to 90%.
Figure 1: NFI-VWS-CVA advantage over MFIM in Memory usage.
Figure 2: MFI-VWS-CVA advantage over MFIM in terms of execution time.
Figure 1 and 2 shows the comparison of the Memory usage, execution time under divergent
coverage values range given from 10% to 40% respectively. The Figure 3 explores the scalability
of MFI-VWS-CVA over MFIM under divergent streaming data sizes respectively, In Figure 1
and 2, the horizontal axis is the coverage given and the vertical axis is the memory in unit of
mega bytes and time in unit of seconds respectively. In Figure 3, the horizontal axis is the
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
24
streaming data size given in unit of transactions and the vertical axis is the execution time in unit
of seconds and percentage of elapsed time in unit of seconds respectively. As the coverage value
decreases the average increment in memory usage for matrix based FIM and MFI-VWS-CVA
are 2.29 and 0.7 respectively (see Figure 1) and average execution time increment for matrix
based FIM and MFI-VWS-CVA are 83.2 and 27.9 respectively (see Figure 2). The results
obtained here clearly indicating that the performance of MFI-VWS-CVA is miles ahead than the
matrix based FIM. The performance of MFI-VWS-CVA is scalable as matrix based FIM is taking
average of 14.16% elapsed time under uniform increment of streaming data size with 1500
transactions.
Figure 3: MFI-VWS-CVA advantage over MFIM about Scalability under divergent streaming
data size.
5. CONCLUSION
We explored a novel approach for mining the frequent itemsets from a data stream. We have
implemented an efficient tree based incremental frequent itemsets mining model [15] in our
earlier research paper, further here we developed an approach for mining frequent itemsets using
variable window size by context variation analysis (MFI-VWS-CVA) over data streams. Due to
the factor of fixing window size dynamically by concept variation analysis, the said model is
identified as optimal and scalable. A parallel process that determines frequent itemsets from the
concept of cached bush structures, which is our earlier proposal [15], performs frequent itemsets
mining over data streams. We extended this incremental frequent itemset mining algorithm by
introducing windowing the streaming transaction with variable window size technique in regard
to achieve efficient memory usage and execution time. The experiment results confirm that the
MFI-VWS-CVA is scalable under divergent streaming data size and coverage values. In future
this model can be extended to perform utility based frequent itemset mining over data streams.
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014
25
ACKNOWLEDGEMENT
The authors would like to thank the anonymous reviewers for their valuable comments.
We also thank the authors of all references for helping us setup the paper.
REFERENCES
[1] V.Sidda Reddy, M.Narendra and K.Helini, “Knowledge Discovery from Static Datasets to Evolving
Data Streams and Challenges”, International Journal of Computer Applications, Volume 87, No.15,
pp 22-25, 2014.
[2] Moses Charikar, Kevin Chen and Martin Farach-Colton, “Finding frequent items in data streams”,
Journal of Theoretical Computer Science – Special issue on automata, languages and programming,
pp 3-15, 2004.
[3] Mohamed Medhat Gaber, Arkady Zaslavsky and Shonali Krishnaswamy, “Mining data streams: a
review”, ACM SIGMOD Record, Volume 34, Issue 2, pp 18-26, 2005.
[4] Nan Jiang and Le Gruenwald., “CFI-stream: Mining closed frequent itemsets in data streams”,
Proceedings of the 12th ACM SIGKDD international conference on KDD, USA, 2006.
[5] Gurmeet Singh Manku and Rajeev Motvani, “Approximate frequency counts over data streams”,
Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China,
2002.
[6] Syed Khairuzzaman Tabeer, Chowdary Farha Ahmed, Byeong-Soo Jeong, and Young Koo Lee,
”Efficient frequent pattern mining over data streams”, In proceeding of: Proceedings of the 17th ACM
Conference on Information and Knowledge Management, pp 22-30, 2008.
[7] Pauray S.M.Tsai, “Mining frequent item sets in data streams using the weighted sliding window
model”, Elsevier publication, 2009.
[8] Hue-Fu Li and Suh-Li, “Mining frequent item sets over data streams using efficient window sliding
technique”, Elsevier publication, 2009.
[9] Yo unghee Kim, Won Young Kim and Ungmo Kim, “Mining frequent item sets with normalized
weight in continuous data streams”, Journal of information processing systems, 2010.
[10] Chowdary Farha Ahmed and Byeong Soo Jeong, “Efficient mining of high utility patterns over data
streams with a sliding window model”, Springerlink.com, 2011.
[11] Jing Guo, Peng Zhang, Jianlong Tan and Li Guo, “Mining frequent patterns across multiple data
streams”, Proceedings of the 20th ACM international conference on Information and knowledge
management, pp 2325-2328, 2011.
[12] Anushree Gowtham Ringe, Deeksha Sood and Turga Toshniwa,”Compression and privacy
preservation of data streams using moments”, Information journal of machine learning and
computing, 2011.
[13] Fan Guidan and Yin Shaohong, "A Frequent Itemsets Mining Algorithm Based on Matrix in Sliding
Window over Data Streams", Intelligent System Design and Engineering Applications (ISDEA),
Third International Conference, pp 66-69, 2013.
[14] R. Agrawal, A. Arning, T. Bollinger, M. Mehta, J. Shafer and R. Srikant, “The Quest data mining
system”, Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and
Data Mining, 1996.
[15] V. Sidda Reddy, Dr T.V.Rao and Dr A.Govardhan, "TIFIM: Tree based Incremental Frequent Itemset
Mining over Streaming Data", International Journal Of Computers & Technology Vol 10, No 5,
Council for Innovative Research, www.cirworld.com, 2013.
[16] www.borgelt.net/slides/fpm.pdf.

Contenu connexe

Tendances

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESIJCSES Journal
 
Managing and implementing the data mining process using a truly stepwise appr...
Managing and implementing the data mining process using a truly stepwise appr...Managing and implementing the data mining process using a truly stepwise appr...
Managing and implementing the data mining process using a truly stepwise appr...Shanmugaraj Ramaiah
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
 
A Novel Data mining Technique to Discover Patterns from Huge Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge  Text CorpusA Novel Data mining Technique to Discover Patterns from Huge  Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge Text CorpusIJMER
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for properIJDKP
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools forIJDKP
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningAn efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningijcisjournal
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
 
1 Introduction to-data-mining lecture
1   Introduction to-data-mining lecture1   Introduction to-data-mining lecture
1 Introduction to-data-mining lectureMahmoud Alfarra
 
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...IOSR Journals
 
New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
 
A unified approach for spatial data query
A unified approach for spatial data queryA unified approach for spatial data query
A unified approach for spatial data queryIJDKP
 
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGIJDKP
 

Tendances (20)

Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...Different Classification Technique for Data mining in Insurance Industry usin...
Different Classification Technique for Data mining in Insurance Industry usin...
 
I1802055259
I1802055259I1802055259
I1802055259
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
 
Aa31163168
Aa31163168Aa31163168
Aa31163168
 
Managing and implementing the data mining process using a truly stepwise appr...
Managing and implementing the data mining process using a truly stepwise appr...Managing and implementing the data mining process using a truly stepwise appr...
Managing and implementing the data mining process using a truly stepwise appr...
 
Certain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic DataminingCertain Investigation on Dynamic Clustering in Dynamic Datamining
Certain Investigation on Dynamic Clustering in Dynamic Datamining
 
A Novel Data mining Technique to Discover Patterns from Huge Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge  Text CorpusA Novel Data mining Technique to Discover Patterns from Huge  Text Corpus
A Novel Data mining Technique to Discover Patterns from Huge Text Corpus
 
Effective data mining for proper
Effective data mining for properEffective data mining for proper
Effective data mining for proper
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools for
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data miningAn efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data mining
 
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEA CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE
 
1105.1950
1105.19501105.1950
1105.1950
 
1 Introduction to-data-mining lecture
1   Introduction to-data-mining lecture1   Introduction to-data-mining lecture
1 Introduction to-data-mining lecture
 
50120140503013
5012014050301350120140503013
50120140503013
 
Z36149154
Z36149154Z36149154
Z36149154
 
IJCSIT
IJCSITIJCSIT
IJCSIT
 
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
Analysis of Pattern Transformation Algorithms for Sensitive Knowledge Protect...
 
New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...
 
A unified approach for spatial data query
A unified approach for spatial data queryA unified approach for spatial data query
A unified approach for spatial data query
 
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MININGPATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
PATTERN GENERATION FOR COMPLEX DATA USING HYBRID MINING
 

En vedette

A new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionA new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionIJDKP
 
Design and implementation for
Design and implementation forDesign and implementation for
Design and implementation forIJDKP
 
Prediction of quality features in iberian ham by applying data mining on data...
Prediction of quality features in iberian ham by applying data mining on data...Prediction of quality features in iberian ham by applying data mining on data...
Prediction of quality features in iberian ham by applying data mining on data...IJDKP
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity joinIJDKP
 
FINDING FREQUENT SUBPATHS IN A GRAPH
FINDING FREQUENT SUBPATHS IN A GRAPHFINDING FREQUENT SUBPATHS IN A GRAPH
FINDING FREQUENT SUBPATHS IN A GRAPHIJDKP
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHIJDKP
 
An efficient algorithm for privacy
An efficient algorithm for privacyAn efficient algorithm for privacy
An efficient algorithm for privacyIJDKP
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapeletsIJDKP
 
Emotion detection from text documents
Emotion detection from text documentsEmotion detection from text documents
Emotion detection from text documentsIJDKP
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique ofIJDKP
 
Comparison between riss and dcharm for mining gene expression data
Comparison between riss and dcharm for mining gene expression dataComparison between riss and dcharm for mining gene expression data
Comparison between riss and dcharm for mining gene expression dataIJDKP
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...IJDKP
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceIJDKP
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONIJDKP
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentIJDKP
 
A data mining approach to predict
A data mining approach to predictA data mining approach to predict
A data mining approach to predictIJDKP
 
The Innovator’s Journey: Asset Owners Insights
The Innovator’s Journey: Asset Owners Insights The Innovator’s Journey: Asset Owners Insights
The Innovator’s Journey: Asset Owners Insights State Street
 
Study on body fat density prediction
Study on body fat density predictionStudy on body fat density prediction
Study on body fat density predictionIJDKP
 
Relative parameter quantification in data
Relative parameter quantification in dataRelative parameter quantification in data
Relative parameter quantification in dataIJDKP
 

En vedette (20)

A new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detectionA new clutering approach for anomaly intrusion detection
A new clutering approach for anomaly intrusion detection
 
Design and implementation for
Design and implementation forDesign and implementation for
Design and implementation for
 
Prediction of quality features in iberian ham by applying data mining on data...
Prediction of quality features in iberian ham by applying data mining on data...Prediction of quality features in iberian ham by applying data mining on data...
Prediction of quality features in iberian ham by applying data mining on data...
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity join
 
FINDING FREQUENT SUBPATHS IN A GRAPH
FINDING FREQUENT SUBPATHS IN A GRAPHFINDING FREQUENT SUBPATHS IN A GRAPH
FINDING FREQUENT SUBPATHS IN A GRAPH
 
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACHTEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH
 
An efficient algorithm for privacy
An efficient algorithm for privacyAn efficient algorithm for privacy
An efficient algorithm for privacy
 
Accurate time series classification using shapelets
Accurate time series classification using shapeletsAccurate time series classification using shapelets
Accurate time series classification using shapelets
 
Emotion detection from text documents
Emotion detection from text documentsEmotion detection from text documents
Emotion detection from text documents
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
Comparison between riss and dcharm for mining gene expression data
Comparison between riss and dcharm for mining gene expression dataComparison between riss and dcharm for mining gene expression data
Comparison between riss and dcharm for mining gene expression data
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
 
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATIONEFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
EFFECTIVE ARABIC STEMMER BASED HYBRID APPROACH FOR ARABIC TEXT CATEGORIZATION
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
A data mining approach to predict
A data mining approach to predictA data mining approach to predict
A data mining approach to predict
 
The Innovator’s Journey: Asset Owners Insights
The Innovator’s Journey: Asset Owners Insights The Innovator’s Journey: Asset Owners Insights
The Innovator’s Journey: Asset Owners Insights
 
Study on body fat density prediction
Study on body fat density predictionStudy on body fat density prediction
Study on body fat density prediction
 
Relative parameter quantification in data
Relative parameter quantification in dataRelative parameter quantification in data
Relative parameter quantification in data
 

Similaire à Mining frequent itemsets (mfi) over

IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...IJDKP
 
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesA New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesVenu Madhav
 
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...ijitcs
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesEditor IJMTER
 
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET Journal
 
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET Journal
 
Mining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmMining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmManishankar Medi
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Drsp dimension reduction for similarity matching and pruning of time series ...
Drsp  dimension reduction for similarity matching and pruning of time series ...Drsp  dimension reduction for similarity matching and pruning of time series ...
Drsp dimension reduction for similarity matching and pruning of time series ...IJDKP
 
Mining high utility itemsets in data streams based on the weighted sliding wi...
Mining high utility itemsets in data streams based on the weighted sliding wi...Mining high utility itemsets in data streams based on the weighted sliding wi...
Mining high utility itemsets in data streams based on the weighted sliding wi...IJDKP
 
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...Editor IJMTER
 
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern MiningA Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern MiningIJSRD
 
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set MiningAn Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set Miningijsrd.com
 
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANKPATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANKIJDKP
 
Db2425082511
Db2425082511Db2425082511
Db2425082511IJMER
 
Survey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data MiningSurvey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data Miningijcsit
 

Similaire à Mining frequent itemsets (mfi) over (20)

IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
FREQUENT ITEMSET MINING IN TRANSACTIONAL DATA STREAMS BASED ON QUALITY CONTRO...
 
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association RulesA New Data Stream Mining Algorithm for Interestingness-rich Association Rules
A New Data Stream Mining Algorithm for Interestingness-rich Association Rules
 
386 390
386 390386 390
386 390
 
386 390
386 390386 390
386 390
 
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
Mining Maximum Frequent Item Sets Over Data Streams Using Transaction Sliding...
 
REVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining TechniquesREVIEW: Frequent Pattern Mining Techniques
REVIEW: Frequent Pattern Mining Techniques
 
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train CoachesIRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
IRJET- AC Duct Monitoring and Cleaning Vehicle for Train Coaches
 
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
IRJET- A Data Stream Mining Technique Dynamically Updating a Model with Dynam...
 
Mining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering AlgorithmMining Stream Data using k-Means clustering Algorithm
Mining Stream Data using k-Means clustering Algorithm
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Drsp dimension reduction for similarity matching and pruning of time series ...
Drsp  dimension reduction for similarity matching and pruning of time series ...Drsp  dimension reduction for similarity matching and pruning of time series ...
Drsp dimension reduction for similarity matching and pruning of time series ...
 
Mining high utility itemsets in data streams based on the weighted sliding wi...
Mining high utility itemsets in data streams based on the weighted sliding wi...Mining high utility itemsets in data streams based on the weighted sliding wi...
Mining high utility itemsets in data streams based on the weighted sliding wi...
 
Fp3111131118
Fp3111131118Fp3111131118
Fp3111131118
 
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
A Survey on Improve Efficiency And Scability vertical mining using Agriculter...
 
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern MiningA Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
A Survey Report on High Utility Itemset Mining for Frequent Pattern Mining
 
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set MiningAn Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
 
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANKPATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK
PATTERN DISCOVERY FOR MULTIPLE DATA SOURCES BASED ON ITEM RANK
 
Db2425082511
Db2425082511Db2425082511
Db2425082511
 
Survey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data MiningSurvey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data Mining
 

Dernier

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Dernier (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Mining frequent itemsets (mfi) over

  • 1. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 DOI : 10.5121/ijdkp.2014.4402 17 MINING FREQUENT ITEMSETS (MFI) OVER DATA STREAMS : VARIABLE WINDOW SIZE (VWS) BY CONTEXT VARIATION ANALYSIS (CVA) OF THE STREAMING TRANSACTIONS V.Sidda Reddy1 , T.V.Rao2 and A.Govardhan3 1 Department of Computer and Engineering, Sagar Institute of Technology, Chevella, Hyderabad, T.S, India 2 Departments of Computer and Engineering, PVP Siddhartha Institute of Technology, Vijayawada, A.P, India 3 School of Information Technology, Jawaharlal Nehru Technological University, Hyderabad, T.S, India ABSTRACT The challenges with respect to mining frequent items over data streaming engaging variable window size and low memory space are addressed in this research paper. To check the varying point of context change in streaming transaction we have developed a window structure which will be in two levels and supports in fixing the window size instantly and controls the heterogeneities and assures homogeneities among transactions added to the window. To minimize the memory utilization, computational cost and improve the process scalability, this design will allow fixing the coverage or support at window level. Here in this document, an incremental mining of frequent item-sets from the window and a context variation analysis approach are being introduced. The complete technology that we are presenting in this document is named as Mining Frequent Item-sets using Variable Window Size fixed by Context Variation Analysis (MFI-VWS- CVA). There are clear boundaries among frequent and infrequent item-sets in specific item-sets. In this design we have used window size change to represent the conceptual drift in an information stream. As it were, whenever there is a problem in setting window size effectively the item-set will be infrequent. The experiments that we have executed and documented proved that the algorithm that we have designed is much efficient than that of existing. KEYWORDS Data Mining, Data Streams, Frequent Itemset, Frequent Itemset Mining, Data Stream Mining, Variable Window, Sliding Window 1. INTRODUCTION Association rule is an important and well researched method for finding frequent itemsets (patterns) among set of objects in large database [1]. Frequent itemsets indicates the closeness among the items that are available in datasets. There are very good range of applications that includes creation of association rules for market-basket analysis, in text mining grading and clustering of documents, text, and pages, web mining, that allows users to disclose hidden patterns and relationships in huge datasets. The time taken to produce the data is outstripping the speed of its mining in the present upcoming applications where the information is in the form of an enormous and continuous stream [2]. To traditional static databases this is in quiet opposite, so data stream mining is considerably different from traditional data mining with respect good number of aspects. Primarily, the amount of data embedded in its lifetime in a data stream could
  • 2. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 18 be overwhelmingly high [3]. Along with the above it is required to create spontaneous responses by maintaining response time to queries on such information streams because of rigid pitfalls of resource [4]. The data stream mining is the area where very high level of research is taking place because of the above mentioned reasons and contemporary research area is the challenge of getting timely and appropriate association rules. Migrating from conventional data mining methods to the emerging high efficient methods those will accommodate to function on an open- ended, high speed stream of data [5]. The challenges that are inevitable to all mining technologies because of the unique and inherent qualities of a data stream are as follows [3]. Primarily the conventional technology is not applicable as it is required to create model in this method database need to be scanned multiple number of times which is not possible because of the continuous quality of the stream data. The high priority was given to the scalability of the frequent itemsets mining in approaches like association rules, classification and clustering. This requires high level of data transformation from the representation to other as the algorithm is designed in such a way, this in turn will use resources extensively resulting in high CPU overhead. Our proposal in this paper is a model known as Mining Frequent itemsets using Variable Window Size fixed by Context Variation Analysis (MFI-VWS-CVA). This model is the resource effective and measurable, as it functions with limited memory requirements and limited computational expenses. It portrays the trade-offs between computation, data representation, I/O and heuristics. Context variation analysis based dynamic window based transaction storage is being used in the proposed algorithm and also allows TIFIM [15] to regular itemsets from the concluded window. 2. RELATED WORK CPS-tree a prefix-tree structure was proposed by Syed Khairuzzaman Tanbeer [6]. Dynamic tree restructuring technique was used in the CPS tree in order to manage the stream data. The basic pitfall of this model is, it reconstructs the tree for every new arrival of the item. This used to result in high memory usage and a time consuming process. Weighted Sliding Window (WSW) algorithm was initiated by Pauray S.M. Tsai [7]. In this method weight of each transaction in each window will be calculated by the algorithm proposed. Here also same memory space and time are the major concerns; this method failed using these two economically. An apriori algorithm is being for candidate generation. Hue-Fu Li [8] introduced an effective bit-sequence dependent algorithm named as MFI-TransSW (Mining Frequent Item sets with in a Transaction Sensitive Sliding window). There as three phases in MFI algorithm. As there is any increase in the window size, there will be a subsequent increase in the memory usage of MFI-TransSW. Similarly whenever there is an increase in window size, the time consumption in phase 1 and phase 2 of MFI-TransSW to process will also increase. Yo unghee Kim [9, 16] initiated an effective algorithm with normalized weight over data stream called WSFI mine (Weighted Support Frequent Item sets mining). From the database in one scan this WSFI-mine algorithm can mine all frequent item sets. HUPMS (High Utility Pattern Mining in Stream data) was recommended by Chowdhury Farhan Ahmed [10]. This is a different algorithm for sliding window based high utility pattern mining over data stream. For interactive mining only this algorithm is suitable. In the paper that was presented by Jing Guo [11], they have discussed about how to mine regular patterns across multiple data streams. Here for analysis they have considered real time news paper data. In multiple streams it is vital to identify collaborative frequent patterns and comparative frequent patterns. Prevention of misuse of sensitive data in a stream was addressed by Anushree Gowtham Ringe [12] in their work. They initiate a new technique for guarding the privacy of data stream. Fan Guidan [13] proposed a model which is conceptually similar to our model. Matrix in sliding window over data streams plays major role in this model. In order to store, this algorithms uses two 0-1 matrices and 2-itemsets, further applies relative operation of the two matrices to extract frequent itemsets.
  • 3. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 19 3. MINING FREQUENT ITEMSETS OVER DATA STREAMS WITH CONTEXT AWARE VARIABLE SIZE WINDOWS If streaming data is input to mining strategies such as frequent itemsets mining, the traditional approaches are not suitable, since those are mainly works by the multiple passes through entire dataset. Henceforth the mining strategies opted for streaming data considers the tuples of the streaming transactions as windows and these windows are used as input to the mining algorithms. The significant issue here in this model is fixing the window size. In the case of data streams with transitional and temporal state transactions, the transitional and temporal state identifications can be used to fix the window size. In the other cases that are not having any transitional and temporal state identities for streaming transactions, fixing window size is a big constraint to achieve quality factors such as results accuracy, process scalability. In this regard here we propose a novel context variation based dynamic window size fixing approach to mine frequent itemsets over data streams. The proposed frequent itemsets mining strategy is centric to following qualities targeted. • The window size should be optimal and dynamic. • The window size should fix dynamically between minimal and maximal size given as thresholds. • The size of the window should be within the range of minimal and maximal size and should fix based on the context variation observed in input transaction from the data stream. In regard to implementing the proposed model, the only significant constraint related to the streaming data is that the context change of the transactions should be in an order. The minimal memory utilization and less computational cost are two main quality metrics expected from this proposal. The exploration of the proposed window fixing strategy is follows: Let ds be the DataStream, and stream transactions as horizontal partitions of the transactions, let each partition having one transaction. Let n be the total count of the attributes used to form the transactions by ds . Let seta be the attributes set that contains attributes 1 2 1{ , ,....... , ,...... }i i na a a a a+ , which are used to form the transactions. Let 1 2 3 1 ( 1){ , , ...., , ,.... , ,......}i i i m i mt t t t t t t+ + + + be the transactions streaming in the same sequence. Let minws be the minimum window size and maxws be the maximal window size. Let tranw be the transaction window and ccaw be the context change analysis window. Let ( )ccas w be the size of ccaw . The initial values to min max,ws ws and ( )ccas w will be set during the pre-processing step. The transactions of count minws from the given data stream ds will be moved initially to tranw , then following transactions of count ( )ccas w will be moved to ccaw . Then context variation analysis (CVA) process will be initialized. The exploration of the CVA process is follows: The attributes involved to generate the transactions moved into tranw will be collected as ( )tranal w , and attributes involved to form the transactions found in ccaw will also be collected as ( )ccaal w . Then the similarity score of these two attribute lists ( ), ( )tran ccaal w al w will be found as follows (Eq1), which is derived from jaccard similarity measuring approach. ( ) ( ) ( ) ( ) ( )trans cca tran cca w w trans cca al w al w ss al w al w ↔ = I U … (Eq1) If similarity score ( )tran ccaw wss ↔ is greater than the given similarity score threshold ssτ then the transactions of ccaw will be moved to tranw (see Eq2). tran tran ccaw w w= U … (Eq2)
  • 4. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 20 If size of the tranw is greater than or equals to maxws then the tranw will be finalized and initiates process of mining frequent itemsets from the transactions of tranw , else the further streaming transactions of size ( )ccas w will moved to ccaw and continues CVA process. Once the tranw is finalized and mining of frequent itemsets is initiated, then tranw and ccaw will be cleared and continues the process explored to prepare the window tranw will be continued for further transactions streaming from data stream ds . The above said process continues till transactions found from data stream ds . The mining frequent itemsets from the finalized window will be done by using TIFIM [15], which is a tree based incremental frequent itemsets mining approach that we devised in our earlier research paper (see sec 3.3). 3.1 Algorithmic exploration of the Fixing Variable Window Size by Context Variation Analysis Inputs: • Data stream ds • Minimal transaction window size minws • Maximal transaction window size maxws • Similarity score threshold ssτ • Size of the context change analysis window ( )ccas w 1. Begin 2. For each transaction { }t t ds∀ ∈ Begin 3. If min(| | )tranw ws< tranw t← 4. Else Begin 5. ccaw t← 6. If (| | ( ))cca ccaw s w≥ 7. ( , )tran ccass CVA w w← 8. if( )ss ssτ≥ Begin 9. ( )tran tran ccaw w w← U 10. If max(| | )tranw ws≥ Begin 11. Finalize window tranw 12. Initiate ( )tranTIFIM w 13. set tranw φ← // empty tranw 14. set ccaw φ← //empty ccaw 15. End of 10 16. End of 8 17. Else Begin 18. Finalize window tranw 19. Initiate ( )tranTIFIM w 20. set tranw φ← // empty tranw 21. set tran ccaw w← //move transactions of window ccaw to new window tranw
  • 5. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 21 22. set ccaw φ← //empty ccaw 23. End of 17 24. End of 4 25. End of 2 26. End of 1 3.2 Algorithmic exploration of the Context Variation Analysis 1. ( , )tran ccaCVA w w Begin 2. Set tranfs φ← // initiate field set tranfs of transaction window tranw empty 3. Foreach transaction { }trant t w∀ ∈ Begin 4. tran tranfs fs t← U 5. End of 3 6. Set ccafs φ← // initiate field set ccafs of transaction window ccaw empty 7. Foreach transaction { }ccat t w∀ ∈ Begin 8. cca ccafs fs t← U 9. End of 7 10. tran cca tran cca fs fs ss fs fs = I U //measuring similarity score of tranw and ccaw 11. Return ss 12. End of 1 3.3 Tree (Bush) Based Incremental Frequent Itemsets Mining (TIFIM) 3.3.1 Finding Frequent Itemsets The primary representation of the transactions of data stream ds is as described above. An asynchronous parallel process runs to find frequent itemsets in incremental manner. A bush represents itemsets with two attribute pair such that these attributes belongs to tranfs and transactions contain that pair. The coverage to measure the frequency of the itemsets can be considered and set in the context of window tranw size. The coverage of two attribute itemsets can be the count of number of Childs in a bush represented by each pair of attributes. An asynchronous parallel process called frequent itemsets finder (FIF) performs as follow: Initially picks the bushes with coverage more than given coverage thresholdcov . Prepare new bushes from each two bushes by union the roots and intersects the Childs, and retains it if new bush coverage is greater or equal to cov else discards. This continues until no new bush formed. 3.3.2 The pruning process A bush ib said to be sub-bush to bush jb if i jb br r⊆ and ( ) ( )cov covi jb b≤ . Since sub-bush ib represented by jb , then bush ib can be pruned from the bush-set B . 3.4 Find frequent items At an event of time, frequent itemsets can be found as follows The roots of the bushes with coverage more than given cov can be claimed as frequent itemsets. A bush ‘ ib ’ coverage can be find as follows
  • 6. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 22 If a bush jb found to be such that i jb b⊆ and coverage value of jb is higher than any other bush kb such that i kb b⊆ , then the coverage of ib said to be ( ) ( )cov covj ib b+ . 3.4.1 An algorithmic representation of the caching processes Input: At an event of time a window iw with transaction it received For each transaction it : Let set of attributes 1 2 3 1 2 3 1 2 3 {( , , ,......... ) ( , , ,......... ) ( , , ,......... ) } i i i i set a a a a a a a a t a a a a A ∀ ∈ ∧ ⊆ For each pair of attributes{( , ) ( , ) }m n m n ia a a a t∀ ∈ , if found a bush { ( , ) }i m nb a a as root∃ then add transaction it as node to bush ib , else prepare a bush such that { ( , ) t }i m n ib a a as roo t as node∃ ∧ 3.4.2 An algorithmic approach of FIF The bush set B prepared by caching process is said to be input to FIF For each bush { }i ib b B∀ ∈ perform the following: For each bush ( : 1,2,3....... ) )i c i cb c n b B+ +∃ = ∧ ∈ Forms a bush ( ) ( ){ }i i c i i cb b B+ +∃ ∉U U by Union the roots of the ‘ ib ’and ‘ i cb+ ’ ( ( ) ( )i i cb br r + U ) and intersects nodes of ib and i cb+ ( ( ) ( )i i cb bts ts + I ). 4. EMPIRICAL ANALYSIS OF THE FIXING VARIABLE WINDOW SIZE BY CONTEXT VARIATION ANALYSIS 4.1 Dataset characteristics Multiple sets of data streamed to perform the experiments, and the characteristics of these streaming data are as follows: • To achieve the sparseness in streaming transactions, the range of fields considered as 75,100, 125 and 150, the max transaction length set in the range of 12 to 18, the min transaction range set to 5 and the total number of transactions has taken in the range of 1000 to 10000. • To achieve the denseness in streaming transactions, the range of fields considered as 20, 30, 40 and 50, the max transaction length set in the range of 10 to 15, the min transaction range set to 5 and the total number of transactions has taken in the range of 1000 to 10000. 4.2 Experimental results We compare our algorithm with frequent itemsets mining model for data streams devised in [13], which is a matrix based frequent itemsets mining (MFIM) algorithm for data streams. The implementation of our MFI-VWS-CVA and model MFIM done by using java 7 and set of flat files as streaming data sources. The streaming environment is emulated using java RMI and parallel process involved in proposed MFI-VWS-CVA is achieved by using java multi threading concept. The three parameters of each synthetic dataset are the total number of transactions, the average length, and divergence count of items, respectively. Each transaction of a dataset is scanned only once in our experiments to simulate the environment of data streams. In regard to
  • 7. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 23 measure the computational cost and scalability, the algorithms run under divergent coverage values in the range of 10% to 90%. Figure 1: NFI-VWS-CVA advantage over MFIM in Memory usage. Figure 2: MFI-VWS-CVA advantage over MFIM in terms of execution time. Figure 1 and 2 shows the comparison of the Memory usage, execution time under divergent coverage values range given from 10% to 40% respectively. The Figure 3 explores the scalability of MFI-VWS-CVA over MFIM under divergent streaming data sizes respectively, In Figure 1 and 2, the horizontal axis is the coverage given and the vertical axis is the memory in unit of mega bytes and time in unit of seconds respectively. In Figure 3, the horizontal axis is the
  • 8. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 24 streaming data size given in unit of transactions and the vertical axis is the execution time in unit of seconds and percentage of elapsed time in unit of seconds respectively. As the coverage value decreases the average increment in memory usage for matrix based FIM and MFI-VWS-CVA are 2.29 and 0.7 respectively (see Figure 1) and average execution time increment for matrix based FIM and MFI-VWS-CVA are 83.2 and 27.9 respectively (see Figure 2). The results obtained here clearly indicating that the performance of MFI-VWS-CVA is miles ahead than the matrix based FIM. The performance of MFI-VWS-CVA is scalable as matrix based FIM is taking average of 14.16% elapsed time under uniform increment of streaming data size with 1500 transactions. Figure 3: MFI-VWS-CVA advantage over MFIM about Scalability under divergent streaming data size. 5. CONCLUSION We explored a novel approach for mining the frequent itemsets from a data stream. We have implemented an efficient tree based incremental frequent itemsets mining model [15] in our earlier research paper, further here we developed an approach for mining frequent itemsets using variable window size by context variation analysis (MFI-VWS-CVA) over data streams. Due to the factor of fixing window size dynamically by concept variation analysis, the said model is identified as optimal and scalable. A parallel process that determines frequent itemsets from the concept of cached bush structures, which is our earlier proposal [15], performs frequent itemsets mining over data streams. We extended this incremental frequent itemset mining algorithm by introducing windowing the streaming transaction with variable window size technique in regard to achieve efficient memory usage and execution time. The experiment results confirm that the MFI-VWS-CVA is scalable under divergent streaming data size and coverage values. In future this model can be extended to perform utility based frequent itemset mining over data streams.
  • 9. International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.4, No.4, July 2014 25 ACKNOWLEDGEMENT The authors would like to thank the anonymous reviewers for their valuable comments. We also thank the authors of all references for helping us setup the paper. REFERENCES [1] V.Sidda Reddy, M.Narendra and K.Helini, “Knowledge Discovery from Static Datasets to Evolving Data Streams and Challenges”, International Journal of Computer Applications, Volume 87, No.15, pp 22-25, 2014. [2] Moses Charikar, Kevin Chen and Martin Farach-Colton, “Finding frequent items in data streams”, Journal of Theoretical Computer Science – Special issue on automata, languages and programming, pp 3-15, 2004. [3] Mohamed Medhat Gaber, Arkady Zaslavsky and Shonali Krishnaswamy, “Mining data streams: a review”, ACM SIGMOD Record, Volume 34, Issue 2, pp 18-26, 2005. [4] Nan Jiang and Le Gruenwald., “CFI-stream: Mining closed frequent itemsets in data streams”, Proceedings of the 12th ACM SIGKDD international conference on KDD, USA, 2006. [5] Gurmeet Singh Manku and Rajeev Motvani, “Approximate frequency counts over data streams”, Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002. [6] Syed Khairuzzaman Tabeer, Chowdary Farha Ahmed, Byeong-Soo Jeong, and Young Koo Lee, ”Efficient frequent pattern mining over data streams”, In proceeding of: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp 22-30, 2008. [7] Pauray S.M.Tsai, “Mining frequent item sets in data streams using the weighted sliding window model”, Elsevier publication, 2009. [8] Hue-Fu Li and Suh-Li, “Mining frequent item sets over data streams using efficient window sliding technique”, Elsevier publication, 2009. [9] Yo unghee Kim, Won Young Kim and Ungmo Kim, “Mining frequent item sets with normalized weight in continuous data streams”, Journal of information processing systems, 2010. [10] Chowdary Farha Ahmed and Byeong Soo Jeong, “Efficient mining of high utility patterns over data streams with a sliding window model”, Springerlink.com, 2011. [11] Jing Guo, Peng Zhang, Jianlong Tan and Li Guo, “Mining frequent patterns across multiple data streams”, Proceedings of the 20th ACM international conference on Information and knowledge management, pp 2325-2328, 2011. [12] Anushree Gowtham Ringe, Deeksha Sood and Turga Toshniwa,”Compression and privacy preservation of data streams using moments”, Information journal of machine learning and computing, 2011. [13] Fan Guidan and Yin Shaohong, "A Frequent Itemsets Mining Algorithm Based on Matrix in Sliding Window over Data Streams", Intelligent System Design and Engineering Applications (ISDEA), Third International Conference, pp 66-69, 2013. [14] R. Agrawal, A. Arning, T. Bollinger, M. Mehta, J. Shafer and R. Srikant, “The Quest data mining system”, Proceedings of the 2nd International Conference on Knowledge Discovery in Databases and Data Mining, 1996. [15] V. Sidda Reddy, Dr T.V.Rao and Dr A.Govardhan, "TIFIM: Tree based Incremental Frequent Itemset Mining over Streaming Data", International Journal Of Computers & Technology Vol 10, No 5, Council for Innovative Research, www.cirworld.com, 2013. [16] www.borgelt.net/slides/fpm.pdf.