International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print),
ISSN 0976 - 6375(Online), Volume 5, Issue 12, December (2014), pp. 275-289 © IAEME
AN EXTENSIVE LITERATURE SURVEY ON
COMPREHENSIVE RESEARCH ACTIVITIES OF WEB
USAGE MINING
Dr. V.V.R. Maheswara Rao (1), Dr. V. Valli Kumari (2)
(1) Professor, Department of CSE, Shri Vishnu Engg. College for Women, AP, INDIA
(2) Professor, Department of CS & SE, Andhra University, AP, INDIA
ABSTRACT
The increased usage of the World Wide Web (WWW) has produced a vast data repository of users’
interactions with websites, recorded in weblogs, that is unstructured, unlabeled, highly redundant
and less reliable. In addition, the non-linearity, incompleteness, heterogeneity and transient
nature of the data make the weblog even more complex. This situation creates inevitably increasing
challenges in extracting the desired knowledge from the weblog. Web usage mining is a methodology
that blends traditional mining techniques with sophisticated algorithms to capture, model and
analyze behavioral patterns from the weblog. The knowledge derived from such patterns adds great
value to any organization that uses it in its decision-making process.
Thus, it is necessary to empower web usage mining techniques that are compatible with the
incremental nature of the weblog. These techniques call for applying the new approach
comprehensively at all stages of web usage mining, so that web usage mining can be investigated
completely and effectively. To design and develop a comprehensive model for investigating web
usage mining, the authors of the present paper conduct an extensive literature survey of the
various research activities in the era of web usage mining.
Keywords: Web Mining, Pre-Processing, Storage Models, Pattern Discovery, Optimization
Techniques, Pattern Analysis, Knowledge Representation.
1. INTRODUCTION
The unexpectedly widespread use of the WWW and the dynamically growing nature of the web
create new challenges in web mining, since the data on the web is inherently unlabelled,
© IAEME: www.iaeme.com/IJCET.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI)
www.jifactor.com
incomplete, and heterogeneous. In addition, the web has turned into a gold mine containing
extremely dynamic and interrelated data on which web miners can perform web mining.
1.1. Web Mining
Web Mining is the application of data mining techniques [5] to discover and retrieve useful
information and patterns from WWW documents and services. Web mining can be used to
discover hidden patterns and relationships within web data. The web mining task can be divided
into three general types, known as Web Content Mining (WCM), Web Structure Mining (WSM) and
Web Usage Mining (WUM), as shown in figure 1.
Figure 1: Types of web mining
Web content mining is a mining technique that extracts knowledge from the content
published on the internet [8], usually as semi-structured (HTML), unstructured (plain text) and
structured (XML) documents. The content of a web page includes text, images, HTML, tables and
forms. Web content mining allows search engine results to maximize the flow of customer clicks
to a website, or allows particular pages of the site to be accessed more often in response to
relevant search queries. The main uses of this type of web mining are to gather,
categorize, organize and provide the best possible information available on the web.
Web structure mining is a mining technique which can extract the knowledge from the
WWW and hyperlinks between references in the web. Mining the structure [4] of the web involves
extracting knowledge from the interconnections of the hypertext documents in the WWW. This
results in discovery of web communities, and also pages that are authoritative. Moreover, the nature
of relationship between neighboring pages can be discovered by structure mining. The main purpose
for structure mining is to extract interesting relationships between web pages.
Web usage mining, also known as weblog mining, is the process of automatic discovery and
investigation of patterns in click streams and associated data collected or generated as a result of user
interactions with web resources on websites. The main goal of web usage mining is to capture, model
and analyze the behavioral patterns [3] and profiles of users interacting with websites. The
discovered patterns are usually represented as a collection of pages, objects or resources that are
frequently accessed by groups of users with common needs or interests [4]. The primary data
resources used in web usage mining are log files generated by web and application servers. In
addition, it provides detailed information on web user behavior that can be useful for detecting
intrusion and fraud.
1.2 Web Usage Mining
The overall web usage mining process can be divided into five interdependent
stages: Pre-processing, Storage models, Pattern discovery, Optimization techniques and Pattern
analysis, as shown in figure 2.
Figure 2: Stages of web usage mining
Pre-processing is the initial stage [4] of web usage mining, in which a suitable target
dataset is created to which mining algorithms can be applied. The inputs to the pre-processing
stage are the web server logs, referral logs, registration files and index server logs [7]. In the
pre-processing stage, the click stream data is cleaned and partitioned into a set of user
transactions representing the activities of each user during different visits to the site. Because
it provides the most suitable data for the further stages of web usage mining, pre-processing is an
aspect of data mining whose importance should not be underestimated.
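The cleaning and partitioning step described above can be illustrated with a minimal Python sketch. It assumes the weblog is in the widely used Combined Log Format with English month names; the names LOG_PATTERN, IGNORED_SUFFIXES, parse_line, clean and group_by_user are hypothetical helpers introduced only for illustration. The filtering rules (successful GET requests only, no embedded resources) and the use of host plus user agent as a user key are simplifying assumptions, not a method prescribed by the surveyed papers.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed input: Combined Log Format
# host ident user [time] "method url protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Embedded resources that are not page views (illustrative list).
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Turn one raw log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def clean(entries):
    """Keep only successful GET requests for pages."""
    for entry in entries:
        if entry is None:
            continue  # malformed line
        if entry["method"] != "GET":
            continue
        if not 200 <= entry["status"] < 300:
            continue
        path = entry["url"].lower().split("?")[0]
        if path.endswith(IGNORED_SUFFIXES):
            continue
        yield entry


def group_by_user(entries):
    """Partition cleaned entries by (host, agent), ordered by request time."""
    users = defaultdict(list)
    for entry in entries:
        users[(entry["host"], entry["agent"])].append(entry)
    for requests in users.values():
        requests.sort(key=lambda e: e["time"])
    return dict(users)
```

A typical call chains the three helpers, for example group_by_user(clean(parse_line(l) for l in lines)), and yields one time-ordered request list per presumed user.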
Web mining applications rely on monolithic storage models which take responsibility for
data storage, retrieval and maintenance. The storage model creates a solid platform of consolidated
qualified data for analysis. Basically, the storage models can be divided into two types based on the
nature of their growth, namely static storage models and incremental storage models.
Pattern discovery takes the output generated by the pre-processing stage. The goal of pattern
discovery is to learn general concepts from the pre-processed data. In this phase,
statistical, database and machine learning techniques such as classification, clustering and
association rule mining are applied to the extracted information.
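As an illustration of association rule mining on pre-processed sessions, the following minimal sketch counts page pairs and derives rules of the form "X implies Y" from support and confidence. The function name pair_rules and the restriction to rules between two single pages are assumptions made for brevity; practical algorithms such as Apriori handle itemsets of any size.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support, min_confidence):
    """Return rules (x, y, support, confidence) between pairs of pages.

    support(x, y)  = fraction of sessions containing both x and y
    confidence(x -> y) = sessions containing both / sessions containing x
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = sorted(set(session))  # each page counted once per session
        item_count.update(pages)
        pair_count.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

For example, if every session that visits a cart page also visits a products page, the rule from cart to products has confidence 1.0 even when the reverse rule does not.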
Soft optimization techniques are characterized by their ability to perform granular computation
while avoiding the concept of approximation. Basically, these models provide the foundation for
computational intelligence systems and further outline the basis of future generation computing
systems. These models closely resemble human-like decision making and are used for modeling
highly non-linear data, where pattern discovery, rule generation and learnability are typical.
Pattern analysis is the final stage of usage mining [4], which extracts interesting patterns
from the output of pattern discovery. The goal of pattern analysis is to understand,
visualize and interpret the discovered patterns and statistics. The output generated by pattern
analysis is used as input to various applications such as recommendation engines, visualization
tools and web analytics.
1.3 Web Log Characteristics
Data collection is the primary step in the web usage mining process [15]; it is the process of
extracting the task-relevant data from diverse weblogs. Obtaining details of web usage data from
weblogs is a very important and difficult task, since they are unstructured, incremental in nature
and rapidly growing. Attention must be paid to collecting the data from weblogs, which normally
include web content data, web structure data and web usage data.
Web usage data stores general access logs and user profiles, which consist of general access
patterns and customized access patterns respectively. Web usage mining is the task of applying
data mining techniques to extract useful patterns from such web data through various stages. These
patterns can be used to investigate interesting characteristics of web users.
The data collected at web servers is the richest, but in practice it is very difficult to obtain
a web server’s data. The data can be collected at web clients by enabling JavaScript, Java applets
and modified browsers, yet these methods require user participation to enable their functionalities
[16]. Proxy servers are more suitable and reliable for collecting web usage data since they are placed at
the client location and act as real servers. They capture all requests made by clients to the origin
server, store them automatically in weblogs and, further, improve navigation speed through caching.
The weblog collected at a proxy server is unstructured since it contains different
types of entries. These entries do not have a definite number of attributes, an identifiable
structure or defined relations. This results in ambiguities and makes the weblog difficult to
interpret with computer programs.
The other characteristics of weblog data are Heterogeneous, Distributed, Different data types,
Dynamic content, Voluminous / Non-Scalable, Time dimension, Incremental in nature and
Exponentially Growing.
2. AN EXTENSIVE AND COMPREHENSIVE LITERATURE SURVEY
Knowledge discovery in data has been used to analyze the data collected on the web and to
extract useful knowledge. This effort was named Web Mining by Etzioni in 1996, and from
then onwards research on web mining took root through the efforts of Robert Armstrong and
his colleagues. Several approaches [1] have been proposed by many authors in the vein of web
mining to take the work to the next level. The rapidly growing nature of the weblog is a strong
incentive [1] that draws the interest of next-generation researchers to the emerging field of web
mining. The efforts in the recent past [7] focused on issues related to the feasibility,
scalability, usability and efficiency of web mining techniques.
The survey conducted by various authors [4] and their research contributions identified three
broad categories of web mining, namely web structure mining, web usage mining and web content
mining. The literature survey of web usage mining is as shown in figure 3.
Figure 3: Roadmap of Literature Survey
The efforts made by some authors [2, 5] recognized that web usage mining can be utilized for
different tasks namely Personalization, System Improvement, Usage Characterization, Site
Modification and Business Intelligence. Other authors [2, 9] have already dealt with the
problem of web personalization, a task that comprises simple functions, advanced functions and
intelligent functions which can perform certain tasks on behalf of the user without explicit
requests.
The work explored by the authors [5, 6] witnessed that the system improvement emerged as
another focus of web usage mining which concentrates on improvements of techniques to mine the
information and knowledge on the web quickly and effectively. Performance and other service
quality attributes are crucial to user satisfaction from services.
The survey conducted by various authors [1, 3] acknowledged that usage characterization is
an interesting task of web usage mining which focuses on the techniques that predict the user
behavior while the user interacts with the website. Some authors [10] have mostly focused on the
approaches that are utilized in site modification using web usage mining. The site modification is a
crucial issue for many applications in terms of both usage and structure.
The efforts of recent authors [11] have made it evident that emerging e-services of the web
era, such as e-commerce, e-learning and e-banking, have radically changed the usage of the
internet, turning websites into businesses; thus business intelligence has become one of the
tasks of web usage mining.
2.1 Pre-Processing Techniques
In the recent past web usage mining has attracted the attention of many researchers [15, 17], yet
pre-processing in knowledge discovery has received less attention than it deserves. Many
researchers [12] are working on pre-processing, which involves user identification, session
identification, path completion and transaction identification.
In 2000, Brodley and Kohavi discussed clipping once per session, in which a single session is
truncated at some point within the session. The implicit unit of analysis here is a fragment of a
session. Martin Arlitt [21] studied numerous user session characteristics including number of
requests per session, number of pages requested per session, session length, and inter session times.
Both of these techniques are time consuming and inaccurate as the web server log captures cache
hits.
In 2001, B. Berendt, B. Mobasher, M. Spiliopoulou and J. Wiltshire [13] provided a formal
framework that uses sessionizing heuristics to partition the user activity log into a set of
constructed sessions, using time-oriented and navigation-oriented heuristics to improve
the accuracy of sessionization.
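A time-oriented sessionizing heuristic of this kind can be sketched as follows. The sketch assumes two common variants: a limit on the gap between consecutive requests and a limit on the total session duration. The function name sessionize and any threshold values passed to it are illustrative assumptions, not values taken from the cited work.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, max_gap=None, max_duration=None):
    """Split one user's request times into sessions.

    max_gap:      start a new session if the time since the previous
                  request exceeds this limit (page-stay heuristic).
    max_duration: start a new session if the time since the first
                  request of the current session exceeds this limit.
    Either limit may be None to disable that heuristic.
    """
    sessions = []
    current = []
    for t in sorted(timestamps):
        if current:
            gap_exceeded = max_gap is not None and t - current[-1] > max_gap
            too_long = max_duration is not None and t - current[0] > max_duration
            if gap_exceeded or too_long:
                sessions.append(current)
                current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions
```

The same request stream can produce different sessions under each heuristic, which is why the choice of heuristic affects the accuracy of the reconstructed sessions.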
In 2002, Tan P N and Kumar V [24] explored web robots, which can perform many tasks
automatically and make the web administrators’ task easy. Web robots are used by many
business organizations to collect email addresses and to monitor product prices, corporate news
and so on. This has led to a large proliferation of weblog data.
In 2003, M. Spiliopoulou, B. Mobasher, B. Berendt [22] have expressed that the reliability of
web usage mining results depends heavily on the proper preparation of the input datasets. In
particular, errors in the reconstruction of sessions and incomplete tracing of users’ activities in a site
can easily result in invalid patterns.
In 2004, J. Zhang and A. A. Ghorbani [19] described improved statistics-based
time-oriented heuristics for the reconstruction of user sessions from a server log. Even so,
some of the usage mining results in their experiments showed lower performance. They
suggested that a combination of time-oriented heuristics might achieve better performance.
In 2005, Show-Jane Yen and Yue-Shi Lee [52] applied a bit-string database generation technique
as a part of pre-processing to improve the efficiency of finding interesting association rules.
They also noted that the bit-string database generation technique costs extra memory space
in the transformation process.
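The general idea of a bit-string representation can be sketched as follows: each item is mapped to an integer whose i-th bit is set when transaction i contains that item, so the support count of an itemset becomes a bitwise AND followed by a bit count. This is a generic vertical bitmap sketch with hypothetical function names; it is not claimed to reproduce the exact technique of the cited authors. It also shows the trade-off noted above, since one bit string per item must be kept in memory.

```python
def build_bitstrings(transactions):
    """Map each item to an int whose bit i is 1 if transaction i has the item."""
    bitstrings = {}
    for i, transaction in enumerate(transactions):
        for item in set(transaction):
            bitstrings[item] = bitstrings.get(item, 0) | (1 << i)
    return bitstrings


def support_count(bitstrings, itemset):
    """Count transactions containing every item of a non-empty itemset."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    mask = bitstrings.get(items[0], 0)
    for item in items[1:]:
        mask &= bitstrings.get(item, 0)  # unknown items give an empty mask
    return bin(mask).count("1")
```

Once the bit strings are built, candidate itemsets can be counted without rescanning the original transactions.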
In the subsequent year 2006, Natheer Khasawneh and Chien-Chung Chan [23] introduced a
fast active user based user identification algorithm and they also presented an ontology based method
that utilizes functionalities to identify sessions.
After that in year 2007, Renata Ivancsy and Sandor Juhasz [26] attempted a novel approach
that uses a complex cookie based method to identify web users. Furthermore, they also explained
steps towards identifying individuals behind impersonal web users. Their approach is demonstrated
by implementing a web activity tracking system that aims at a more precise distinction of web users
based on log data.
Then in the year 2008, G T Raju and P S Satyanarayana [18] implemented a complete
pre-processing methodology that utilizes several heuristics for cleaning web usage data. This
methodology allows the analyst to merge any collection of weblogs into a single file. Further, it
allows the analyst to analyze multiple weblogs jointly. Yet, the relational data model is not
suitable for the present huge weblogs, which creates scope for further research.
In 2009, K. R. Suneetha and Dr. R. Krishnamoorthi [20] summarized the weblog
locations as server side logs, proxy side logs and client side logs, along with their types such
as transfer log, agent log, error log and referrer log. They presented the weblog structure and its
attributes, and indicated, as future work, that data mining techniques can be applied to the
pre-processed weblog to find frequently accessed patterns in less time and with high accuracy.
In 2010, V. Chitraa, A.S.D. [27] reviewed the existing work done in the pre-processing stage,
which includes data collection and pre-processing. The data collection covers the server level,
proxy level and client side. They also concluded that the weblog is the best source for learning
usage behavior, but that raw weblogs contain unnecessary details which affect the accuracy of
pattern discovery and analysis.
In 2011, Ma Shuyue, Liu Wencai and Wang Shuo [28] reviewed the existing work done in the
pre-processing stage and noted that embedded animations in web pages and other page elements
which meet the new standards can be combined with the logs and become matters of concern.
Therefore, pre-processing before web mining of the weblog should become a more important
research topic.
During 2012, P. Nithya, Dr. P. Sumathi [25] proposed a novel pre-processing technique that
can remove local and global noise and web robots data. However, they also expressed that intelligent
techniques are required to discard the noise and the data accessed by web robots automatically as
their future directions.
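A very simple form of robot removal can be sketched with two widely used heuristics: a host that requests /robots.txt, or whose user agent string contains a typical crawler keyword, is treated as a robot and all of its requests are discarded. The keyword list, the function names and the dictionary keys host, url and agent are assumptions made for illustration; the intelligent techniques called for above would go well beyond such fixed rules.

```python
# Illustrative keywords; real crawler lists are much longer.
ROBOT_AGENT_HINTS = ("bot", "crawler", "spider", "slurp")


def find_robot_hosts(entries):
    """Return the set of hosts that look like web robots."""
    robots = set()
    for entry in entries:
        agent = (entry.get("agent") or "").lower()
        asked_for_robots_txt = entry["url"] == "/robots.txt"
        agent_looks_like_robot = any(hint in agent for hint in ROBOT_AGENT_HINTS)
        if asked_for_robots_txt or agent_looks_like_robot:
            robots.add(entry["host"])
    return robots


def remove_robots(entries):
    """Drop every request made by a host flagged as a robot."""
    entries = list(entries)  # allow two passes over a generator
    robots = find_robot_hosts(entries)
    return [entry for entry in entries if entry["host"] not in robots]
```

Robots that hide their identity and never fetch /robots.txt pass through this filter, which is one reason learning-based detection is of interest.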
Recently in 2013, Chintan R. Varnagar, Nirali N. Madhak, Trupti M. Kodinariya, Jayesh N.
Rathod [14] provided a detailed survey of work done so far on data collection and pre-processing
stage of web usage mining. They endorsed that weblog data pre-processing is a very important and
crucial task in the entire process. This phase can be strengthened by choosing and applying various
intelligent techniques.
2.2 Storage Models
Valtchev P. and Missaoui R. [37] proposed a framework in 2001 that designs a
generic procedure for lattice building. This new lattice building approach, which is built on the
basis of lattice theory, is more general than the previous ones. As future work, they noted that
the features of growing data call for the development of incremental algorithms. Yet the major
known incremental techniques lack a theoretical foundation.
The concept of lattices was derived using universally quantified expressions in 2002
by J.L. Pfaltz and C.M. Taylor [33]. They showed how incremental lattices and their
associated logical implications, along with scientific observations, are generated. They
expressed that incremental discovery is well suited to growing data such as weblogs.
In the subsequent year 2003, F. Masseglia, P. Poncelet, and M. Teisseire [31] considered the
problem of incremental mining and presented a new algorithm for mining frequent sequences in the
updated database. They specifically expressed in their future avenue that incremental mining is also
appropriate for web usage mining, where the modifications need to be taken into account in order to
save storage space as previous information is no longer of interest or becomes invalid.
In 2004, Show-Jane Yen, Yue-Shi Lee and Chung-Wen Cho [36] implemented an
incremental updating technique to maintain the discovered frequent traversal patterns when the user
sequences are inserted into the database. The experimental results have shown that the algorithm is
efficient for the maintenance of mining frequent traversal patterns.
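The idea of maintaining discovered patterns as new user sequences arrive, rather than re-mining the whole database, can be sketched as follows. The class IncrementalPairMiner is a hypothetical, deliberately simplified illustration that keeps counts of consecutive page pairs; it is not the cited algorithm, which handles longer traversal patterns and avoids storing counts for every candidate.

```python
from collections import Counter


class IncrementalPairMiner:
    """Keeps support counts of consecutive page pairs up to date."""

    def __init__(self):
        self.num_sessions = 0
        self.pair_counts = Counter()

    def add_sessions(self, sessions):
        """Fold newly inserted sessions into the counts without rescanning old ones."""
        for session in sessions:
            self.num_sessions += 1
            # Count each traversal (page -> next page) once per session.
            self.pair_counts.update(set(zip(session, session[1:])))

    def frequent_pairs(self, min_support):
        """Return {pair: support} for pairs meeting the support threshold."""
        if self.num_sessions == 0:
            return {}
        return {
            pair: count / self.num_sessions
            for pair, count in self.pair_counts.items()
            if count / self.num_sessions >= min_support
        }
```

After each batch of insertions, a pattern that was frequent may fall below the threshold and a previously infrequent one may rise above it, which is exactly the maintenance problem incremental techniques address.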
Shao M W. presented the approaches of attribute reduction and object reduction for two kinds
of generalized concept lattices in the year 2005, in which they removed the attributes and objects that
are not essential to the generalized lattices.
During 2006, Li H R, Zhang W X, Wang H [34] investigated the attribute classification and
reduction of lattices using binary relations. They also presented two kinds of recognition methods of
attribute classification based on the properties of irreducible elements and its congruence. According
to the classification a reduction method of lattices is obtained.
In the year 2007, Graham Cormode, Flip Korn, S. Muthukrishnan and Divesh Srivastava [32]
proposed an algorithm based on product of hierarchical dimensions from mathematical lattice
structure. They found importance of hierarchical multidimensional summarization of data in
emerging data stream applications. However, there is no clear method for discarding nodes that do
not meet the minimum threshold.
Ben Martin and Peter Eklund [39] proposed a border algorithm for making the covering
relations of concepts explicit for iceberg lattices in 2008. Empirical testing was
performed to compare the border algorithm with a traditional algorithm based on the covering edges
algorithm from concept data analysis.
An incremental data mining algorithm has been proposed in 2009 by Yue-Shi Lee [39]
towards website redesign to improve the web services based on navigational patterns. Moreover,
they expressed that this technique can not only be used for website design but is also able to analyze
user behavior.
In 2010, Eklund, Peter, Villerd, Jean [30] proposed hybrid visual representation techniques for
concept lattices, concentrating on line diagrams. However, combining too many value attributes
resulted in a complex nested diagram, useful for deep analysis but not suitable for first-glance
navigation.
In 2011, Andreas Lubcke, Veit Koppen, and Gunter Saake [38] introduced a decision
approach based design process concerning different storage architectures. Recently in 2012, Santo
Lombardo, Elisabetta Di Nitto and Danilo Ardagna, expressed the necessity of development of
hybrid architecture for storage models as their future work.
2.3 Pattern Discovery Techniques
In the year 2001, Lu, S., Hu, H., and Li, F. [48] presented vertical and mixed weighted
association rules on static datasets to determine correlations between items. Their
experiments showed better predictive ability. The method has also been demonstrated on
static and synthetic datasets.
In 2002, C. Ezeife and Y. Su [40] presented a frequent pattern tree structure to reduce the
required number of database scans. DB Tree and PotFP Tree are the proposed algorithms,
applied on large databases. The discovery of closed patterns on multi-dimensional patterns is an
interesting future direction from their work.
In 2003, J. Fong, H.K. Wong and S.M. Huang [45] introduced a frame metadata
model to facilitate the continuous generation of association rules in data mining. With this model,
a new set of association rules can be derived when the source databases are updated, using static
and active classes. Mining association rules on weblog data is the future direction.
An efficient method was presented in 2004 by Harms, S.K. and Deogun [43] for mining frequent
association rules from multiple data sets. This work highlighted the importance of user constraints by
introducing antecedent and consequent patterns in the mining process on the association rules.
In the next year 2005, F.C Tseng, C.C. Hsu and K.S. Fu [42] established a simpler and more
efficient data structure for representing the frequent pattern list. The technique partitioned both the
search space and solution space so as to apply divide and conquer approach in mining frequent
patterns.
In 2006, Huang, Y.M., Kuo, Y-H., Chen, J.N. and Jeng, Y.L [44] developed a
navigational pattern tree, NP-Miner, for discovering sequential patterns. Most authors have
discussed the dynamic maintenance and updating of the knowledge base and comprehensive
evaluation methods in the knowledge discovery process.
In the subsequent year 2007, Dalamagas, T., Bouros, P., Galanis, T., Eirinaki, M. and Sellis,
T.K [41] provided a set of mining tasks for user navigation patterns, with the aim of
personalizing topic directories according to the navigational behavior of the users. The efforts
of many researchers concluded that, among all the data mining algorithms for association rules,
incremental algorithms are a better fit for large, growing databases.
In 2008, J L Balcazar [47] studied and explored the concept of redundancy
among association rules from a fundamental perspective. They discussed several existing alternative
definitions of redundancy between association rules and provided new characterizations and
relationships among them. They also provided a sound and complete calculus to construct deduction
scheme for redundancy rules. They also analyzed the risk degree of lost rules based on the
incremental mining.
In 2009, Przemysław Kazienko [51] reviewed indirect association rules and presented a new
approach to discover indirect associations existing between pages that rarely occur together. Their
experimental results revealed the usefulness of indirect rules in the weblog scenario.
In 2010, Priyanka Makkar, Payal Gulati and Dr. A.K. Sharma [50] presented
an approach for predicting user behavior to improve web performance. They also expressed
that web pre-fetching has become an attractive solution, wherein the forthcoming page accesses of
a user are predicted based on the weblog.
In 2011, Liu Jian and Wang Yan-Qing [49] presented a research framework that makes a
contribution to web mining. Their experimental results show that there is still space for improvement
to optimize the solution by employing advanced techniques.
During 2012, Mahendra Pratap Yadav, Pankaj Kumar Keserwani, Shefalika Ghosh Samaddar
proposed an adaptive algorithm for incremental mining of association rules. The algorithm is a
highly efficient incremental mining technique and the authors expressed it can be applied in the
weblog scenario.
Recently in 2013, Johannes K. Chiang, Rui-Han Yang [46] proposed an approach which
includes a novel data structure and an efficient algorithm for mining association rules on various
granularities. Their test results showed that its performance, efficiency and scalability are better than
the current approaches. But the effects of perceived issues and potential development of data mining
and concept description are worthy of further investigation.
2.4 Optimization Techniques
In the year 2002, H. K. Tsai, J. M. Yang, and C. Y. Kao. [56] demonstrated the usage of GA
in finding the optimal global strategies by using clustering technique on biological datasets. The
extensive bibliography provided by them is an evidence of the relevance of usage of GAs in web
mining.
Then in 2003, B. Minaei-Bidgoli and William F. Punch [53] presented an approach for
classifying students in order to predict their final grade based on features extracted from the log
data of a web-based education system. To minimize the prediction error rate they used genetic
algorithms to weight the features. They also provided a comparative study of several crossover
operators.
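The use of a genetic algorithm to search for feature weights can be sketched in a few lines. The sketch below is a generic GA with truncation selection, one-point crossover, Gaussian mutation and elitism; the function name evolve_weights, all parameter defaults and the caller-supplied fitness function are assumptions for illustration and do not reproduce the cited authors' setup. In a prediction task, fitness would typically be the negative error rate of a classifier that uses the weighted features.

```python
import random


def evolve_weights(fitness, n_weights, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Search for a weight vector in [0, 1]^n that maximizes fitness(weights).

    Returns (best_weights, history), where history holds the best fitness
    seen at each generation plus the final population.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    population = [[rng.random() for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover.
            cut = rng.randint(1, n_weights - 1) if n_weights > 1 else 0
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clipped to [0, 1].
            child = [
                min(1.0, max(0.0, w + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else w
                for w in child
            ]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best individual is always copied unchanged into the next generation, the best fitness never decreases from one generation to the next. Swapping the crossover step for another operator is the kind of comparison described above.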
In the next year 2004, B. Minaei-Bidgoli, G. Kortemeyer, and W. F. Punch extended their
previous work [53] and presented a new approach for predicting students’ performance based on
extracting the average of the feature values over all of the problems.
S. Y. Wang and K. Tai [60] in 2005, implemented a bit array representation method for
structural topology optimization using the genetic algorithm. An identical initialization method is
also proposed to improve the genetic algorithm performance in dealing with problems with narrow
design domains. Their results specified that bit array representation is suitable in handling the design
connectivity problem.
In the next year 2006, S. Y. Wang, K. Tai, and M. Y. Wang [61] presented a versatile, robust
and enhanced genetic algorithm for structural topology optimization using problem specific
knowledge. In their implementation, they specifically emphasized the importance of choosing
appropriate representation techniques, genetic operators and evaluation methods.
In 2008, S. Ventura, C. Romero, A. Zafra, J. A. Delgado, and C. Hervas [59] designed a
framework that can be applied to maximize the reusability and availability of evolutionary
computation with minimum effort in web mining. The heavy computational demand is an open
problem earmarked for their future research work.
Hyunchul Ahn, Kyoung-jae Kim [57] reviewed prior studies on optimization techniques for
several systems in 2009. They further examined genetic approach for optimization of feature weights
and relevant instances for similarity calculations. They also mentioned among their limitations
that the population size and the number of generations required by the genetic algorithm are very
large. Thus, reducing the population size and the number of generations of the genetic algorithm
is an open challenge.
In 2010, Mehmet Kaya [58] proposed a novel method using a multi-objective
evolutionary algorithm that extracts patterns automatically. The method was applied to a dataset
with a sequential character. The methodology of automatic extraction is marked in their
conclusions as promising future research.
In 2011, Diana Martín, Alejandro Rosete, Jesus Alcala-Fdez and Francisco Herrera [54]
extended the well-known multi-objective evolutionary algorithms to perform learning of the intervals
of attributes and a condition selection in order to mine a set of optimum association rules with
accuracy.
In 2012, Xiaoyan Sun, Lei Yang, Dunwei Gong and Ming Li [62] found that collective
intelligence extracted from multiple users enhances the performance of GA. In their future
research directions, they noted that designing evolutionary algorithms is a promising research
direction in the knowledge discovery process.
Recently in 2013, Gaurav Dubey, Arvind Jaiswa [55] dealt with the challenge of the association
rule mining problem of finding frequent itemsets using a GA-based method. However, they noted
that a more extensive empirical evaluation of their proposed method is promising future research.
2.5 Pattern Analysis Techniques
In 2001, Hilderman, R. J. and Hamilton, H. J focused on classifying interestingness
measures and provided a general overview of the more successful and widely used interestingness
measures from the literature that have been employed in data mining applications. They expressed
that extending the theory of interestingness for diversity measures is open for future research.
In 2002, Keim, D.A. [69] proposed a classification of visualization techniques
based on the data type to be visualized, and identified tight integration of
visualization techniques with traditional techniques as future work. Grinstein, G., Hoffman, P.,
& Pickett, R described a set of benchmarks for visualization approaches. Although benchmarking
has been done, further study and contribution from the research community and the industry are required.
In 2003, Brijs T., Vanhoof K. and Wets G. [63] provided an overview of existing measures of
interestingness and divided them into objective and subjective measures. They focused on
objective measures defined by means of statistical criteria.
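To make the notion of an objective measure concrete, the sketch below computes three widely used
statistical measures (support, confidence and lift) for a rule over web sessions. The survey does
not state which measures [63] covers, so this selection, the session data and the function names
are assumptions made for illustration only.

```python
from typing import List, Set

Session = Set[str]


def support(sessions: List[Session], items: Set[str]) -> float:
    """Fraction of sessions that contain every page in `items`."""
    return sum(1 for s in sessions if items <= s) / len(sessions)


def confidence(sessions: List[Session], antecedent: Set[str], consequent: Set[str]) -> float:
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs at least once."""
    return support(sessions, antecedent | consequent) / support(sessions, antecedent)


def lift(sessions: List[Session], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(sessions, antecedent, consequent) / support(sessions, consequent)


# Hypothetical sessions: each one is the set of pages a visitor requested.
sessions = [
    {"home", "catalog", "cart"},
    {"home", "catalog"},
    {"home", "help"},
    {"catalog", "cart"},
]

rule_support = support(sessions, {"catalog", "cart"})
rule_confidence = confidence(sessions, {"catalog"}, {"cart"})
rule_lift = lift(sessions, {"catalog"}, {"cart"})
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 indicates that the consequent appears more often in sessions containing the
antecedent than it does overall. All three measures depend only on the data, which is what
distinguishes objective measures from subjective ones that also need user beliefs.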
In 2004, Jaroszewicz, S. and Simovici, D. A. [68] proposed a new definition of the
interestingness of an itemset as the absolute difference between its support estimated from the
data and its support estimated from a Bayesian network. In addition, they presented an efficient
algorithm based on the new definition. Their experimental evaluation demonstrated the usefulness
of the algorithm for finding interesting, unexpected patterns.
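The definition above can be sketched in a few lines of Python. This is not the algorithm of [68]:
as a loudly labeled stand-in for a real Bayesian network, the sketch uses a network with no edges,
under which pages are independent and the expected support of an itemset is the product of the
single-page frequencies. The session data and function names are likewise hypothetical.

```python
from math import prod
from typing import Dict, List, Set

Session = Set[str]


def observed_support(sessions: List[Session], itemset: Set[str]) -> float:
    """Support of the itemset estimated directly from the data."""
    return sum(1 for s in sessions if itemset <= s) / len(sessions)


def expected_support(marginals: Dict[str, float], itemset: Set[str]) -> float:
    """Support predicted by the background model. ASSUMPTION: an edgeless
    Bayesian network, so the joint probability is a product of marginals."""
    return prod(marginals[page] for page in itemset)


def interestingness(sessions: List[Session], marginals: Dict[str, float], itemset: Set[str]) -> float:
    """Absolute difference between the observed and the expected support."""
    return abs(observed_support(sessions, itemset) - expected_support(marginals, itemset))


# Hypothetical sessions: each one is the set of pages a visitor requested.
sessions = [
    {"home", "catalog", "cart"},
    {"home", "catalog"},
    {"home", "help"},
    {"catalog", "cart"},
]
pages = set().union(*sessions)
marginals = {page: observed_support(sessions, {page}) for page in pages}

score = interestingness(sessions, marginals, {"catalog", "cart"})
print(f"interestingness of {{catalog, cart}}: {score:.3f}")
```

With a richer background network, dependencies that the analyst already knows about would be
encoded in the model, so only itemsets whose observed support departs from that knowledge would
receive a high score.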
In 2005, Furnkranz, J. and Flach, P. A. [64] analyzed the behavior of covering rule algorithms
by visualizing their evaluation metrics, their dynamics and coverage space. They described
heuristics for evaluating rules as well as filtering and stopping criteria. Their experimental
results showed that the approach is suitable for understanding both the behavior of heuristic
functions and the dynamics of covering algorithms.
In 2006, Padmanabhan, B. and Tuzhilin, A. [71] presented a new method for discovering a
minimal set of unexpected patterns by combining two independent concepts, minimality and
unexpectedness, both of which have been well studied in the KDD literature. They demonstrated the
strength of this approach experimentally.
In 2007, Heng-Soon Gan and Andrew Wirth [65] defined rescheduling stability quantitatively
and provided analytical means for various heuristics. They argued that the rescheduling stability
of a heuristic matters in addition to its effectiveness and efficiency, and they extended
empirical and analytical work on heuristic robustness. As future work, they suggested that
Spearman's footrule, a measure of permutation disarray, may shed further light on heuristics.
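Spearman's footrule is the sum, over all items, of the absolute difference between an item's
position in one ordering and its position in another. The sketch below computes it for two job
sequences; the job labels and schedules are hypothetical, and the code illustrates the measure
itself rather than the stability analysis of [65].

```python
from typing import List, Sequence


def spearman_footrule(order_a: Sequence[str], order_b: Sequence[str]) -> int:
    """Sum of |position in order_a - position in order_b| over all items."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orderings must contain exactly the same items")
    position_in_b = {item: index for index, item in enumerate(order_b)}
    return sum(abs(index - position_in_b[item]) for index, item in enumerate(order_a))


# Hypothetical schedules: the same four jobs before and after rescheduling.
original: List[str] = ["J1", "J2", "J3", "J4"]
swapped_first_two: List[str] = ["J2", "J1", "J3", "J4"]
fully_reversed: List[str] = ["J4", "J3", "J2", "J1"]

print(spearman_footrule(original, original))           # 0
print(spearman_footrule(original, swapped_first_two))  # 2
print(spearman_footrule(original, fully_reversed))     # 8
```

A value of zero means the two sequences are identical, and larger values mean more disarray, so a
heuristic whose rescheduled sequence stays close to the original scores low on this measure.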
In 2008, Vitaly Friedman [72] proposed the USER approach, which finds unexpected sequences
and implication rules in data, using user-defined beliefs, for mining unexpected behaviors from
weblogs. Because unexpected behaviors affect web usage analysis, and identifying unexpectedness
often depends on semantics and user beliefs, the measurement of unexpectedness was raised as an
open research problem.
In 2009, Michael Friendly [70] surveyed visualization techniques from their deep roots to
the current fruit. The findings opened interesting future research paths towards automation of
the process, quality, scalability and intelligence of the sensitivity measure.
In 2011, Izwan Nizal Mohd Shaharanee, Fedja Hadzic and Tharam S. Dillon [67] proposed a
strategy that combines data mining and statistical measurement techniques, including redundancy
analysis, sampling and multivariate statistical analysis, to discard non-significant rules. Their
experimental results showed that the framework removed a large number of non-significant and
redundant rules while preserving relatively high accuracy.
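One simple way to discard statistically non-significant rules is a chi-square test of
independence on the 2x2 contingency table of each rule. The sketch below shows that single
technique and is not the multi-step framework of [67]; the rule labels and counts are
hypothetical, and 3.841 is the standard critical value for one degree of freedom at the 5% level.

```python
from typing import Dict, List, Tuple

# Critical value of the chi-square distribution, 1 degree of freedom, 5% level.
CHI2_CRITICAL_1DOF_5PCT = 3.841

# Counts for a rule A -> B: (A and B, A without B, B without A, neither).
Table = Tuple[int, int, int, int]


def chi_square_2x2(n11: int, n10: int, n01: int, n00: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denominator = row1 * row0 * col1 * col0
    if denominator == 0:
        return 0.0  # a margin is empty, so there is no evidence of dependence
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


def significant_rules(tables: Dict[str, Table],
                      critical: float = CHI2_CRITICAL_1DOF_5PCT) -> List[str]:
    """Keep the rules whose antecedent and consequent are significantly dependent."""
    return [rule for rule, counts in tables.items() if chi_square_2x2(*counts) > critical]


# Hypothetical contingency counts over 100 sessions per rule.
rule_tables = {
    "catalog -> cart": (30, 10, 10, 50),
    "home -> help": (10, 30, 15, 45),
}

kept = significant_rules(rule_tables)
print(kept)
```

Here "home -> help" is discarded because the help page is requested equally often with and
without the home page (statistic 0), while "catalog -> cart" has a statistic of about 34 and is
kept. The test measures dependence in either direction, so a kept rule may still need a check
that the association is positive.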
In 2013, David H. Glass focused on a particular class of objective measures known as
confirmation measures. This class of objective measures provides a solid basis for future
research.
3. CONCLUSIONS
The literature clearly shows that web usage mining is a promising and attractive task within
web mining. This survey observes that usage characterization consists mainly of five
interdependent stages: pre-processing, storage models, pattern discovery, optimization techniques
and pattern analysis. The authors also observed the importance, criticality and efficiency of a
comprehensive approach to the web usage mining process, which serves as the formal basis for
future work. Furthermore, the literature survey has recognized that implementing the
interdependent stages comprehensively is a promising and practical research area.
4. FUTURE WORK
As future work, a new approach will be designed and developed that concentrates
comprehensively on all stages of web usage mining and leverages the strengths of the individual
techniques. In addition, the comprehensive approach will be tested with different weblogs that
cover a large spectrum of applications, such as web usage analysis for improvements in fraud
detection, product analysis and customer segmentation.
Further, extensions of the comprehensive model can enable an effective integration and
mining of content, usage and structure data from web logs, which promises to lead to the next
generation of useful and intelligent web mining tools that can derive real-time knowledge from
user transactions on the web.
REFERENCES
[1] B. Masand, M. Spiliopoulou, J. Srivastava, O.R. Zaiane, Proceedings of WebKDD2002, “Web
Mining for Usage Patterns and Profiles”, Edmonton, CA, 2002.
[2] B. Mobasher, R. Cooley, J. Srivastava, “Automatic Personalization Based on Web Usage
Mining” Communications of the ACM, 43(8), pp: 142–151, 2000.
[3] E. Frias-Martinez, V. Karamcheti, “A customizable behavior model for temporal prediction of
web user sequences”, In WEBKDD, Explorations, 1(2), pp: 66–85, 2000.
[4] Geeta R. B., Prof. Shashikumar, G. Totad, Dr. Prasad Reddy, “Amalgamation of Web Usage
Mining and Web Structure Mining”, International Journal of Recent Trends in Engineering,
Vol. 1, No. 2, pp: 279-281, 2009.
[5] Guandong Xu, Yanchun Zhang, Xun Yi, “Modelling User Behaviour for Web
Recommendation Using LDA Model”, IEEE/WIC/ACM International Conference on Web
Intelligence and Intelligent Agent Technology, pp: 529-532, 2008.
[6] Han, J., Chang, K. C., “Data mining for Web intelligence”, IEEE Computer, 35(11), pp: 64-70,
2002.
[7] Joshi K. P., Joshi A., Yesha Y., Krishnapuram R., “Warehousing and mining web logs”, In
Proceedings of the 2nd ACM CIKM Workshop on Web Information and Data Management,
Kansas City, Missouri, USA 1999, pp: 63–8, 1999.
[8] Kao, H., Lin, S., Ho, J., Chen, M., “Mining Web Informative Structures and Contents Based on
Entropy Analysis”, IEEE Transactions on Knowledge and Data Engineering, Vol.16, Iss.1,
pp: 41 – 55, 2004.
[9] Massimiliano Albanese, Antonio Picariello, Carlo and Lucio Sansone, “Web Personalization
Based on Static Information and Dynamic User Behavior”, ACM 1-58113-978-0/04/0011,
2004.
[10] Qingtian Han, Xiaoyan Gao, Wenguo Wu, “Study on Web Mining Algorithm Based on Usage
Mining”, IEEE Xplore, pp: 1121-1124, 2008.
[11] R. Kohavi, M. Spiliopoulou, J. Srivastava, Proceedings of “WebKDD2000 Web Mining for
E-Commerce – Challenges & Opportunities”, Boston, MA, 2000.
[12] Berendt B. et al., “The Impact of Site Structure and User Environment on Session
Reconstruction in Web Usage Analysis”, Proc. WEBKDD 2002: Mining Web Data for
Discovery Usage Patterns and Profiles, LNCS 2703, Springer-Verlag, pp: 159–179, 2002.
[13] Berendt, B., B. Mobasher, M. Spiliopoulou, J. Wiltshire, “Measuring the accuracy of
sessionizers for web usage analysis”, Proc. of the Workshop on Web Mining, First SIAM
Internat. Conf. on Data Mining, Chicago, IL, pp: 7–14, 2001.
[14] Chintan R. Varnagar, Nirali N. Madhak, Trupti M. Kodinariya, Jayesh N. Rathod “Web Usage
Mining: A Review on Process, Methods and Techniques” IEEE Conference Publications,
pp: 40-46, 2013.
[15] Chungsheng Zhang, Liyan Zhuang, “New Path Filling Method on Data Preprocessing in Web
Mining”, Computer and Information Science, vol1(3), pp: 112-115, 2008.
[16] Fenstermacher, K.D., M. Ginsburg, “Client-side monitoring for web mining”, Journal of the
American Society for Information Science and Technology, Vol. 54, No. 7, pp: 625-637, 2003.
[17] G. Castellano, A. M. Fanelli, M. A. Torsello, “Log Data Preparation For Mining Web Usage
Patterns”, IADIS International Conference Applied Computing, 2007.
[18] G T Raju, P S Satyanarayana, “Knowledge Discovery from Web Usage Data: Complete
Preprocessing Methodology”, IJCSNS International Journal of Computer Science and Network
Security, VOL.8 No.1, pp: 179- 186, 2008.
[19] J. Zhang, A. A. Ghorbani, “The reconstruction of user sessions from a server log using
improved time oriented heuristics” in CNSR, IEEE Computer Society, pp: 315–322, 2004.
[20] K. R. Suneetha, Dr. R. Krishnamoorthi, “Identifying User Behavior by Analyzing Web Server
Access Log File”, IJCSNS International Journal of Computer Science and Network Security,
VOL.9 (4), pp: 327-332, 2009.
[21] Martin Arlitt, “Characterizing Web User Sessions” Internet and Mobile Systems Laboratory
HP Laboratories Palo Alto HPL- 2000-43, May, 2000.
[22] M. Spiliopoulou, B. Mobasher, B. Berendt, M. Nakagawa, “A Framework for the Evaluation of
Session Reconstruction Heuristics in Web Usage Analysis”. INFORMS Journal of Computing -
Special Issue on Mining Web-Based Data for E-Business Applications, 15 (2), pp: 171–190,
2003.
[23] Natheer Khasawneh, Chien-Chung Chan, “Active User-Based and Ontology-Based Web Log
Data Preprocessing for Web”, Proceedings of the 2006 IEEE/WIC/ACM International
Conference on Web Intelligence, 2006.
[24] Pang-Ning Tan, Vipin Kumar, “Discovery of Web Robot Sessions based on their Navigational
Patterns”, Data Mining and Knowledge Discovery, 6(1), pp: 9-35, 2002.
[25] P. Nithya, Dr. P. Sumathi “Novel Pre-Processing Technique for Web Log Mining by Removing
Global Noise and Web Robots”, IEEE Conference Publications, 2012.
[26] Renata Ivancsy, Sandor Juhasz, “Analysis of Web User Identification Methods”, World
Academy of Science, Engineering and Technology, pp: 338-345, 2007.
[27] V. Chitraa, A.S.D., "A Survey on Preprocessing Methods for Web Usage Data," (IJCSIS)
International Journal of Computer Science and Information Security, Vol. 7( 3), 2010.
[28] Ma Shuyue, Liu Wencai, Wang Shuo, “The Study on the Preprocessing in Web Log Mining”,
IEEE Conference Publications, pp: 1-5, 2011.
[29] Ben Martin, Peter Eklund, “From Concepts to Concept Lattice: A Border Algorithm for
Making Covers Explicit”, ICFCA 2008, Springer-Verlag Berlin Heidelberg, pp: 78-89, 2008.
[30] Eklund, Peter, Villerd, Jean, “A Survey of Hybrid Representations of Concept Lattices in
Conceptual Knowledge Processing”, Lecture Notes in Computer Science, Springer
Berlin/Heidelberg, pp: 296- 31, 2010.
[31] F. Masseglia, P. Poncelet, M. Teisseire., “Incremental mining of sequential patterns in large
databases”, Data Knowledge Engineering, 46(1), pp: 97-121, 2003.
[32] Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava, “Finding Hierarchical
Heavy Hitters in Streaming Data”, ACM Transactions on Database Systems, Vol. V, No. N,
2007.
[33] JL. Pfaltz, CM. Taylor, “Scientific discovery through iterative transformations of concept
lattices”, In Workshop on Discrete Applied Mathematics, in conjunction with the 2nd SIAM
International Conference on Data-Mining, pp: 65–74, 2002.
[34] Li H R, Zhang W X, Wang H., “Classification and reduction of attributes in concept lattices”,
Proc of IEEE International Conference on Granular Computing. Los Alamitos: IEEE Computer
Society, pp: 142-147, 2006.
[35] Shao, M W., “The reduction for two kind of generalized concept lattice”, Proceedings of the
4th International Conference on Machine Learning and Cybernetics, Berlin: Springer,
pp: 2217-2222, 2005.
[36] Show-Jane Yen, Yue-Shi Lee, Chung-Wen Cho, “An Efficient Approach for the Maintenance
of Path Traversal Patterns”, In Proceedings of IEEE International Conference on e-Technology,
e-Commerce and e-Service (EEE), pp: 207-214, 2004.
[37] Valtchev P., Missaoui R., “Building Concept (Galois) Lattices from Parts: Generalizing the
Incremental Methods”, In Proceedings of the 9th International Conference on Conceptual
Structures (ICCS 2001), USA, pp: 290-303, 2001.
[38] Andreas Lubcke, Veit Koppen, and Gunter Saake, “A Decision Model to Select the Optimal
Storage Architecture for Relational Databases”, IEEE Conference Publications, pp: 1-11, 2011.
[39] Yue-Shi Lee, “A Lattice-Based Framework for Interactively and Incrementally Mining Web
Traversal Patterns”, DOI: 10.4018/978-1-59904-990-8, ch027, 2009.
[40] C. Ezeife, Y. Su, “Mining incremental association rules with generalized FP-tree,” in Lecture
Notes in Computer Science, LNCS 2338, Springer- Verlag, pp: 147-160, 2002.
[41] Dalamagas, T., Bouros, P., Galanis, T., Eirinaki, M., Sellis, T.K., “Mining user navigation
patterns for personalizing topic directories”, Proc. 9th ACM International Workshop on Web
Information and Data Management, Lisbon, Portugal, pp: 81-88, 2007.
[42] F.C Tseng, C.C. Hsu, K.S. Fu, “The Frequent Pattern List: Another Framework for Mining
Frequent Patterns,” International Journal of Electronic Business Management, vol. 3, no. 2,
pp: 104-115, Feb, 2005.
[43] Harms, S.K., Deogun, J.S., “Sequential association rule mining with time lags”, Journal of
Intelligent Information Systems, Vol. 22, No. 1, pp: 7-22, 2004.
[44] Huang, Y.M., Kuo, Y-H., Chen, J.N., Jeng, Y.L., “NP-miner: A real-time recommendation
algorithm by using web usage mining”, Knowledge Based Systems, Vol. 19, No. 4,
pp: 272-286, 2006.
[45] J. Fong, H.K. Wong, S.M. Huang, “Continuous and incremental data mining association rules
using frame metadata model”, Knowledge-Based Systems 16, Elsevier, pp: 91-100, 2003.
[46] Johannes K. Chiang, Rui-Han Yang, “Multidimensional Data Mining for Discover Association
Rules in Various Granularities”, IEEE Conference Publications, pp: 1-6, 2013.
[47] J L Balcazar, “Redundancy, Deduction Schemes, and Minimum-Size Bases for Association
Rules” Pascal Report 4259, 2008.
[48] Livingstone, G., Rosenberg, J., Buchanan, B., “An agenda and justification based framework
for discovery systems”, Knowledge and Information Systems 5(2), pp: 133-161, 2003.
[49] Liu Jian, Wang Yan-Qing, “Web Log Data Mining Based on Association Rule”, Eighth
International Conference on Fuzzy Systems and Knowledge Discovery (FSKD),
pp: 1855-1859, 2011.
[50] Priyanka Makkar, Payal Gulati, Dr. A.K. Sharma, “A Novel Approach for Predicting User
Behavior for Improving Web Performance”, International Journal on Computer Science and
Engineering, VOL. 02, No. 04, pp: 1233-1236, 2010.
[51] Przemysław Kazienko, “Mining Indirect Association Rules For Web Recommendation”
International Journal of Applied Mathematics and Computer Science, Vol. 19, No. 1,
pp: 165-186, 2009.
[52] Show-Jane Yen, Yue-Shi Lee, “An efficient data mining approach for discovering interesting
knowledge from customer transactions”, Expert Systems with Applications, Elsevier, pp: 1-8,
2005.
[53] B. Minaei-Bidgoli, William F. Punch, “Using Genetic Algorithms for Data Mining
Optimization in an Educational Web-based System”, http://www.lon-capa.org, 2003.
[54] Diana Martín, Alejandro Rosete, Jesus Alcala-Fdez and Francisco Herrera, “A Multi-
Objective Evolutionary Algorithm for Mining Quantitative Association Rules”, IEEE
Conference Publications, pp: 1397-1402, 2011.
[55] Gaurav Dubey, Arvind Jaiswal, “Identifying Best Association Rules and Their Optimization
Using Genetic Algorithm”, International Journal of Emerging Science and Engineering
(IJESE), Volume-1, Issue-7, pp: 91-96, 2013.
[56] H. K. Tsai, J. M. Yang, C. Y. Kao., “Applying genetic algorithms to finding the optimal Gene
order in displaying the microarray data”, In Proceedings of the Genetic and Evolutionary
Computation Conference (GECCO), pp: 610-617, 2002.
[57] Hyunchul Ahn, Kyoung-jae Kim, “Bankruptcy prediction modeling with hybrid case-based
reasoning and genetic algorithms approach”, Applied Soft Computing, Volume 9, Issue 2,
pp: 599–607, 2009.
[58] Mehmet Kaya, “Automated extraction of extended structured motifs using multi-objective
genetic algorithm” Expert Systems with Applications, Volume 37, Issue 3, pp: 2421-2426,
2010.
[59] S. Ventura, C. Romero, A. Zafra, J. A. Delgado, C. Hervas, “JCLEC: A Java framework for
evolutionary computation”, Soft Computing, vol. 12, no. 4, pp: 381–392, 2008.
[60] S. Y. Wang, K. Tai. “Structural topology design optimization using genetic algorithms with a
bit-array representation”, Computer Methods in Applied Mechanics and Engineering, 194, pp:
3749-3770, 2005.
[61] S. Y. Wang, K. Tai, M. Y. Wang. “An enhanced genetic algorithm for structural topology
optimization”, International Journal for Numerical Methods in Engineering, 65, pp: 18-44,
2006.
[62] Xiaoyan Sun, Lei Yang, Dunwei Gong and Ming Li, “Interactive Genetic Algorithm Assisted
with Collective Intelligence from Group Decision Making”, IEEE World Congress on
Computational Intelligence, pp: 1-8, 2012.
[63] Brijs, T., Vanhoof, K., Wets, G., “Defining interestingness for association rules”, International
Journal of Information Theories and Applications, 10(4), pp: 370–376, 2003.
[64] Furnkranz, J., Flach, P. A., “ROC ‘n’ rule learning: Towards a better understanding of covering
algorithms” Mach. Learn. 58, (1), pp: 39–77, 2005.
[65] Heng-Soon Gan, Andrew Wirth, “Heuristic stability: A permutation disarray measure”,
Computers & Operations Research, Volume 34, Issue 11, pp: 3187-3208, 2007.
[66] Hilderman, R. J., Hamilton, H. J., “Evaluation of interestingness measures for ranking
discovered knowledge”, Lecture Notes in Computer Science 2035, pp: 247–259, 2001.
[67] Izwan Nizal Mohd Shaharanee, Fedja Hadzic, Tharam S. Dillon, “Interestingness measures for
association rules based on statistical validity”, Knowledge-Based Systems, pp: 386–392, 2011.
[68] Jaroszewicz, S., Simovici, D. A., “Interestingness of frequent itemsets using Bayesian networks
as background knowledge”, in Proceedings of the 2004 ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, pp: 178–186, 2004.
[69] Keim, D.A., “Information visualization and visual data mining”, IEEE Transactions On
Visualization and Computer Graphics 7, pp: 100–107, 2002.
[70] Michael Friendly, “Milestones in the history of thematic cartography, statistical graphics, and
data visualization”, 2009.
[71] Padmanabhan, B., Tuzhilin, A., “On characterization and discovery of minimal unexpected
patterns in rule discovery”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18,
No. 2, pp: 202–216, 2006.
[72] Vitaly Friedman, “Data Visualization and Infographics” in: Graphics, Monday Inspiration,
January 14th, 2008.
[73] Ravita Mishra, “Web Usage Mining Contextual Factor: Human Information Behavior”,
International Journal of Information Technology and Management Information Systems
(IJITMIS), Volume 5, Issue 1, 2014, pp. 12 - 29, ISSN Print: 0976 – 6405, ISSN Online:
0976 – 6413.
[74] Suresh Subramanian and Dr. Sivaprakasam, “Genetic Algorithm with a Ranking Based
Objective Function and Inverse Index Representation for Web Data Mining”, International
Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 84 - 90,
ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[75] Jaykumar Jagani and Prof. Kamlesh Patel, “An Enhanced Algorithm for Classification of Web
Data for Web Usage Mining using Supervised Neural Network Algorithms”, International
Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 4, 2014, pp. 48 - 56,
ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.

Contenu connexe

Similaire à AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB USAGE MINING

A Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage MiningA Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage Mining
IJERA Editor
 
H0314450
H0314450H0314450
H0314450
iosrjournals
 
Web personalization using clustering of web usage data
Web personalization using clustering of web usage dataWeb personalization using clustering of web usage data
Web personalization using clustering of web usage data
ijfcstjournal
 
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
ijcsa
 
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
ijcsa
 
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web MiningA Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
IJMER
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
IRJET Journal
 
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
IOSR Journals
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
OUM SAOKOSAL
 
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
cscpconf
 
a novel technique to pre-process web log data using sql server management studio
a novel technique to pre-process web log data using sql server management studioa novel technique to pre-process web log data using sql server management studio
a novel technique to pre-process web log data using sql server management studio
INFOGAIN PUBLICATION
 
625 634
625 634625 634
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniques
ijctet
 
A comprehensive study of mining web data
A comprehensive study of mining web dataA comprehensive study of mining web data
A comprehensive study of mining web data
eSAT Publishing House
 
A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining
Editor IJMTER
 
Pf3426712675
Pf3426712675Pf3426712675
Pf3426712675
IJERA Editor
 
C03406021027
C03406021027C03406021027
C03406021027
theijes
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
iosrjce
 
C017231726
C017231726C017231726
C017231726
IOSR Journals
 
Integrated Web Recommendation Model with Improved Weighted Association Rule M...
Integrated Web Recommendation Model with Improved Weighted Association Rule M...Integrated Web Recommendation Model with Improved Weighted Association Rule M...
Integrated Web Recommendation Model with Improved Weighted Association Rule M...
ijdkp
 

Similaire à AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB USAGE MINING (20)

A Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage MiningA Review on Pattern Discovery Techniques of Web Usage Mining
A Review on Pattern Discovery Techniques of Web Usage Mining
 
H0314450
H0314450H0314450
H0314450
 
Web personalization using clustering of web usage data
Web personalization using clustering of web usage dataWeb personalization using clustering of web usage data
Web personalization using clustering of web usage data
 
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
 
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
MULTIFACTOR NAÏVE BAYES CLASSIFICATION FOR THE SLOW LEARNER PREDICTION OVER M...
 
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web MiningA Novel Method for Data Cleaning and User- Session Identification for Web Mining
A Novel Method for Data Cleaning and User- Session Identification for Web Mining
 
Search Engine Scrapper
Search Engine ScrapperSearch Engine Scrapper
Search Engine Scrapper
 
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...Performance of Real Time Web Traffic Analysis Using Feed  Forward Neural Netw...
Performance of Real Time Web Traffic Analysis Using Feed Forward Neural Netw...
 
Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)Data preparation for mining world wide web browsing patterns (1999)
Data preparation for mining world wide web browsing patterns (1999)
 
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
A NEW IMPROVED WEIGHTED ASSOCIATION RULE MINING WITH DYNAMIC PROGRAMMING APPR...
 
a novel technique to pre-process web log data using sql server management studio
a novel technique to pre-process web log data using sql server management studioa novel technique to pre-process web log data using sql server management studio
a novel technique to pre-process web log data using sql server management studio
 
625 634
625 634625 634
625 634
 
A detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniquesA detail survey of page re ranking various web features and techniques
A detail survey of page re ranking various web features and techniques
 
A comprehensive study of mining web data
A comprehensive study of mining web dataA comprehensive study of mining web data
A comprehensive study of mining web data
 
A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining A Comparative Study of Recommendation System Using Web Usage Mining
A Comparative Study of Recommendation System Using Web Usage Mining
 
Pf3426712675
Pf3426712675Pf3426712675
Pf3426712675
 
C03406021027
C03406021027C03406021027
C03406021027
 
Implementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server MonitoringImplementation of Intelligent Web Server Monitoring
Implementation of Intelligent Web Server Monitoring
 
C017231726
C017231726C017231726
C017231726
 
Integrated Web Recommendation Model with Improved Weighted Association Rule M...
Integrated Web Recommendation Model with Improved Weighted Association Rule M...Integrated Web Recommendation Model with Improved Weighted Association Rule M...
Integrated Web Recommendation Model with Improved Weighted Association Rule M...
 

Plus de James Heller

Introduction Of Report Writing 23 Problem Anal
Introduction Of Report Writing 23 Problem AnalIntroduction Of Report Writing 23 Problem Anal
Introduction Of Report Writing 23 Problem Anal
James Heller
 
How Cheap Essay Writing Services Can Get You A Distinction
How Cheap Essay Writing Services Can Get You A DistinctionHow Cheap Essay Writing Services Can Get You A Distinction
How Cheap Essay Writing Services Can Get You A Distinction
James Heller
 
How To Write Good Paragraph Transitions
How To Write Good Paragraph TransitionsHow To Write Good Paragraph Transitions
How To Write Good Paragraph Transitions
James Heller
 
AN EXTENSIVE LITERATURE SURVEY ON COMPREHENSIVE RESEARCH ACTIVITIES OF WEB USAGE MINING

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 5, Issue 12, December (2014), pp. 275-289 © IAEME: www.iaeme.com/IJCET.asp
Journal Impact Factor (2014): 8.5328 (Calculated by GISI), www.jifactor.com

Dr. V.V.R. Maheswara Rao (1), Dr. V. Valli Kumari (2)
(1) Professor, Department of CSE, Shri Vishnu Engg. College for Women, AP, INDIA
(2) Professor, Department of CS & SE, Andhra University, AP, INDIA

ABSTRACT

The growing use of the World Wide Web (WWW) has produced a vast repository of data on users' interactions with websites. This data, recorded in weblogs, is unstructured, unlabeled, highly redundant and less reliable. In addition, the non-linear, incomplete, heterogeneous and transient nature of the data makes the weblog still more complex. This situation creates steadily increasing challenges in extracting the desired knowledge from the weblog. Web usage mining is a methodology that blends traditional mining techniques with sophisticated algorithms to capture, model and analyze behavioral patterns from the weblog. The knowledge derived from such patterns adds great value to any organization, which can use it in its decision making process.

Thus, it is necessary to strengthen web usage mining techniques so that they are compatible with the incremental nature of the weblog. This calls for applying new approaches comprehensively at all stages of web usage mining. To design and develop a comprehensive model for investigating web usage mining, the authors of the present paper conduct an extensive literature survey on various research activities in the field of web usage mining.

Keywords: Web Mining, Pre-Processing, Storage Models, Pattern Discovery, Optimization Techniques, Pattern Analysis, Knowledge Representation.

1. INTRODUCTION

The unexpectedly widespread use of the WWW and the dynamically increasing nature of the web create new challenges in web mining, since the data on the web is inherently unlabelled, incomplete and heterogeneous. In addition, the web has turned into a gold mine of extremely dynamic and interrelated data on which web miners can perform web mining.

1.1. Web Mining

Web mining is the application of data mining techniques [5] to discover and retrieve useful information and patterns from WWW documents and services. Web mining can be used to discover hidden patterns and relationships within web data. The web mining task can be divided into three general types, known as Web Content Mining (WCM), Web Structure Mining (WSM) and Web Usage Mining (WUM), as shown in figure 1.

Figure 1: Types of web mining

Web content mining is a mining technique that extracts knowledge from the content published on the internet [8], usually in the form of semi-structured (HTML), unstructured (plain text) and structured (XML) documents. The content of a web page includes text, images, HTML, tables and forms. Web content mining allows search engine results to maximize the flow of customer clicks to a website, or lets particular pages of the site be accessed numerous times in response to relevant search queries. The main uses of this type of web mining are to gather, categorize, organize and provide the best possible information available on the web.

Web structure mining is a mining technique that extracts knowledge from the WWW and the hyperlinks between references in the web. Mining the structure [4] of the web involves extracting knowledge from the interconnections of hypertext documents in the WWW. This results in the discovery of web communities and of pages that are authoritative. Moreover, the nature of the relationship between neighboring pages can be discovered by structure mining.
The main purpose of structure mining is to extract interesting relationships between web pages.

Web usage mining, also known as weblog mining, is the process of automatic discovery and investigation of patterns in click streams and associated data collected or generated as a result of user interactions with web resources on websites. The main goal of web usage mining is to capture, model and analyze the behavioral patterns [3] and profiles of users interacting with websites. The discovered patterns are usually represented as collections of pages, objects or resources that are frequently accessed by groups of users with common needs or interests [4]. The primary data sources used in web usage mining are log files generated by web and application servers. In addition, web usage mining provides detailed information on web user behavior that can be useful for detecting intrusion and fraud.

1.2 Web Usage Mining

The overall web usage mining process can be divided into five interdependent stages: pre-processing, storage models, pattern discovery, optimization techniques and pattern analysis, as shown in figure 2.
Figure 2: Stages of web usage mining

Pre-processing is the initial stage [4] of web usage mining, which creates a suitable target dataset to which mining algorithms can be applied. The inputs to the pre-processing stage are the web server logs, referral logs, registration files and index server logs [7]. In this stage, the click stream data is cleaned and partitioned into a set of user transactions representing the activities of each user during different visits to the site. Because it provides the data for all further stages of web usage mining, the importance of pre-processing should not be underestimated.

Web mining applications rely on monolithic storage models, which take responsibility for data storage, retrieval and maintenance. The storage model creates a solid platform of consolidated, qualified data for analysis. Storage models can be divided into two types based on the nature of their growth, namely static storage models and incremental storage models.

Pattern discovery takes the output generated by the pre-processing stage. Its goal is to learn general concepts from the pre-processed data. In this phase, statistical, database and machine learning techniques such as classification, clustering and association rule mining are applied to the extracted information.

The soft optimization techniques are characterized by their ability for granular computation in avoiding the concept of approximation. Basically, these models provide the foundation for computational intelligence systems and further outline the basis of future generation computing systems.
These models closely resemble human-like decision making and are used for modeling highly non-linear data, where pattern discovery, rule generation and learnability are typical.

Pattern analysis is the final stage of usage mining [4], which extracts interesting patterns from the output of pattern discovery. The goal of pattern analysis is to understand, visualize and interpret the discovered patterns and statistics. The output generated by pattern analysis is used as input to various applications such as recommendation engines, visualization tools and web analytics.

1.3 Web Log Characteristics

Data collection is the primary step in the web usage mining process [15]; it is the process of extracting the task-relevant data from diverse weblogs. Obtaining details of web users' usage data from weblogs is an important and difficult task, since weblogs are unstructured, incremental in nature and rapidly growing. Attention must be paid when collecting data from weblogs, which normally include web content data, web structure data and web usage data. Web usage data stores general access logs and user profiles, which consist of general access patterns and customized access patterns respectively. Web usage mining is the task of applying data mining techniques to extract useful patterns from such web data through various stages. These patterns can be used to investigate interesting characteristics of web users.

The data collected at web servers is the richest, but in practice it is very difficult to obtain web server data. Data can also be collected at web clients by enabling JavaScript, Java applets and modified browsers, yet these methods require user participation to enable their functionality [16]. Proxy servers are more suitable and reliable for collecting web usage data, since they are placed at the client location and act as real servers. They capture all requests made by clients to the original server and store them automatically in weblogs; they further improve navigation speed through caching.

The weblog collected at a proxy server is unstructured, since it contains different types of entries. These entries do not have a definite number of attributes, an identifiable structure or defined relations. This results in ambiguities and makes the weblog difficult for computer programs to interpret. The other characteristics of weblog data are: heterogeneous, distributed, different data types, dynamic content, voluminous / non-scalable, time dimension, incremental in nature and exponentially growing.

2. AN EXTENSIVE AND COMPREHENSIVE LITERATURE SURVEY

Knowledge discovery in data has been used to analyze the data collected on the web and to extract useful knowledge. This effort was named Web Mining by Etzioni in 1996, and from then onwards research on web mining took root through the efforts of Robert Armstrong and his colleagues. Several approaches [1] have been proposed by many authors in the vein of web mining to take the work to the next level. The rapidly growing nature of the weblog is a strong motivation [1] for next-generation researchers to take an interest in the emerging field of web mining. Recent efforts [7] have focused on issues related to the feasibility, scalability, usability and efficiency of web mining techniques. The surveys conducted by various authors [4] and their research contributions identified three broad categories of web mining, namely web structure mining, web usage mining and web content mining. The literature survey of web usage mining is organized as shown in figure 3.

Figure 3: Roadmap of Literature Survey
Some authors [2, 5] recognized that web usage mining can be used for different tasks, namely Personalization, System Improvement, Usage Characterization, Site Modification and Business Intelligence. Other authors [2, 9] have dealt with the problem of web personalization, which comprises simple functions, advanced functions and intelligent functions that can perform certain tasks on behalf of the user without explicit requests. The work of the authors of [5, 6] showed that system improvement emerged as another focus of web usage mining, concentrating on improving techniques to mine information and knowledge on the web quickly and effectively. Performance and other service quality attributes are crucial to user satisfaction with services.

The surveys conducted by various authors [1, 3] acknowledged that usage characterization is an interesting task of web usage mining, focusing on techniques that predict user behavior while the user interacts with the website. Some authors [10] have mostly focused on the approaches used for site modification through web usage mining. Site modification is a crucial issue for many applications in terms of both usage and structure.

Recent work [11] has made it evident that emerging e-services of the web era such as e-commerce, e-learning, e-banking and so on radically change the usage of the internet, turning websites into businesses; thus business intelligence has become one of the tasks of web usage mining.

2.1 Pre-Processing Techniques

In the recent past web usage mining has gained the attention of many researchers [15, 17], yet pre-processing in knowledge discovery has received less attention than it deserves.
Many researchers [12] are working on pre-processing, which involves user identification, session identification, path completion and transaction identification.

In 2000, Brodley and Kohavi discussed clipping once per session, in which a single session is truncated at some point within the session. The implicit unit of analysis here is a fragment of a session. Martin Arlitt [21] studied numerous user session characteristics, including the number of requests per session, the number of pages requested per session, session length and inter-session times. Both of these techniques are time consuming and inaccurate, as the web server log captures cache hits.

During 2001, B. Berendt, B. Mobasher, M. Spiliopoulou and J. Wiltshire [13] provided a formal framework that uses sessionizing heuristics, which partition the user activity log into a set of constructed sessions using time-oriented and navigation-oriented heuristics, to improve the accuracy of sessionization.

In 2002, Tan P N and Kumar V [24] explored web robots, which can perform many tasks automatically and make the web administrator's task easy. Web robots are used by many business organizations to collect email addresses, monitor product prices, gather corporate news and so on. This has led to a large proliferation of weblog data.

In 2003, M. Spiliopoulou, B. Mobasher and B. Berendt [22] stated that the reliability of web usage mining results depends heavily on the proper preparation of the input datasets. In particular, errors in the reconstruction of sessions and incomplete tracing of users' activities in a site can easily result in invalid patterns.

In 2004, J. Zhang and A. A. Ghorbani [19] described improved statistical, time-oriented heuristics for the reconstruction of user sessions from a server log. Even so, some of the usage mining results in their experiments showed lower performance. They suggested that a combination of time-oriented heuristics might achieve better performance.
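The time-oriented sessionizing heuristics surveyed above can be illustrated with a minimal sketch. It is not the method of any of the cited authors: the 30-minute inactivity threshold, the use of the client address as the user identifier and the function name `sessionize` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, url) records into per-user sessions.

    Time-oriented heuristic: a new session starts whenever the gap between
    two consecutive requests of the same user exceeds `timeout`.
    The 30-minute default is an assumed, commonly used value.
    """
    sessions = {}
    # Sort by user, then by time, so each user's requests are visited in order.
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and ts - user_sessions[-1][-1][0] <= timeout:
            # Gap is small enough: extend the user's current session.
            user_sessions[-1].append((ts, url))
        else:
            # First request of this user, or gap too large: open a new session.
            user_sessions.append([(ts, url)])
    return sessions
```

A navigation-oriented heuristic would instead use the referrer of each request to decide whether it continues the current session; combining both kinds of heuristic requires referrer data that this sketch does not model.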
In 2005, Show-Jane Yen and Yue-Shi Lee [52] applied a bit-string database generation technique as part of pre-processing to improve the efficiency of finding interesting association rules. They also noted that the bit-string database generation technique costs extra memory space in the transformation process.

In 2006, Natheer Khasawneh and Chien-Chung Chan [23] introduced a fast active-user-based user identification algorithm, and also presented an ontology-based method that utilizes functionalities to identify sessions.

In 2007, Renata Ivancsy and Sandor Juhasz [26] attempted a novel approach that uses a complex cookie-based method to identify web users. Furthermore, they explained steps towards identifying the individuals behind impersonal web users. Their approach is demonstrated by implementing a web activity tracking system that aims at a more precise distinction of web users based on log data.

In 2008, G T Raju and P S Satyanarayana [18] implemented a complete pre-processing methodology that utilizes several heuristics for cleaning web usage data. This methodology allows the analyst to merge any collection of weblogs into a single file and to analyze multiple weblogs jointly. Yet the relational data model is not suitable for the present huge weblogs, which creates scope for further research.

In 2009, K. R. Suneetha and Dr. R. Krishnamoorthi [20] summarized the weblog locations as server-side logs, proxy-side logs and client-side logs, along with their types: transfer log, agent log, error log and referrer log.
They presented the web log structure and its attributes, and indicated the future work as the data mining techniques can be applied on the pre processed weblog to find frequently accessed patterns in less time with high accuracy. In 2010, V. Chitraa, A.S.D. [27] reviewed the existing work done in the pre processing stage that includes data collection and pre processing. The data collection comprises of server level, proxy level and client side. They also concluded weblog is the best source to know usage behavior. But the raw weblogs contain unnecessary details which will affect the accuracy of pattern discovery and analysis. In 2011, Ma Shuyue, Liu Wencai, Wang Shuo [28] reviewed the existing work done in the pre processing stage that includes the embedded animations in web pages and other page elements which meet the new standards can be combined with the logs to become the concerns. Therefore, the preprocessing before the web mining in weblog should become a more important research. During 2012, P. Nithya, Dr. P. Sumathi [25] proposed a novel pre-processing technique that can remove local and global noise and web robots data. However, they also expressed that intelligent techniques are required to discard the noise and the data accessed by web robots automatically as their future directions. Recently in 2013, Chintan R. Varnagar, Nirali N. Madhak, Trupti M. Kodinariya, Jayesh N. Rathod [14] provided a detailed survey of work done so far on data collection and pre-processing stage of web usage mining. They endorsed that web log data pre-processing is very important and crucial task in entire process. This phase can be strengthened by choosing and applying various intelligent techniques. 2.2 Storage Models Valtchev P. and Missaoui R. [37] proposed a framework in the year 2001 that design a generic procedure for lattice building. This new lattice building approach is more general than the previous lattices, which is built on the basis of theory of lattice. 
As future work, they noted that the characteristics of growing data call for the development of incremental algorithms. Yet the major known incremental techniques lack a theoretical foundation.
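To make the lattice terminology of this subsection concrete, the following minimal sketch enumerates the formal concepts of a tiny session-by-page context. It is a naive batch enumeration written only for illustration, not the incremental procedure proposed in the surveyed work; the toy context and the function names (`extent`, `intent`, `formal_concepts`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy context: which session (object) visited which page (attribute).
CONTEXT = {
    "s1": {"home", "products"},
    "s2": {"home", "products", "cart"},
    "s3": {"home", "blog"},
}


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = frozenset().union(*context.values())
    for o in objs:
        shared = shared & context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Collect all (extent, intent) pairs by closing every attribute subset."""
    all_attrs = sorted(frozenset().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest extent down to the smallest.
    for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Ordering the resulting concepts by inclusion of their extents yields the concept lattice. The enumeration is exponential in the number of attributes, which is one reason incremental construction matters for growing weblogs.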
In 2002, JL. Pfaltz and CM. Taylor [33] derived concept lattices using universally quantified expressions. They showed that incremental concept lattices and their associated logical implications can be generated along with scientific observations, and stated that incremental discovery is well suited to growing data such as weblogs. In 2003, F. Masseglia, P. Poncelet and M. Teisseire [31] considered the problem of incremental mining and presented a new algorithm for mining frequent sequences in an updated database. In their future directions, they noted that incremental mining is also appropriate for web usage mining, where modifications must be taken into account to save storage space because previous information is no longer of interest or becomes invalid. In 2004, Show-Jane Yen, Yue-Shi Lee and Chung-Wen Cho [36] implemented an incremental updating technique that maintains the discovered frequent traversal patterns when user sequences are inserted into the database. Their experimental results showed that the algorithm is efficient for maintaining frequent traversal patterns. In 2005, Shao M W. [35] presented approaches to attribute reduction and object reduction for two kinds of generalized concept lattices, removing the attributes and objects that are not essential to the generalized lattices. In 2006, Li H R, Zhang W X and Wang H [34] investigated the attribute classification and reduction of lattices using binary relations. They also presented two recognition methods for attribute classification based on the properties of irreducible elements and their congruence. A lattice reduction method is obtained from this classification.
In 2007, Graham Cormode, Flip Korn, S. Muthukrishnan and Divesh Srivastava [32] proposed an algorithm based on the product of hierarchical dimensions, drawing on the mathematical lattice structure. They established the importance of hierarchical multidimensional summarization of data in emerging data stream applications. However, there is no clear method for discarding nodes that do not meet the minimum threshold. In 2008, Ben Martin and Peter Eklund [29] proposed a border algorithm for making the covering relations of concepts explicit for iceberg lattices. Empirical testing compared the border algorithm with a traditional covering edges algorithm from concept data analysis. In 2009, Yue-Shi Lee [39] proposed an incremental data mining algorithm for website redesign, aimed at improving web services based on navigational patterns. The author noted that the technique can be used not only for website design but also for analyzing user behavior. In 2010, Peter Eklund and Jean Villerd [30] presented hybrid visual representation techniques for concept lattices, concentrating on line diagrams. However, combining too many value attributes results in a complex nested diagram that is useful for deep analysis but not suitable for first-glance navigation. In 2011, Andreas Lubcke, Veit Koppen and Gunter Saake [38] introduced a decision-based design process for choosing among different storage architectures. In 2012, Santo Lombardo, Elisabetta Di Nitto and Danilo Ardagna identified the development of a hybrid architecture for storage models as necessary future work.

2.3 Pattern Discovery Techniques

In 2001, Lu, S., Hu, H. and Li, F. [48] presented vertical and mixed weighted association rules on static datasets to determine correlations between items. Their experiments showed better predictive ability. The method was demonstrated on static and synthetic datasets.
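As an illustration of the measures that underlie association rules, the sketch below computes plain (unweighted) support and confidence over a handful of sessions. It is not the weighted scheme discussed above; the session data and the function names are assumptions made for this example.

```python
# Hypothetical sessions, each reduced to the set of pages visited.
SESSIONS = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
]


def support(sessions, itemset):
    """Fraction of sessions that contain every page in itemset."""
    itemset = set(itemset)
    return sum(itemset <= s for s in sessions) / len(sessions)


def confidence(sessions, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(sessions, antecedent)
    if base == 0:
        return 0.0
    return support(sessions, set(antecedent) | set(consequent)) / base


if __name__ == "__main__":
    print(support(SESSIONS, {"products", "cart"}))
    print(confidence(SESSIONS, {"products"}, {"cart"}))
```

For these toy sessions the rule products → cart has support 0.5 and confidence 2/3, since two of the three sessions that contain products also contain cart.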
In 2002, C. Ezeife and Y. Su [40] presented a frequent pattern tree structure that reduces the required number of database scans. They proposed the DB Tree and PotFP Tree algorithms for large databases. The discovery of closed patterns over multidimensional patterns is an interesting future direction arising from their work. In 2003, J. Fong, H.K. Wong and S.M. Huang [45] introduced a frame metadata model to facilitate continuous association rule generation in data mining. With this model, which uses static and active classes, a new set of association rules can be derived whenever the source databases are updated. Mining association rules on weblog data is the stated future direction. In 2004, Harms, S.K. and Deogun, J.S. [43] presented an efficient method for mining frequent association rules from multiple data sets. This work highlighted the importance of user constraints by introducing antecedent and consequent patterns into the rule mining process. In 2005, F.C Tseng, C.C. Hsu and K.S. Fu [42] established a simpler and more efficient data structure for representing the frequent pattern list. The technique partitions both the search space and the solution space so that a divide-and-conquer approach can be applied to mining frequent patterns. In 2006, Huang, Y.M., Kuo, Y-H., Chen, J.N. and Jeng, Y.L. [44] developed NP-Miner, a navigational pattern tree for discovering sequential patterns. Most of these authors pointed to dynamic maintenance and updating of the knowledge base and to comprehensive evaluation methods in the knowledge discovery process. In 2007, Dalamagas, T., Bouros, P., Galanis, T., Eirinaki, M.
and Sellis, T.K. [41] provided a set of mining tasks for user navigation patterns, with a focus on personalizing topic directories according to the navigational behavior of users. The efforts of many researchers indicate that, among association rule mining algorithms, incremental algorithms are the better fit for large, growing databases. In 2008, J L Balcazar [47] explored the concept of redundancy among association rules from a fundamental perspective, discussing several existing alternative definitions of redundancy and providing new characterizations of them and relationships among them. The work also provided a sound and complete calculus for constructing a deduction scheme for redundancy and analyzed the risk of losing rules under incremental mining. In 2009, Przemysław Kazienko [51] reviewed indirect association rules and presented a new approach to discovering indirect associations between pages that rarely occur together. The experimental results revealed the usefulness of indirect rules in the weblog scenario. In 2010, Priyanka Makkar, Payal Gulati and Dr. A.K. Sharma [50] presented an approach for predicting user behavior to improve web performance. They noted that web pre-fetching has become an attractive solution, in which a user's forthcoming page accesses are predicted from the weblog. In 2011, Liu Jian and Wang Yan-Qing [49] presented a research framework for web mining. Their experimental results show that there is still room to optimize the solution by employing advanced techniques. In 2012, Mahendra Pratap Yadav, Pankaj Kumar Keserwani and Shefalika Ghosh Samaddar proposed an adaptive algorithm for incremental mining of association rules. The algorithm is a highly efficient incremental mining technique, and the authors stated that it can be applied in the weblog scenario.
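The appeal of incremental mining for growing weblogs can be sketched as follows: itemset counts are kept between batches, so a new batch of sessions only requires scanning that batch rather than the whole log. This is a deliberately simplified illustration that counts every itemset up to a fixed size, not the adaptive algorithm cited above; the class name `IncrementalCounter` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalCounter:
    """Keeps itemset counts across batches so old transactions are never rescanned."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size that is tracked
        self.counts = Counter()   # itemset (sorted tuple) -> number of occurrences
        self.n = 0                # number of transactions seen so far

    def add_batch(self, transactions):
        """Update the counts using only the newly arrived transactions."""
        for transaction in transactions:
            items = sorted(set(transaction))
            self.n += 1
            for size in range(1, self.max_size + 1):
                for combo in combinations(items, size):
                    self.counts[combo] += 1

    def frequent(self, min_support):
        """Itemsets whose support over all transactions seen so far is at least min_support."""
        if self.n == 0:
            return {}
        return {
            itemset: count / self.n
            for itemset, count in self.counts.items()
            if count / self.n >= min_support
        }


if __name__ == "__main__":
    miner = IncrementalCounter()
    miner.add_batch([["a", "b"], ["a", "c"]])
    print(miner.frequent(0.6))
    miner.add_batch([["a", "b"], ["b"]])
    print(miner.frequent(0.6))
```

Keeping counts for all small itemsets trades memory for the ability to skip rescans. Practical incremental algorithms are more selective about which candidates they retain.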
In 2013, Johannes K. Chiang and Rui-Han Yang [46] proposed an approach that includes a novel data structure and an efficient algorithm for mining association rules at various granularities. Their test results showed better performance, efficiency and scalability than current approaches. However, the effects of perceived issues and the potential development of data mining and concept description are worthy of further investigation.

2.4 Optimization Techniques

In 2002, H. K. Tsai, J. M. Yang and C. Y. Kao [56] demonstrated the use of genetic algorithms (GAs) to find optimal global strategies by applying a clustering technique to biological datasets. Their extensive bibliography is evidence of the relevance of GAs to web mining. In 2003, B. Minaei-Bidgoli and William F. Punch [53] presented an approach for classifying students in order to predict their final grade, based on features extracted from the log data of a web-based education system. To minimize the prediction error rate, they used genetic algorithms to weight the features. They also provided a comparative study of several crossover operators. In 2004, B. Minaei-Bidgoli, G. Kortemeyer and W. F. Punch extended their previous work [53] and presented a new approach for predicting students’ performance based on extracting the average of the feature values over all of the problems. In 2005, S. Y. Wang and K. Tai [60] implemented a bit-array representation method for structural topology optimization using a genetic algorithm. They also proposed an identical initialization method to improve the algorithm's performance on problems with narrow design domains. Their results indicated that the bit-array representation is suitable for handling the design connectivity problem. In 2006, S. Y. Wang, K. Tai and M. Y. Wang [61] presented a versatile, robust and enhanced genetic algorithm for structural topology optimization that uses problem-specific knowledge.
Their implementation emphasized the importance of choosing appropriate representation techniques, genetic operators and evaluation methods. In 2008, S. Ventura, C. Romero, A. Zafra, J. A. Delgado and C. Hervas [59] designed a framework that maximizes the reusability and availability of evolutionary computation with minimum effort in web mining. The heavy computational demand is an open problem earmarked in their future research work. In 2009, Hyunchul Ahn and Kyoung-jae Kim [57] reviewed prior studies on optimization techniques for several systems. They further examined a genetic approach for optimizing feature weights and relevant instances for similarity calculations. Among the limitations, they mentioned that the population size and the number of generations required by the genetic algorithm are very large. Reducing both is therefore an open challenge. In 2010, Mehmet Kaya [58] proposed a novel method that uses a multi-objective evolutionary algorithm to extract patterns automatically. The method was applied to a dataset with a sequential character. Automatic extraction is noted in the conclusions as a promising direction for future research. In 2011, Diana Martín, Alejandro Rosete, Jesus Alcala-Fdez and Francisco Herrera [54] extended well-known multi-objective evolutionary algorithms to learn the intervals of attributes and to select conditions, in order to mine a set of optimal association rules accurately. In 2012, Xiaoyan Sun, Lei Yang, Dunwei Gong and Ming Li [62] found that collective intelligence extracted from multiple users enhances the performance of GAs. In their future research directions, they identified the design of evolutionary algorithms as a promising direction for the knowledge discovery process.
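The three design choices stressed above (representation, genetic operators and evaluation method) can be seen in a minimal genetic algorithm over bit-array chromosomes. The sketch below is a generic textbook-style GA, not the method of any surveyed paper; the toy fitness function `one_max` and every parameter value are assumptions chosen for illustration.

```python
import random


def one_max(bits):
    """Toy evaluation method: the number of 1-bits (a stand-in for a real objective)."""
    return sum(bits)


def evolve(fitness, n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Run a small GA and return (best chromosome, best fitness per generation)."""
    rng = random.Random(seed)
    # Representation: each chromosome is a fixed-length list of bits.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            # Selection: binary tournament for each parent.
            parent_a = max(rng.sample(pop, 2), key=fitness)
            parent_b = max(rng.sample(pop, 2), key=fitness)
            # Crossover: single cut point.
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve(one_max)
    print(history[0], history[-1], best)
```

Because of elitism the best fitness never decreases from one generation to the next. The population size and generation count are exactly the cost factors that the surveyed work identifies as an open challenge.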
In 2013, Gaurav Dubey and Arvind Jaiswal [55] addressed the association rule mining problem of finding frequent itemsets with a GA-based method. They noted that a more extensive empirical evaluation of their proposed method is a promising direction for future research.

2.5 Pattern Analysis Techniques

In 2001, Hilderman, R. J. and Hamilton, H. J. [66] focused on classifying interestingness measures and provided a general overview of the more successful and widely used interestingness measures from the literature that have been employed in data mining applications. They noted that extending the theory of interestingness for diversity measures is open for future research. In 2002, Keim, D.A. [69] proposed a classification of visualization techniques based on the data type to be visualized, and identified tight integration of visualization techniques with traditional techniques as future work. Grinstein, G., Hoffman, P. and Pickett, R. described a set of benchmarks for visualization approaches. Although the benchmarking has been done, further study and contributions from the research community and industry are required. In 2003, Brijs T., Vanhoof K. and Wets G. [63] provided an overview of existing measures of interestingness and divided them into objective and subjective measures. They focused on objective measures defined by statistical criteria. In 2004, Jaroszewicz, S. and Simovici, D. A. [68] proposed a new definition of the interestingness of an itemset as the absolute difference between its support estimated from the data and its support estimated from a Bayesian network. In addition, they presented an efficient algorithm based on the new definition. Their experimental evaluation demonstrated the usefulness of the algorithm for finding interesting, unexpected patterns.
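The definition of interestingness as an absolute difference of supports can be illustrated with a short sketch. As a loudly stated assumption, the background model used here is the simplest possible Bayesian network, one with no edges, so that all items are independent and the expected support is the product of the single-item supports. The surveyed work uses general Bayesian networks; the data and function names below are hypothetical.

```python
from math import prod


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def expected_support_independent(transactions, itemset):
    """Support predicted by an edgeless network: items are treated as independent."""
    return prod(support(transactions, {item}) for item in itemset)


def interestingness(transactions, itemset):
    """Absolute difference between the observed and the model-predicted support."""
    observed = support(transactions, itemset)
    expected = expected_support_independent(transactions, itemset)
    return abs(observed - expected)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
    print(interestingness(data, {"a", "b"}))
```

In this toy data the itemset {a, b} has observed support 0.5 and expected support 0.75 × 0.5 = 0.375, giving an interestingness of 0.125. An itemset whose observed support matches the background model scores zero and would be considered unsurprising.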
In 2005, Furnkranz, J. and Flach, P. A. [64] analyzed the behavior of covering rule algorithms by visualizing their evaluation metrics and their dynamics in coverage space. They described heuristics for evaluating rules as well as filtering and stopping criteria. Their results showed that the approach is suitable for understanding both the behavior of heuristic functions and the dynamics of covering algorithms. In 2006, Padmanabhan, B. and Tuzhilin, A. [71] presented a new method for discovering a minimal set of unexpected patterns by combining the two independent concepts of minimality and unexpectedness, both of which have been well studied in the KDD literature. They demonstrated the strength of this approach experimentally. In 2007, Heng-Soon Gan and Andrew Wirth [65] defined rescheduling stability quantitatively and provided analytical means for various heuristics. Apart from effectiveness and efficiency, the rescheduling stability of heuristics is also important. They extended empirical and analytical work on heuristic robustness. As future research, they suggested that Spearman’s footrule, a measure of permutation disarray, may shed further light on heuristics. In 2008, Vitaly Friedman [72] proposed the USER approach, which finds unexpected sequences and implication rules in data using user-defined beliefs, for mining unexpected behaviors from weblogs. Because unexpected behaviors affect web usage analysis, and because identifying unexpectedness often depends on semantics and user beliefs, the measurement of unexpectedness remains an open research problem. In 2009, Michael Friendly [70] surveyed visualization techniques from their deep roots to their current fruit. The results triggered interesting future research paths towards automation of the process, quality, scalability and intelligence of the sensitivity measure. In 2011, Izwan Nizal Mohd Shaharanee, Fedja Hadzic and Tharam S. Dillon [67] proposed a strategy that combines data mining and statistical measurement techniques, including redundancy analysis, sampling and multivariate statistical analysis, to discard non-significant rules. Their experimental results show that the framework reduced a large number of non-significant and redundant rules while preserving relatively high accuracy. In 2013, David H. Glass focused on a particular class of objective measures known as confirmation measures. The proposed class of objective measures provides a solid basis for future research.

3. CONCLUSIONS

The literature makes it clear that web usage mining is a promising and attractive task within web mining. This survey observed and emphasized that usage characterization consists mainly of five interdependent stages: pre-processing, storage models, pattern discovery, optimization techniques and pattern analysis. The authors also observed the importance, criticality and efficiency of a comprehensive approach to web usage mining, which serves as the formal basis for future work. Furthermore, the survey recognized that implementing the interdependent stages comprehensively is a promising and practical research area.

4. FUTURE WORK

As future work, a new approach will be designed and developed that concentrates comprehensively on all stages of web usage mining and leverages the strengths of the individual techniques. In addition, the comprehensive approach will be tested with different weblogs covering a wide spectrum of applications, such as web usage analysis for improvements in fraud detection, product analysis and customer segmentation.
Further, extending the comprehensive model to enable effective integration and mining of content, usage and structure weblog data promises to lead to the next generation of useful and intelligent web mining tools that can derive real-time knowledge from user transactions on the web.

REFERENCES

[1] B. Masand, M. Spiliopoulou, J. Srivastava, O.R. Zaiane, Proceedings of WebKDD2002, “Web Mining for Usage Patterns and Profiles”, Edmonton, CA, 2002.
[2] B. Mobasher, R. Cooley, J. Srivastava, “Automatic Personalization Based on Web Usage Mining”, Communications of the ACM, 43(8), pp: 142–151, 2000.
[3] E. Frias-Martinez, V. Karamcheti, “A customizable behavior model for temporal prediction of web user sequences”, In WEBKDD, Explorations, 1(2), pp: 66–85, 2000.
[4] Geeta R. B., Prof. Shashikumar, G. Totad, Dr. Prasad Reddy, “Amalgamation of Web Usage Mining and Web Structure Mining”, International Journal of Recent Trends in Engineering, Vol. 1, No. 2, pp: 279-281, 2009.
[5] Guandong Xu, Yanchun Zhang, Xun Yi, “Modelling User Behaviour for Web Recommendation Using LDA Model”, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp: 529-532, 2008.
[6] Han, J., Chang, K. C., “Data mining for Web intelligence”, IEEE Computer, 35(11), pp: 64-70, 2002.
[7] Joshi K. P., Joshi A., Yesha Y., Krishnapuram R., “Warehousing and mining web logs”, In Proceedings of the 2nd ACM CIKM Workshop on Web Information and Data Management, Kansas City, Missouri, USA 1999, pp: 63–8, 1999.
[8] Kao, H., Lin, S., Ho, J., Chen, M., “Mining Web Informative Structures and Contents Based on Entropy Analysis”, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, Iss. 1, pp: 41–55, 2004.
[9] Massimiliano Albanese, Antonio Picariello, Carlo and Lucio Sansone, “Web Personalization Based on Static Information and Dynamic User Behavior”, ACM 1-58113-978-0/04/0011, 2004.
[10] Qingtian Han, Xiaoyan Gao, Wenguo Wu, “Study on Web Mining Algorithm Based on Usage Mining”, IEEE Xplore, pp: 1121-1124, 2008.
[11] R. Kohavi, M. Spiliopoulou, J. Srivastava, Proceedings of “WebKDD2000 Web Mining for E-Commerce – Challenges & Opportunities”, Boston, MA, 2000.
[12] Berendt B. et al., “The Impact of Site Structure and User Environment on Session Reconstruction in Web Usage Analysis”, Proc. WEBKDD 2002: Mining Web Data for Discovery Usage Patterns and Profiles, LNCS 2703, Springer-Verlag, pp: 159–179, 2002.
[13] Berendt, B., B. Mobasher, M. Spiliopoulou, J. Wiltshire, “Measuring the accuracy of sessionizers for web usage analysis”, Proc. of the Workshop on Web Mining, First SIAM Internat. Conf. on Data Mining, Chicago, IL, pp: 7–14, 2001.
[14] Chintan R. Varnagar, Nirali N. Madhak, Trupti M. Kodinariya, Jayesh N. Rathod, “Web Usage Mining: A Review on Process, Methods and Techniques”, IEEE Conference Publications, pp: 40-46, 2013.
[15] Chungsheng Zhang, Liyan Zhuang, “New Path Filling Method on Data Preprocessing in Web Mining”, Computer and Information Science, Vol. 1(3), pp: 112-115, 2008.
[16] Fenstermacher, K.D., M. Ginsburg, “Client-side monitoring for web mining”, Journal of the American Society for Information Science and Technology, Vol. 54, No. 7, pp: 625-637, 2003.
[17] G. Castellano, A. M. Fanelli, M. A. Torsello, “Log Data Preparation For Mining Web Usage Patterns”, IADIS International Conference Applied Computing, 2007.
[18] G T Raju, P S Satyanarayana, “Knowledge Discovery from Web Usage Data: Complete Preprocessing Methodology”, IJCSNS International Journal of Computer Science and Network Security, Vol. 8, No. 1, pp: 179-186, 2008.
[19] J. Zhang, A. A. Ghorbani, “The reconstruction of user sessions from a server log using improved time oriented heuristics”, in CNSR, IEEE Computer Society, pp: 315–322, 2004.
[20] K. R. Suneetha, Dr. R. Krishnamoorthi, “Identifying User Behavior by Analyzing Web Server Access Log File”, IJCSNS International Journal of Computer Science and Network Security, Vol. 9(4), pp: 327-332, 2009.
[21] Martin Arlitt, “Characterizing Web User Sessions”, Internet and Mobile Systems Laboratory, HP Laboratories Palo Alto, HPL-2000-43, May 2000.
[22] M. Spiliopoulou, B. Mobasher, B. Berendt, M. Nakagawa, “A Framework for the Evaluation of Session Reconstruction Heuristics in Web Usage Analysis”, INFORMS Journal of Computing - Special Issue on Mining Web-Based Data for E-Business Applications, 15(2), pp: 171–190, 2003.
[23] Natheer Khasawneh, Chien-Chung Chan, “Active User-Based and Ontology-Based Web Log Data Preprocessing for Web”, Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, 2006.
[24] Pang-Ning Tan, Vipin Kumar, “Discovery of Web Robot Sessions based on their Navigational Patterns”, Data Mining and Knowledge Discovery, 6(1), pp: 9-35, 2002.
[25] P. Nithya, Dr. P. Sumathi, “Novel Pre-Processing Technique for Web Log Mining by Removing Global Noise and Web Robots”, IEEE Conference Publications, 2012.
[26] Renata Ivancsy, Sandor Juhasz, “Analysis of Web User Identification Methods”, World Academy of Science, Engineering and Technology, pp: 338-345, 2007.
[27] V. Chitraa, A.S.D., “A Survey on Preprocessing Methods for Web Usage Data”, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7(3), 2010.
[28] Ma Shuyue, Liu Wencai, Wang Shuo, “The Study on the Preprocessing in Web Log Mining”, IEEE Conference Publications, pp: 1-5, 2011.
[29] Ben Martin, Peter Eklund, “From Concepts to Concept Lattice: A Border Algorithm for Making Covers Explicit”, ICFCA 2008, Springer-Verlag Berlin Heidelberg, pp: 78-89, 2008.
[30] Eklund, Peter, Villerd, Jean, “A Survey of Hybrid Representations of Concept Lattices in Conceptual Knowledge Processing”, Lecture Notes in Computer Science, Springer Berlin/Heidelberg, pp: 296- 31, 2010.
[31] F. Masseglia, P. Poncelet, M. Teisseire, “Incremental mining of sequential patterns in large databases”, Data Knowledge Engineering, 46(1), pp: 97-121, 2003.
[32] Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava, “Finding Hierarchical Heavy Hitters in Streaming Data”, ACM Transactions on Database Systems, Vol. V, No. N, 2007.
[33] JL. Pfaltz, CM. Taylor, “Scientific discovery through iterative transformations of concept lattices”, In Workshop on Discrete Applied Mathematics, in conjunction with the 2nd SIAM International Conference on Data-Mining, pp: 65–74, 2002.
[34] Li H R, Zhang W X, Wang H., “Classification and reduction of attributes in concept lattices”, Proc of IEEE International Conference on Granular Computing, Los Alamitos: IEEE Computer Society, pp: 142-147, 2006.
[35] Shao, M W., “The reduction for two kind of generalized concept lattice”, Proceedings of the 4th International Conference on Machine Learning and Cybernetics, Berlin: Springer, pp: 2217-2222, 2005.
[36] Show-Jane Yen, Yue-Shi Lee, Chung-Wen Cho, “An Efficient Approach for the Maintenance of Path Traversal Patterns”, In Proceedings of IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE), pp: 207-214, 2004.
[37] Valtchev P., Missaoui R., “Building Concept (Galois) Lattices from Parts: Generalizing the Incremental Methods”, In Proceedings of the 9th International Conference on Conceptual Structures (ICCS 2001), USA, pp: 290-303, 2001.
[38] Andreas Lubcke, Veit Koppen, Gunter Saake, “A Decision Model to Select the Optimal Storage Architecture for Relational Databases”, IEEE Conference Publications, pp: 1-11, 2011.
[39] Yue-Shi Lee, “A Lattice-Based Framework for Interactively and Incrementally Mining Web Traversal Patterns”, DOI: 10.4018/978-1-59904-990-8.ch027, 2009.
[40] C. Ezeife, Y. Su, “Mining incremental association rules with generalized FP-tree”, in Lecture Notes in Computer Science, LNCS 2338, Springer-Verlag, pp: 147-160, 2002.
[41] Dalamagas, T., Bouros, P., Galanis, T., Eirinaki, M., Sellis, T.K., “Mining user navigation patterns for personalizing topic directories”, Proc. 9th ACM International Workshop on Web Information and Data Management, Lisbon, Portugal, pp: 81-88, 2007.
[42] F.C Tseng, C.C. Hsu, K.S. Fu, “The Frequent Pattern List: Another Framework for Mining Frequent Patterns”, International Journal of Electronic Business Management, Vol. 3, No. 2, pp: 104-115, Feb, 2005.
[43] Harms, S.K., Deogun, J.S., “Sequential association rule mining with time lags”, Journal of Intelligent Information Systems, Vol. 22, No. 1, pp: 7-22, 2004.
[44] Huang, Y.M., Kuo, Y-H., Chen, J.N., Jeng, Y.L., “NP-miner: A real-time recommendation algorithm by using web usage mining”, Knowledge Based Systems, Vol. 19, No. 4, pp: 272-286, 2006.
[45] J. Fong, H.K. Wong, S.M. Huang, “Continuous and incremental data mining association rules using frame metadata model”, Knowledge-Based Systems 16, Elsevier, pp: 91-100, 2003.
[46] Johannes K. Chiang, Rui-Han Yang, “Multidimensional Data Mining for Discover Association Rules in Various Granularities”, IEEE Conference Publications, pp: 1-6, 2013.
[47] J L Balcazar, “Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules”, Pascal Report 4259, 2008.
[48] Livingstone, G., Rosenberg, J., Buchanan, B., “An agenda and justification based framework for discovery systems”, Knowledge and Information Systems 5(2), pp: 133-161, 2003.
[49] Liu Jian, Wang Yan-Qing, “Web Log Data Mining Based on Association Rule”, Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp: 1855-1859, 2011.
[50] Priyanka Makkar, Payal Gulati, Dr. A.K. Sharma, “A Novel Approach for Predicting User Behavior for Improving Web Performance”, International Journal on Computer Science and Engineering, Vol. 02, No. 04, pp: 1233-1236, 2010.
[51] Przemysław Kazienko, “Mining Indirect Association Rules For Web Recommendation”, International Journal of Applied Mathematics and Computer Science, Vol. 19, No. 1, pp: 165-186, 2009.
[52] Show-Jane Yen, Yue-Shi Lee, “An efficient data mining approach for discovering interesting knowledge from customer transactions”, Expert Systems with Applications, Elsevier, pp: 1-8, 2005.
[53] B. Minaei-Bidgoli, William F. Punch, “Using Genetic Algorithms for Data Mining Optimization in an Educational Web-based System”, http://www.lon-capa.org, 2003.
[54] Diana Martín, Alejandro Rosete, Jesus Alcala-Fdez, Francisco Herrera, “A Multi-Objective Evolutionary Algorithm for Mining Quantitative Association Rules”, IEEE Conference Publications, pp: 1397-1402, 2011.
[55] Gaurav Dubey, Arvind Jaiswal, “Identifying Best Association Rules and Their Optimization Using Genetic Algorithm”, International Journal of Emerging Science and Engineering (IJESE), Volume-1, Issue-7, pp: 91-96, 2013.
[56] H. K. Tsai, J. M. Yang, C. Y. Kao, “Applying genetic algorithms to finding the optimal Gene order in displaying the microarray data”, In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp: 610-617, 2002.
[57] Hyunchul Ahn, Kyoung-jae Kim, “Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach”, Applied Soft Computing, Volume 9, Issue 2, pp: 599–607, 2009.
[58] Mehmet Kaya, “Automated extraction of extended structured motifs using multi-objective genetic algorithm”, Expert Systems with Applications, Volume 37, Issue 3, pp: 2421-2426, 2010.
[59] S. Ventura, C. Romero, A. Zafra, J. A. Delgado, C. Hervas, “JCLEC: A java framework for evolutionary computation”, Soft Computing, vol. 4, no. 12, pp: 381–392, 2008.
[60] S. Y. Wang, K. Tai, “Structural topology design optimization using genetic algorithms with a bit-array representation”, Computer Methods in Applied Mechanics and Engineering, 194, pp: 3749-3770, 2005.
[61] S. Y. Wang, K. Tai, M. Y. Wang, “An enhanced genetic algorithm for structural topology optimization”, International Journal for Numerical Methods in Engineering, 65, pp: 18-44, 2006.
[62] Xiaoyan Sun, Lei Yang, Dunwei Gong, Ming Li, “Interactive Genetic Algorithm Assisted with Collective Intelligence from Group Decision Making”, IEEE World Congress on Computational Intelligence, pp: 1-8, 2012.
[63] Brijs, T., Vanhoof, K., Wets, G., “Defining interestingness for association rules”, International Journal of Information Theories and Applications, 10(4), pp: 370–376, 2003.
[64] Furnkranz, J., Flach, P. A., “ROC ‘n’ rule learning: Towards a better understanding of covering algorithms”, Mach. Learn. 58(1), pp: 39–77, 2005.
[65] Heng-Soon Gan, Andrew Wirth, “Heuristic stability: A permutation disarray measure”, Computers & Operations Research, Volume 34, Issue 11, pp: 3187-3208, 2007.
[66] Hilderman, R. J., Hamilton, H. J., “Evaluation of interestingness measures for ranking discovered knowledge”, Lecture Notes in Computer Science 2035, pp: 247–259, 2001.
[67] Izwan Nizal Mohd Shaharanee, Fedja Hadzic, Tharam S. Dillon, “Interestingness measures for association rules based on statistical validity”, Knowledge-Based Systems, pp: 386–392, 2011.
[68] Jaroszewicz, S., Simovici, D. A., “Interestingness of frequent itemsets using Bayesian networks as background knowledge”, in Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, pp: 178–186, 2004.
[69] Keim, D.A., “Information visualization and visual data mining”, IEEE Transactions On Visualization and Computer Graphics 7, pp: 100–107, 2002.
[70] Michael Friendly, “Milestones in the history of thematic cartography, statistical graphics, and data visualization”, 2009.
[71] Padmanabhan, B., Tuzhilin, A., “On characterization and discovery of minimal unexpected patterns in rule discovery”, IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 2, pp: 202–216, 2006.
[72] Vitaly Friedman, “Data Visualization and Infographics”, in: Graphics, Monday Inspiration, January 14th, 2008.
[73] Ravita Mishra, “Web Usage Mining Contextual Factor: Human Information Behavior”, International Journal of Information Technology and Management Information Systems (IJITMIS), Volume 5, Issue 1, 2014, pp. 12 - 29, ISSN Print: 0976 – 6405, ISSN Online: 0976 – 6413.
[74] Suresh Subramanian and Dr. Sivaprakasam, “Genetic Algorithm with a Ranking Based Objective Function and Inverse Index Representation for Web Data Mining”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 84 - 90, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[75] Jaykumar Jagani and Prof. Kamlesh Patel, “An Enhanced Algorithm for Classification of Web Data for Web Usage Mining using Supervised Neural Network Algorithms”, International Journal of Computer Engineering & Technology (IJCET), Volume 5, Issue 4, 2014, pp. 48 - 56, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.