International Journal of Information Technology & Management Information System (IJITMIS),
ISSN 0976 – 6405(Print), ISSN 0976 – 6413(Online), Volume 5, Issue 1, January - April (2014), © IAEME
WEB USAGE MINING CONTEXTUAL FACTOR: HUMAN
INFORMATION BEHAVIOR
Ms. Ravita Mishra
Information Technology Dept, Ramrao Adik Institute of Technology, Nerul Navi Mumbai,
India
ABSTRACT
With the rapid development of information technology, the World Wide Web has
been widely used in various applications, such as search engines, online learning and
electronic commerce. These applications are used by a diverse population of users with
heterogeneous backgrounds, in terms of their knowledge, skills, and needs. Therefore, human
factors are key issues for the development of web-based applications and research. This paper
first reviews work by different authors and then examines three important human
factors: gender differences, prior knowledge, and cognitive styles. Because the results of the
review are not sufficiently conclusive, a new model is proposed that accesses log data and shows
human access behavior. The proposed model has two stages: web intelligence and
navigation pattern. Stage 1 (the web intelligence system) captures data from different servers and
converts it into tabular form (a data store). Stage 2 uses the N-gram algorithm, which assumes
that the last N pages browsed affect the probability of the next page to be visited; user
navigation sessions are modelled as a hypertext probabilistic grammar whose higher-probability
strings correspond to the user's preferred trails. Web caching and pre-fetching are two
important approaches used in this paper to reduce the noticeable response time perceived
by users. The model improves the navigation patterns of users and identifies user behavior
(gender difference and user type). These findings can be used by site designers and researchers, and
also for detecting and avoiding terror threats worldwide. The
paper is organized into six parts: the first part contains the introduction, the second part covers
the different types of web mining, the third part covers usage mining on the web, the fourth part
contains the analysis of human factors and the evaluation technique, the fifth part contains the proposed
methodology, and the last part contains applications, limitations, the conclusion and further work.
Keywords: Pattern Discovery, Contextual factor, Information Retrieval, N-gram,
Gender difference, Cognitive style and Prior experience.
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY &
MANAGEMENT INFORMATION SYSTEM (IJITMIS)
ISSN 0976 – 6405(Print)
ISSN 0976 – 6413(Online)
Volume 5, Issue 1, January - April (2014), pp. 12-29
© IAEME: http://www.iaeme.com/IJITMIS.asp
Journal Impact Factor (2013): 5.2372 (Calculated by GISI)
www.jifactor.com
1. INTRODUCTION
Web mining is an active research topic that combines two active research
areas: data mining and the World Wide Web. Web mining research relates to several
research communities, such as databases, information retrieval and artificial intelligence.
Web mining is categorized into three areas: web content mining, web structure mining
and web usage mining. Web content mining focuses on the discovery and retrieval of useful
information from web contents, data and documents, while web structure mining
emphasizes discovering how to model the underlying link structures of the web [14,
16]. Web usage mining is a relatively independent, but not isolated, category, which mainly
describes techniques that discover users' usage patterns and try to predict users'
behavior. Web mining is the application of data mining techniques to automatically
discover and extract useful information from World Wide Web documents and services
[16]. Here, human factors are increasingly seen as important issues, as reflected in the
substantial number of existing studies in the area. Among various human factors, gender
differences (e.g., Roy, Taylor, & Chi, 2003), prior knowledge (e.g., Calisir & Gurel, 2003) and
cognitive styles (e.g., Chen & Macredie, 2004) have significant impacts on web-based
interaction. Furthermore, these three human factors are inter-related. For example,
females tend to behave similarly to novices in terms of the extent to which they experience
disorientation problems; males and experts seem to have similar preferences in their
interaction patterns, with studies reporting that they enjoy non-linear interaction (Ford &
Chen, 2000). Despite the growing number of studies looking at these three human factors,
an integrated review that synthesizes their effects is lacking.
2. WEB DATA MINING
2.1 Overview: Today, with the tremendous growth of the data sources available on the Web
and the dramatic popularity of e-commerce in the business community, Web mining has
become the focus of quite a few research projects and papers [13, 14, 15]. Previous
research suggests decomposing web mining into the following subtasks:
Resource Discovery: retrieving the intended information from the web.
Information Extraction: automatically selecting and pre-processing specific information
from the retrieved web resources.
Generalization: automatically discovering general patterns at individual web sites and
across multiple sites.
Analysis: analyzing the mined patterns.
The authors of [10] claim that the web involves three types of data: data on the web
(content), web log data (usage) and web structure data. The data types are also classified as
content data, structure data, usage data, and user profile data.
2.1.1 Web Content Mining: Web content mining describes the automatic search of
information resources available online and involves mining web data contents. A web
document usually contains several types of data, such as text, image, audio, video, metadata
and hyperlinks. The technologies normally used in web content mining are natural language
processing (NLP) and information retrieval (IR). Some of the data is semi-structured, such as
HTML documents, or more structured, such as data in tables or database-generated HTML
pages, but most of it is unstructured text [14].
2.1.2 Web Structure Mining: Web content mining mainly focuses on the
structure within a document, while web structure mining tries to discover the link structure of
the hyperlinks at the inter-document level. Based on the topology of the hyperlinks, web
structure mining categorizes web pages and generates information such as the
similarity and relationship between different web sites. Web structure mining can also take
another direction: discovering the structure of the web document itself. This type of structure
mining can be used to reveal the structure (schema) of web pages, which is useful for
navigation and makes it possible to compare and integrate web page schemas. The
structural information generated by web structure mining includes the following: the
frequency of local links in the web tuples of a web table; the
frequency of web tuples in a web table containing interior links, that is, links
within the same document; the
frequency of web tuples in a web table containing global links, that is, links that
span different web sites; and the frequency of identical web tuples that
appear in a web table or among web tables [15, 20]. In general, if a web page is linked to
another web page directly, or the web pages are neighbors, we would like to discover the
relationships among those web pages. The relationship may fall into one of several types: the
pages may be related by synonyms or ontology, they may have similar contents, or they may
reside on the same web server and therefore may have been created by the same person [13, 14].
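As an illustration of the link-frequency measures listed above, the following minimal Python sketch counts interior, local and global links on a single page. It is only a sketch under stated assumptions: the function names and example URLs are hypothetical, and a "local" link is assumed here to mean a link to a different document on the same host, a definition the text does not spell out.

```python
from collections import Counter
from urllib.parse import urldefrag, urljoin, urlsplit


def classify_link(page_url: str, href: str) -> str:
    """Label a hyperlink as 'interior', 'local' or 'global' relative to page_url."""
    target = urljoin(page_url, href)        # resolve relative links
    target_doc, _ = urldefrag(target)       # drop any "#fragment" part
    page_doc, _ = urldefrag(page_url)
    if target_doc == page_doc:
        return "interior"   # stays within the same document
    if urlsplit(target_doc).netloc == urlsplit(page_doc).netloc:
        return "local"      # assumption: other document on the same host
    return "global"         # spans a different web site


def link_frequencies(page_url: str, hrefs: list[str]) -> Counter:
    """Count how many links of each type a page contains."""
    return Counter(classify_link(page_url, h) for h in hrefs)


# Hypothetical page and links, for illustration only.
page = "http://example.org/docs/index.html"
links = ["#top", "guide.html", "/about.html", "http://other.example.com/"]
freq = link_frequencies(page, links)
```

Comparing hosts is a deliberately coarse test of "same web site"; a real study would need its own definition of site boundaries.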
2.1.3 Web Usage Mining: Analyzing the web access logs of different web sites can help
understand the user behaviour and the web structure, thereby improving the design of this
colossal collection of resources. There are two main tendencies in web usage mining driven
by the applications of the discoveries: General Access Pattern Tracking and Customized
Usage Tracking. The general access pattern tracking analyzes the web logs to understand
access patterns and trends. These analyses can be used for better structure and grouping of
resource providers. Applying data mining techniques to access logs unveils interesting access
patterns that can be used to restructure sites into more efficient groupings, pinpoint effective
advertising locations, and target specific users with specific ads. Customized usage
tracking analyzes individual trends. Its purpose is to customize web sites to users. The
information displayed, the depth of the site structure and the format of the resources can all be
dynamically customized for each user over time based on their access patterns.
2.2. STEPS IN WEB MINING
Web usage mining consists of three steps: (1) pre-processing, (2) pattern discovery and (3)
pattern analysis. Pre-processing is further divided into three parts.
2.2.1 Pre-processing: Pre-processing is divided into three types: content pre-processing,
structure pre-processing and usage pre-processing. Content pre-processing is the
process of converting text, images, scripts and other files into forms that can be used by
usage mining. For the content of static page views, pre-processing can easily be done
by parsing the HTML and reformatting the information or running additional algorithms as
desired [15]. Structure pre-processing can be treated similarly to content pre-processing.
However, a different site structure may have to be constructed for each server session [13,
15]. The inputs of the pre-processing phase may include web server logs, referral logs,
registration files, index server logs, and optionally usage statistics from a previous analysis.
The outputs are the user session file, transaction file, site topology, and page classifications.
It is always necessary to apply data cleaning techniques to eliminate the impact of
irrelevant items on the analysis result. Without sufficient data, it is very difficult to identify
the users [14]. Session identification is also part of usage pre-processing. Its goal
is to divide the page accesses of each user, who is likely to visit the web site more than
once, into individual sessions. The simplest way to do this is to use a timeout to break a user's
click-stream into sessions. Another problem is path completion: determining
whether any important accesses are missing from the access log. The methods used
for user identification can also be used for path completion. The final step of
pre-processing is formatting, a preparation module that properly formats the sessions or
transactions.
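The timeout-based session identification described above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the 30-minute timeout, the `(user, timestamp, page)` record layout and the sample hits are all assumptions made for the example.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed value, not given in the paper


def split_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) hits into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, page in hits:
        by_user[user].append((ts, page))

    sessions = []
    for user, visits in by_user.items():
        visits.sort()                      # order each click-stream by time
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:     # gap too long: close the session
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions


# Hypothetical, already-cleaned log records (timestamps in seconds).
hits = [
    ("u1", 0, "/home"),
    ("u1", 120, "/products"),
    ("u1", 4000, "/home"),
    ("u2", 50, "/home"),
]
sessions = split_sessions(hits)
```

With these sample hits, the third request from `u1` arrives more than 30 minutes after the second, so it opens a new session.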
2.2.2 Pattern Discovery
Pattern discovery draws on algorithms and techniques from several research
areas, such as data mining, machine learning, statistics, and pattern recognition. It
falls into the following categories: statistical analysis, association rules, clustering,
classification, sequential patterns and dependency modeling.
Statistical analysis: Statistical techniques are among the most powerful tools for extracting
knowledge about visitors to a web site. Analysts may
perform different kinds of descriptive statistical analyses based on different variables when
analyzing the session file [13]. The statistical information contained in
periodic web system reports is potentially useful for improving
system performance and enhancing system security.
Association rules: Association rule mining techniques
can be used to discover unordered correlations between items found in a database of
transactions [13]. Association rules refer to sets of pages that are accessed together with a
support value exceeding some specified threshold. Web designers can restructure their
web sites efficiently based on the presence or absence of association rules.
Clustering: Clustering analysis is a technique for grouping together users or data items with similar
characteristics. Clustering of user information or pages can facilitate the development and
execution of future marketing strategies [13]. Clustering of users helps to discover
groups of users who have similar navigation patterns. It is very useful for inferring user
demographics to perform market segmentation in e-commerce applications or to provide
personalized web content to individual users.
Classification: Classification is a supervised inductive
learning technique that maps a data item into one of several predefined classes. In the web
domain, a webmaster or marketer uses this technique to establish a
profile of users belonging to a particular class or category. This requires extraction and
selection of features that best describe the properties of a given class or category [13].
Sequential patterns: Sequential pattern discovery finds inter-session patterns in which one set of items follows
another in a time-ordered set of sessions. It also includes other types of temporal
analysis, such as trend analysis, change point detection and similarity analysis. It is very useful
for web marketers to predict future trends, which helps to place advertisements aimed at
certain user groups [13].
Dependency modeling: Dependency modeling represents significant dependencies among
the various variables in the web domain [13]. The modeling technique provides a theoretical
framework for analyzing the behavior of users, and is potentially useful for predicting future
web resource consumption.
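To make the support threshold used by association rules concrete, the sketch below counts how often pairs of pages occur together in the same session. It is a simplified illustration rather than a full association rule miner such as Apriori; the function name, the sample sessions and the 0.4 threshold are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_page_pairs(sessions, min_support):
    """Return {(page_a, page_b): support} for pairs whose support exceeds min_support.

    Support is the fraction of sessions that contain both pages,
    regardless of the order in which they were visited.
    """
    pair_counts = Counter()
    for pages in sessions:
        # Use each distinct page once per session; sort for a stable pair key.
        for pair in combinations(sorted(set(pages)), 2):
            pair_counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in pair_counts.items() if c / n > min_support}


# Hypothetical sessions, for illustration only.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
pairs = frequent_page_pairs(sessions, min_support=0.4)
```

Here the pairs `/home` with `/products` and `/cart` with `/products` each appear in two of the four sessions (support 0.5) and pass the threshold; the remaining pairs appear only once and are dropped.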
2.3 PATTERN ANALYSIS
The goal of this process is to eliminate irrelevant rules or patterns and to extract
the interesting rules or patterns from the output of the pattern discovery process. The output of
the algorithms is not in a form suitable for direct human consumption, and thus needs to be
transformed into a format that can be assimilated easily [13]. There are two common approaches
to pattern analysis. One is to use a knowledge query mechanism such as SQL; the
other is to construct a multi-dimensional data cube and then perform OLAP operations.
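The SQL-based approach can be sketched with Python's built-in `sqlite3` module. The table layout, the rule values and the thresholds below are hypothetical and serve only to show how a query can filter discovered patterns down to the interesting ones.

```python
import sqlite3

# In-memory database holding discovered rules (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patterns ("
    "antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
conn.executemany(
    "INSERT INTO patterns VALUES (?, ?, ?, ?)",
    [
        ("/home", "/products", 0.50, 0.67),
        ("/products", "/cart", 0.50, 0.67),
        ("/home", "/about", 0.25, 0.33),
    ],
)

# Keep only rules that clear both thresholds, strongest first.
interesting = conn.execute(
    "SELECT antecedent, consequent FROM patterns "
    "WHERE support >= ? AND confidence >= ? "
    "ORDER BY confidence DESC, support DESC, antecedent",
    (0.4, 0.6),
).fetchall()
conn.close()
```

The same table could also feed an OLAP-style aggregation, for example grouping rules by antecedent page, but that step is not shown here.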
3. ANALYSIS OF CONTEXTUAL FACTOR (HUMAN INFORMATION BEHAVIOR)
In the given framework, the contextual parameter human information behaviour is
analysed; it includes information exploration, seeking, filtering, use and communication.
Based on the framework, various contextual factors (user interest, difficulty, time taken,
credibility and browser dependence), their influential factors (physical, cognitive, affective,
economic, social, and political) and their implications were investigated [12]. In a previous
model, the user dimension is considered to be influenced by the particular task, information
need, knowledge state, cognitive style, affective state and so on. The authors measured users'
cognitive styles and affective states before a user study, applied a process-tracing technique
while users were conducting information-seeking tasks, and found various types of
relationships among the elements of the dimensions. In (Rieh 2002), the authors found that
users judge cognitive authority and information quality by two types of judgment, predictive
judgment and evaluative judgment, and they also identified the main facets and keywords of
the judgments through a user study.
Review process: Due to the massive growth of e-commerce, privacy has become a sensitive
topic and has attracted more and more attention recently. The basic goal of web mining is to
extract information from data sets for business needs, which makes its application
highly customer-related. In business-related customer data, the human factor is a fertile
research area that demonstrates complete human information behavior based on an
experimental dataset. Analysis of this factor is based on four points: 1. Gender difference, 2.
Cognitive style, 3. Prior experience and 4. Web-based interaction. Although quite a
few commercial analysis applications are available and many more are free, developing
efficient, flexible and powerful tools still requires a lot of work from both researchers and
developers.
Figure 4.1 illustrates the review process, which consists of four major stages.
Stage one searches electronic
journals and search engines; these resources were selected because they were known to
include empirical studies related to gender differences, prior knowledge and cognitive styles.
The search terms for these electronic resources included four groups: (1) Internet and WWW;
(2) Gender, females/males; boys/girls, and men/women; (3) Prior knowledge, system
experience, novices/experts, domain expertise, domain knowledge, computer experience,
previous experience, Internet experience; and (4) Cognitive styles, learning styles, field
dependence. Stage two analyzes the search results by timeline. Stage three selects the analysis
based on titles, elements and keywords. Stage four assesses the behavior based on credibility.
3.1 GENDER DIFFERENCES
Gender difference is an important variable that influences computing skills and helps to
characterize human information behavior and emotions. As the web has become a popular platform
for various applications, such as search engines, online learning and electronic commerce, a
growing body of studies has examined gender differences in the use of the
web. This literature suggests that the major differences between males and females lie in
navigation patterns, attitudes and perceptions [8, 9]. Previous research includes a number of
surveys, and the literature suggests that males report lower levels
of computer anxiety than their female counterparts; in addition, it also seems that males
achieve much better outcomes than females in the use of computers (Karavidas, Lim,
& Katsikas, 2004). Gender difference is analyzed here in terms of navigation patterns and attitudes
and perceptions.
Navigation pattern is defined as the way users access web pages. Without good
navigation, a site becomes useless to visitors. They cannot find the information they need, and
seek out competing sites instead. It is vital that sites be easy to navigate if the designer
is to be successful. Certain navigation patterns work on virtually all
sites. The first pattern is tabbed navigation, the second is header navigation, and the third
covers blog, informational and reference sites, corporate sites, etc.
Large et al. (2002) examined how
boys and girls behaved differently when retrieving information from the web. The subjects
of their study were 53 students (23 boys and 30 girls) from two grade-six classes.
Overall, the boys explored more hypertext links per minute, tended to perform more page
jumps per minute, entered more searches in search engines, and gathered and saved
information more often than the girls, while the boys spent less time viewing pages than the
girls [8, 9]. Furthermore, Ford, Miller and Moss (2001) investigated individual differences in
internet searching using a sample of 64 Master's students with 20 males and 44 females. The
above-mentioned studies suggest that females and males show different approaches to
navigation, reflected in the navigation patterns that they exhibit, but that there are
contradictory findings. Table 1 summarizes how male and female students explore web
pages.
Table 1: Gender Difference
Author/Year | Male | Female
Large et al./2002 (23 boys and 30 girls) | Explore more hyperlinks | Explore fewer hyperlinks
Roy et al./2003 (equal no. of boys and girls) | More page jumps | Fewer page jumps
Lorigo/2006 (23 boys and 30 girls) | Linear | Non-linear
Lio, Huang/2008 (equal no. of boys and girls) | Non-linear | Linear
Ford, Miller/1996 (24 boys and 44 girls) | More effective | Less effective
Attitudes and Perceptions: Perception can determine attitude; it defines how you perceive
the world. Attitude is what the individual thinks about the perception, and perception is the
human subjective experience of information provided by the senses. A number of studies
suggest that there are gender differences in attitudes towards web-based interaction and
perceptions. In the first survey, 630 Anglo-American undergraduates completed
the Student Computer and Internet Survey, the results of which indicated that females
reported more computer anxiety and less computer self-efficacy than males. Schumacher and
Morahan-Martin (2001) conducted a survey to identify gender differences in attitudes
towards computers and the Internet. The survey was completed by 619 students, and the results
indicated that females reported more computer anxiety and less computer self-efficacy
than males. Similar results were also found in the study by Koohang (2004), which
investigated 154 students of an undergraduate management program; the results indicated
that males had significantly more positive perceptions than females toward using the
digital library [5]. The studies reviewed so far in this section indicate that females tend to have
more negative attitudes towards the use of the web than males and that they feel less able
when using the web than their male peers.
Table 2: Attitude and Perception
Author/Year | Male | Female
Jackson, Ervin/2001 (630 students) | Less computer anxiety | More computer anxiety
Koohang/2004 (245 students) | Positive perception | Negative perception
Koohang, Durante/2003 (125 students) | No significant difference | ---
Hong/2002 (24 students) | Asynchronous learning | Synchronous learning
3.2 PRIOR KNOWLEDGE
Users' prior knowledge includes system experience and domain knowledge, and
also refers to users' understanding of the content area (Lazonder, 2000). Prior knowledge or
domain knowledge also depends on web-based instruction, text structure, navigation facilities
and internet searching. A growing body of
research suggests that low prior knowledge users and high prior knowledge users show different levels of
familiarity and have different requirements. In the first survey, 200 students
participated in a web-based course, and the authors found that the participants with more
experience in the use of internet tools used less time to organize their work and visited fewer
pages in each session [5]. Other results showed that experts issued longer queries than
non-experts and also used many more technical query terms than non-experts [8]. Prior
knowledge is discussed under the following categories:
web-based instruction, text structure, navigation facilities and internet searching.
Web-based instruction: Some research has suggested that individuals with different levels of
prior knowledge show preferences for different types of text structure and different kinds of
navigation facilities.
Text structure: Three types of text structure have been identified: hierarchical, non-linear, and mixed
(hierarchical structure with cross-referential links). A number of studies have
examined how text structure interacts with users' prior knowledge; the findings suggest that
experts and novices differ in their performance depending on the text structure used in
Web-based instruction. In the first survey, McDonald and Stevenson (1998) examined the effects of text
structure and prior knowledge on navigation performance [8, 9]. The results showed that the
performance of knowledgeable participants was better than that of non-knowledgeable
participants, as they had a better conception of the subject matter than non-knowledgeable
participants. In the second survey, Calisir and Gurel (2003) also investigated the interaction of three types
of text structure (linear, hierarchical and mixed) with the prior knowledge of users.
However, in contrast to the study by McDonald and Stevenson (1998), they examined the
influence of text structure and prior knowledge on learning performance, rather than on
navigation performance. In the third survey, Amadieu, Tricot, and MarinéDo (2005) obtained similar
results. Three types of structure were provided: hierarchical, network and linear. The results
indicated that low prior knowledge learners demonstrated better performance in the
hierarchical structure, whereas the hierarchical structure seemed to obstruct the domain
representation for high prior knowledge learners. The findings suggest that a hierarchical
structure is most appropriate for non-knowledgeable subjects. The text structure
analysis is summarized in Table 3.
Table 3: Text Structure
Author/Year | Knowledgeable participants | Non-knowledgeable participants
McDonald and Stevenson (1998) (three structures: non-linear, hierarchical and mixed) | Better understanding of subject matter | Less understanding of subject matter
Calisir and Gurel (2003) (three types of text structure: linear, hierarchical and mixed) | Linear and mixed structure | Hierarchical structure
Amadieu, Tricot, and MarinéDo (2005) (three types of structure: hierarchical, network and linear) | Non-linear structure | Hierarchical structure
Mitchell, Chen, and Macredie (2005) (how students reacted to Web-based instruction, with 74 undergraduate students) | Non-linear | Linear
Navigation facilities: When considering the relationships between learning strategies and
navigation facilities, students' prior knowledge is an important factor in determining whether
a particular navigation facility is likely to be useful. Most current Web-based instruction
applications provide a range of navigation facilities to allow users to employ multiple
approaches to support their learning. Hierarchical maps and alphabetical indices are most
commonly used in Web-based instruction; each of them provides different functions in
relation to information access. The characteristics of the different navigation facilities may
influence how users develop their learning strategies, making navigation support a critical
issue. Farrell and Moore (2001) investigated whether the use of different navigation facilities
(linear, main menu and search engine) influences users' achievement and attitude [2, 3]. In one
study, 200 students were placed into three groups based on their knowledge levels (low, middle, and
high), with the results indicating that high-knowledge users commonly tended to use search
engines to locate specific topics. Conversely, low-knowledge users seem to benefit from
hierarchical maps, which can facilitate the integration of individual topics [4].
Internet Searching: In one study, the goal of each fact-finding task was to find one specific answer to a
simple question, while the broader tasks required the participants to find several documents
that would satisfy the task. The results indicated no significant differences
between experts and novices on the fact-finding tasks. Several studies also argue that prior
knowledge plays a substantial role in internet searching, which covers three aspects: search
strategies, search performance and search perception. Regarding search strategies, Tabatabai
and Luconi (1998) investigated the different strategies used by three experts and three novices.
The results showed that experts used more keywords, while novices used the 'Back' key more
often, used fewer search engines and missed some highly relevant sites [5].
Table 4: Internet searching
Author/Year | Expert | Novices
Tabatabai and Luconi/1998 | More keywords | Back key
2006 | One specific answer | Broader answer
Thatcher/2008 | Web experience | Cognitive search
3.3 COGNITIVE STYLES
Cognitive style also plays an essential role in web-based instruction, learning
preference, learning performance and internet searching. Field Dependence is the degree to
which a user's perception or comprehension of information is influenced by the surrounding
perceptual or contextual field.
Web-based instruction: Studies of web-based instruction examine the relationships between the
degree of Field Dependence and students' learning performance and learning preferences.
Learning performance: Students' cognitive styles are determined by using cognitive style
analysis (Riding, 1991), and their learning performance is compared in breadth-first and depth-first
versions. Ford and Chen (2000) found that Field Dependent learners in the breadth-first
version performed better than those in the depth-first version. Conversely, Field Independent
students performed better in the depth-first version than those in the breadth-first version [5].
Graff (2003) determined individuals' cognitive styles and examined the relationship between
cognitive style and performance in two versions of the system: long-page and short-page versions [4].
The study’s findings indicated that Field Independent students achieved superior scores in the
long-page condition whereas Field Dependent students were superior in the short-page
condition [5].
Learning preferences: Learning preferences are the choices that learners show in certain
types of learning environments and activities such as the selection of certain navigation paths
or facilities. Studies state that field independent and field dependent students show different
learning preferences. Lee, Cheng, Rai, and Depickere (2005) investigated students' learning
preferences in WebCT. The study’s findings indicate that field dependent students were
accustomed to linear learning whereas field independent students tended to have a preference
for non-linear learning.
Internet searching: In this analysis, the GEFT was used to identify the participants' cognitive
styles, and participants were asked to find answers from the Web for two search questions.
The results showed that there was a statistically significant correlation between GEFT scores
and both the time spent searching and the number of URLs visited. The participants with higher
GEFT scores conducted longer search sessions and visited more URLs. In contrast, the
participants with lower GEFT scores had shorter search sessions. Kim, Yun, and Kim (2004)
compared the search strategies of different cognitive style groups, and the results showed
that the Field Dependent group made significantly more repeated search attempts and made
more use of search operators [4,5].
4. PROPOSED MODEL
4.1 WEB INTELLIGENCE ARCHITECTURE
The proposed model addresses the problems discussed above, provides an easier
technique for finding user behaviour, and increases the reliability of the system. The model
is divided into two parts. In the first part, a web intelligent system records the web logs
from the server or client using the ISP. The second part uses the N-gram technique to combine
content and usage mining. The framework should enable the collection of online data from
various Internet Service Providers (ISPs), optionally analyzing the data in real time, and
transmitting the relevant data for cleaning. Previous review results had some limitations:
Inconsistent results: The results reported in existing studies are not fully consistent. There
are contradictory findings as to whether gender differences influence users' attitudes and
perceptions towards Web-based interaction and whether cognitive styles affect users'
learning performance. In future work, we plan to develop a standard template for the
questionnaires so that the accuracy of the results can be improved.
Lack of mixed methods and limited application: The survey suggests that quantitative methods
are favoured when seeking to find the overall effectiveness of the systems. It is clear that
quantitative and qualitative methods have different strengths and weaknesses. However, few
existing studies mix quantitative and qualitative methods.
Fig.2. Proposed Architecture.
As illustrated in Fig. 1, individual surfers' activities are managed by various ISPs and are
recorded by each ISP. The data is cleaned and filtered according to requirements. Filtered
data is transmitted to a relay and is further propagated to a persistent data store, where it
can be further analyzed by Big-Data analysis tools.
Stage-1
Data sets consisting of web log records for 5063 users are collected from the DePaul
University website. A web log is an unprocessed text file which is recorded from the IIS Web
Server, e.g. the log file of DePaul University (www.cdm.maya.depaul.edu). The recorded log file
of DePaul University (or any other log file) will be used for analysis. The pattern of the log
file is shown below:
<Date><Time><C-ip><Cs-Username><S-Sitename><S-Computername><S-Ip><S-Port>
<Cs-Method><Cs-Uri-Stem><Cs-Uri-Query><Sc-Status><Time-Taken><Cs-Version>
Fig. 2 Web intelligent System (Stage-1). Blocks: Web Page, Pre-Processing, Relay (Real time
Analysis), Persistent Data Store, Big Data Analytics.
[Figure: Stage-2. Blocks: Web Log File, EOI Parameter (Behavioral Parameter), N-gram Feature
Generation and Extraction, Classification/Prediction, Contextual Factor (Human Behavior),
Classification of Web Pages.]
The structure of log file:
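As a minimal sketch, a record that follows the field pattern given above can be split into named fields as shown here. We assume that fields are separated by whitespace and that empty fields are written as "-"; the sample record is hypothetical and was invented for illustration only.

```python
from typing import Dict

# Field order taken from the log pattern given in the text.
FIELDS = [
    "date", "time", "c-ip", "cs-username", "s-sitename", "s-computername",
    "s-ip", "s-port", "cs-method", "cs-uri-stem", "cs-uri-query",
    "sc-status", "time-taken", "cs-version",
]


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one whitespace-separated log record into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Hypothetical record (not taken from the DePaul data set).
sample = (
    "2014-01-15 10:32:07 192.0.2.10 - W3SVC1 WEBSRV 192.0.2.1 80 "
    "GET /courses/index.html - 200 125 HTTP/1.1"
)
record = parse_log_line(sample)
print(record["c-ip"], record["cs-uri-stem"], record["sc-status"])
```

Keeping the records as dictionaries keyed by field name lets the later steps refer to Cs-Uri-Stem, Cs-Method and the other fields by name rather than by position.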
Here we suggest a few parameters that indicate the active involvement of the
subject in an EOI. While each parameter in itself may have limited predictive value, the
combination of these parameters may yield an accurate prediction or evidence.
A. Intensity of surfing/accessing
It measures the intensity of the user's Internet surfing activities; the browsing
intensity value is measured by the number of pages that the user visited in a given time.
When a user shows an increased interest in a given event, we can assume that he will visit
related web pages more intensively than usual. Consequently, historical data of the user's
surfing intensity should be used when searching for anomalies. We measure the browsing
intensity of users using the CS-Uri-Stem and CS-Version fields of the log file.
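The intensity measure can be sketched as a count of page requests per user and hour, compared against that user's history. The comparison rule used here (a z-score with a threshold of 3) is our own assumption for illustration, since the text only calls for searching for anomalies; the function names and the input tuple layout are hypothetical as well.

```python
from collections import Counter
from statistics import mean, stdev


def pages_per_hour(requests):
    """Count page requests per (user, date, hour) window.

    `requests` is an iterable of (user, 'YYYY-MM-DD', 'HH:MM:SS', uri).
    """
    counts = Counter()
    for user, date, time, _uri in requests:
        counts[(user, date, time[:2])] += 1
    return counts


def is_unusual(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` standard deviations
    above the mean of the user's historical hourly counts (assumed rule)."""
    if len(history) < 2:
        return False  # not enough history to judge
    sigma = stdev(history)
    if sigma == 0:
        return current > history[0]
    return (current - mean(history)) / sigma > threshold


requests = [
    ("u1", "2014-01-15", "10:05:00", "/a"),
    ("u1", "2014-01-15", "10:40:00", "/b"),
    ("u1", "2014-01-15", "11:02:00", "/c"),
]
counts = pages_per_hour(requests)
print(counts[("u1", "2014-01-15", "10")])
print(is_unusual([10, 12, 11, 9, 13], 40))
```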
B. Frequency of revisiting/refreshing a given page
It measures the number of revisit/refresh operations performed by the user on each
page. Through this information the system may locate stressful behavior, where the user
strives for immediate updates regarding his topic of interest. He may repeatedly and
frequently revisit the same page, or simply push the 'refresh' button on the browser.
Significant peaks in this parameter may be observed in real time, and it is calculated from the
CS-Uri-Stem and Time-Taken fields of the log file.
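A minimal sketch of the revisit count is given below. It assumes that each request has already been reduced to a (user, uri) pair and treats every visit after the first as a revisit or refresh; the function name is hypothetical.

```python
from collections import Counter


def revisit_counts(requests):
    """Return {(user, uri): number of revisits}, i.e. visits beyond the first."""
    visits = Counter((user, uri) for user, uri in requests)
    return {key: n - 1 for key, n in visits.items() if n > 1}


requests = [
    ("u1", "/news/event"), ("u1", "/news/event"), ("u1", "/news/event"),
    ("u1", "/home"), ("u2", "/news/event"),
]
print(revisit_counts(requests))
```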
C. Irregular/Unusual hours of activity
It measures irregular surfing hours and irregular lengths of surfing sessions.
Examination of a user's historical data may reveal a regular pattern, concerning his surfing
hours. This parameter requires analyzing the user's historical data to learn the regular surfing
hours and session lengths. The irregular hours are calculated from the Time-Taken field of the
log file, and deviations from such patterns can be found by anomaly detection methods.
D. Interaction level (Passive (low)/Active (high))
It measures the level of the user's interaction, ranging from 'low' (passive only), to
'high' (mostly active). In passive surfing the user suffices with reading pages, whereas in
active surfing he may chat, write email, commit responses or talkbacks, do Internet shopping,
and so on. It is calculated from the S-code and Cs-Method fields of the log file. Regarding our
'terrorist' scenario, we hypothesize that, as the deadline comes closer, the subject will lower
his or her active profile, and will focus on passive consumption of relevant information.
E. Diversity of interest topics/content topics
It measures the user's range of interest topics. Surfers are often attracted to diverse
topics such as news, sports, music, gaming or finances. When the subject is focused on an
urgent issue, we assume that it will affect his or her surfing pattern, restricting the range
of visited sites to a specific topic. The diversity measure can be learned from the user's
historical data using clustering methods, and it is calculated from the S-Sitename,
CS-Uri-Stem and Cs-Uri-Query fields of the log file. Significant deviations show up as
anomalies or outliers.
F. Classification of webpage
Web pages are classified as index pages or content pages. An index page is a page used by
the user for navigation of the web site. It normally contains little information except links.
A content page is a page containing information the user would be interested in, and its
content offers something other than links.
Algorithm step
• Define two thresholds: count_threshold and link_threshold
• Set χ = 1/(mean reference length of all pages)
• Set t = -ln(1 - γ)/χ, where γ denotes the estimated proportion of index pages
• For each page P
• If P’s file type is not HTML or P’s end of session count > count_threshold
• Mark P as a content page
• Else if P’s number of links > link_threshold
• Mark P as an index page
• Else if P’s reference length < t
• Mark P as an index page, else mark P as a content page
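The classification steps above can be sketched as follows. The quantity subtracted from 1 inside the logarithm is treated as a parameter `gamma`, which we assume to be the expected fraction of index pages (a value strictly between 0 and 1). The `Page` record, its field names and all threshold values are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    is_html: bool
    end_of_session_count: int
    num_links: int
    reference_length: float  # mean viewing time of the page, e.g. in seconds


def classify_pages(pages, count_threshold, link_threshold, gamma):
    """Label each page 'index' or 'content' following the listed steps."""
    mean_ref = sum(p.reference_length for p in pages) / len(pages)
    chi = 1.0 / mean_ref
    t = -math.log(1.0 - gamma) / chi  # cut-off reference length
    labels = {}
    for p in pages:
        if not p.is_html or p.end_of_session_count > count_threshold:
            labels[p.url] = "content"
        elif p.num_links > link_threshold:
            labels[p.url] = "index"
        elif p.reference_length < t:
            labels[p.url] = "index"
        else:
            labels[p.url] = "content"
    return labels


pages = [
    Page("/index.html", True, 0, 50, 5.0),
    Page("/paper.pdf", False, 0, 0, 60.0),
    Page("/article.html", True, 0, 3, 100.0),
    Page("/menu.html", True, 0, 2, 3.0),
]
labels = classify_pages(pages, count_threshold=5, link_threshold=20, gamma=0.3)
print(labels)
```

With these illustrative values the mean reference length is 42, so the cut-off t is roughly 15: the short-lived `/menu.html` is labelled an index page, while `/article.html` is labelled a content page.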
Correlation with EOI timing
We assume that our five behavioral parameters are correlated with the timing of the
EOI. When the timing of the EOI is known to the investigator, as in forensic investigations,
such correlations can provide supportive evidence in a rather straightforward manner.
However, when the timing of the EOI is unknown to the investigator, as in pre-emptive
investigations, the behavioural parameters can still be used for prediction.
4.2 IMPROVED NAVIGATION PATTERN
Here we use the N-gram model, which assumes that the last N pages browsed
affect the probability of the next page to be visited. The model is based on the theory of
probabilistic grammars, providing it with a sound theoretical foundation for future
enhancements. We propose a new model for handling the problem of mining log data which
directly captures the semantics of the user navigation sessions. We model the user navigation
records, inferred from log data, as a hypertext probabilistic grammar whose higher-probability
generated strings correspond to the user’s preferred trails. There are two contexts
in which such a model is potentially useful. On the one hand, it can help the service provider
to understand the users’ needs and as a result improve the quality of its service. The quality of
service can be improved by providing adaptive pages suited to the individual user and by
building dynamic pages in advance to reduce waiting time. On the other hand, such a model
can be useful to the individual web user by acting as a personal assistant integrated with
his/her web browser. The model has the advantage of being compact, self-contained, coherent,
and based on the well-established work on probabilistic grammars. In fact, the size of the
model depends only on the size of the web site being analysed and the amount of data collected.
Extensive experiments with both real and random data were conducted, and the results show
that, in practice, the algorithm runs in linear time in the size of the grammar. Our model has
potential use both in helping the web site designer to understand the preferences of the
site’s visitors, and in helping individual users to better understand their own navigation
patterns and increase their knowledge of the web’s content. Our approach has the following
characteristics: 1) Extracting search-focused information from web pages. 2) Taking key
n-grams as the representations of search-focused information. 3) Employing data mining for
the extraction model using search log data. 4) Employing learning with search-focused key
n-grams as features.
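The N-gram assumption (the last N pages browsed determine the probability of the next page) can be illustrated with a small counting model. This is a simplified sketch and not the full hypertext probabilistic grammar: it only estimates next-page probabilities from observed sessions. The session data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram_model(sessions, n=2):
    """For every run of n consecutive pages, count which page followed it."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(len(session) - n):
            history = tuple(session[i:i + n])
            model[history][session[i + n]] += 1
    return model


def next_page_probabilities(model, recent_pages, n=2):
    """Estimate P(next page | last n pages) from the counts."""
    followers = model.get(tuple(recent_pages[-n:]))
    if not followers:
        return {}  # history never observed
    total = sum(followers.values())
    return {page: count / total for page, count in followers.items()}


sessions = [
    ["home", "courses", "cs101", "exam"],
    ["home", "courses", "cs101", "notes"],
    ["home", "courses", "cs102"],
    ["news", "courses", "cs101", "exam"],
]
model = train_ngram_model(sessions, n=2)
probs = next_page_probabilities(model, ["home", "courses"], n=2)
print(probs)
```

The page with the highest estimated probability is a natural candidate for the adaptive or pre-built pages mentioned above; a larger N gives a more specific history at the cost of sparser counts.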
4.2.1 KEY N-GRAM EXTRACTION
The extraction step requires data pre-processing, training data generation, N-gram
feature generation, and N-gram extraction with task classification.
Pre-processing: We assume that the objects to be searched and ranked by the search engine
are web pages. During pre-processing, a web page in HTML format is parsed and represented
as a sequence of tags/words.
Algorithm step
• Read records in Logtable; for each record in Logtable
• Read fields (Sc_code, Sc_method)
• If Sc_code = ‘**’ and Sc_method = ‘**’ Then
• Get IP_address and URL_link
• If suffix.URL_link = {*.gif, *.jpg, *.css} Then
• Delete suffix.URL_link
• Else save IP_address and URL_link
• End if
• Else read next record
• End
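The cleaning steps above can be sketched as a simple filter. The steps do not state which status code and method are kept, so the values "200" and "GET" used here are assumptions, as are the dictionary keys and the function name.

```python
IGNORED_SUFFIXES = (".gif", ".jpg", ".css")


def clean_log(records, wanted_status="200", wanted_method="GET"):
    """Keep (IP address, URL) pairs for matching requests that are not
    image or stylesheet files. The default status and method are assumptions."""
    kept = []
    for rec in records:
        if rec["sc-status"] != wanted_status or rec["cs-method"] != wanted_method:
            continue  # read next record
        url = rec["cs-uri-stem"]
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue  # drop embedded resources
        kept.append((rec["c-ip"], url))
    return kept


records = [
    {"c-ip": "192.0.2.10", "cs-method": "GET", "cs-uri-stem": "/index.html", "sc-status": "200"},
    {"c-ip": "192.0.2.10", "cs-method": "GET", "cs-uri-stem": "/logo.GIF", "sc-status": "200"},
    {"c-ip": "192.0.2.11", "cs-method": "GET", "cs-uri-stem": "/missing.html", "sc-status": "404"},
    {"c-ip": "192.0.2.12", "cs-method": "POST", "cs-uri-stem": "/form.html", "sc-status": "200"},
]
print(clean_log(records))
```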
Training Data Generation: We can consider automatically extracting queries from the page.
Head pages generally include a number of associated queries in the search log data. Such data
can naturally be used as training data for the automatic extraction of queries, particularly for
tail pages. We treat the n-grams in each of the document’s queries as its labelled key n-grams.
For example, when a document “ABDC” is associated with the query “ABC”, we consider the
unigrams “A”, “B”, “C” and the bigram “AB” to be key n-grams, with the assumption that they
should be ranked higher than the unigram “D” and the bigrams “BD” and “DC” by the extraction
model.
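The labelling rule in this example can be sketched as follows: an n-gram of the document is labelled as a key n-gram if it also occurs in the associated query. The function names are hypothetical, and the sketch is restricted to unigrams and bigrams as in the example.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def label_key_ngrams(doc_tokens, query_tokens, max_n=2):
    """Split the document's n-grams into key n-grams (also present in the
    query) and the remaining n-grams."""
    query_ngrams = set()
    for n in range(1, max_n + 1):
        query_ngrams.update(ngrams(query_tokens, n))
    key, other = set(), set()
    for n in range(1, max_n + 1):
        for gram in ngrams(doc_tokens, n):
            (key if gram in query_ngrams else other).add(gram)
    return key, other


key, other = label_key_ngrams(list("ABDC"), list("ABC"))
print(sorted(key))
print(sorted(other))
```

For the document “ABDC” and the query “ABC” this yields the key n-grams A, B, C and AB, and the non-key n-grams D, BD and DC, matching the example in the text.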
N-gram Features Generation: Web pages contain rich formatting information compared to
plain text. We utilize both textual and formatting information to create features in the
extraction model in order to accurately extract key n-grams. Feature generation is based on
two kinds of features: 1. Frequency features and 2. Appearance features.
1. Frequency Features
The original/normalized term frequencies of an n-gram within several fields, tags and
attributes are utilized.
• Frequency in Fields: The fields are the URL, page title, meta-keyword and
meta-description.
• Frequency within Structure Tags: The frequencies of an n-gram in texts within a header,
table or list indicated by HTML tags including <h1>, . . . , <h6>, <table>, <li> and
<dd>.
• Frequency within Highlight Tags: The frequencies of an n-gram in texts highlighted or
emphasized by HTML tags including <a>, <b>, <i>, <em> and <strong>.
• Frequency within Attributes of Tags: These are hidden texts which are not visible to
users. Specifically, the title, alt, href and src tag attributes are used.
• Frequencies in other Contexts: These include page headers, page meta-data, page body
and the whole HTML file.
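A few of the frequency features listed above can be sketched with Python's standard `html.parser` module. The sketch counts how often an n-gram occurs in the page title, within structure tags and within highlight tags; it assumes reasonably well-formed HTML with explicit closing tags, and the class and function names are hypothetical. Attribute and URL features are omitted for brevity.

```python
from collections import Counter
from html.parser import HTMLParser

STRUCTURE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "table", "li", "dd"}
HIGHLIGHT_TAGS = {"a", "b", "i", "em", "strong"}
TRACKED = STRUCTURE_TAGS | HIGHLIGHT_TAGS | {"title"}


class TagTextCollector(HTMLParser):
    """Collect text segments by the kind of tag that encloses them."""

    def __init__(self):
        super().__init__()
        self.depth = Counter()  # open count per tracked tag
        self.segments = {"title": [], "structure": [], "highlight": []}

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        tokens = data.lower().split()
        if not tokens:
            return
        if self.depth["title"]:
            self.segments["title"].append(tokens)
        if any(self.depth[t] for t in STRUCTURE_TAGS):
            self.segments["structure"].append(tokens)
        if any(self.depth[t] for t in HIGHLIGHT_TAGS):
            self.segments["highlight"].append(tokens)


def ngram_frequencies(html, ngram):
    """Frequency of `ngram` (space-separated words) in each context."""
    collector = TagTextCollector()
    collector.feed(html)
    target = tuple(ngram.lower().split())
    n = len(target)
    return {
        context: sum(
            1
            for tokens in segs
            for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == target
        )
        for context, segs in collector.segments.items()
    }


page = (
    "<html><head><title>Data Mining Course</title></head><body>"
    "<h1>Data Mining</h1><p>This <b>data mining</b> course covers "
    "<i>web</i> data.</p><ul><li>data mining basics</li></ul></body></html>"
)
freqs = ngram_frequencies(page, "data mining")
print(freqs)
```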
2. Appearance Features
The appearances of n-grams, in terms of position, coverage and distribution, are also
important indicators of their importance. Position indicates where an n-gram first appears in
the title, paragraph and document; coverage indicates the coverage of an n-gram in the title
or a header; and distribution indicates how an n-gram is distributed across different parts
of a page.
N-Gram Extraction and Task Classification: Features for each n-gram are extracted, and then an
extraction model is trained. Key n-gram extraction is formalized as a learning-to-rank
problem. In learning, a ranking model is trained which ranks n-grams, and the user’s current
task is determined. The main aim of the task classification algorithm is to find the user’s
task, which is classified into two main groups: casual user and careful user. In casual
searching the user wants to find the precise and credible information.
Algorithm step
• Use frequently visited URLs (Cs-Uri-Stem field) as indicators for the task type
classification.
• Set the web task threshold (t = 5 ms).
• Store all frequently visited URLs and count the occurrences of each frequently
visited URL.
• If the number of frequently visited URLs is greater than or equal to 5, set the user
task to careful user; otherwise set the user task to casual user.
• If the frequently visited URLs have a query (Cs-Uri-Query) and that query is the same,
set the user task to casual user; otherwise set the user task to careful user.
• The total number of URLs in casual searching was higher than the total number of URLs in
careful searching.
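One possible reading of the task classification steps is sketched below. Several points are assumptions on our part: a URL counts as "frequently visited" when it is requested at least `min_visits` times, the query rule is applied only when a frequently visited URL actually carries a query, and the 5 ms threshold is not used because the steps do not say how it enters the decision. The function name and input layout are hypothetical.

```python
from collections import Counter


def classify_task(requests, min_visits=2, url_threshold=5):
    """Label a session 'careful' or 'casual'.

    `requests` is a list of (cs_uri_stem, cs_uri_query) pairs; a query of
    '-' or '' means that the request had no query string.
    """
    visits = Counter(stem for stem, _query in requests)
    frequent = {stem for stem, n in visits.items() if n >= min_visits}

    # Rule 1: many frequently visited URLs indicate a careful user.
    label = "careful" if len(frequent) >= url_threshold else "casual"

    # Rule 2: if frequent URLs carry queries, a single repeated query
    # indicates a casual user, differing queries a careful user.
    queries = {q for stem, q in requests if stem in frequent and q not in ("", "-")}
    if queries:
        label = "casual" if len(queries) == 1 else "careful"
    return label


same_query = [("/search", "q=exam")] * 3 + [("/home", "-")] * 2
varied_query = [("/search", "q=exam"), ("/search", "q=notes"), ("/search", "q=exam")]
many_urls = [(f"/page{i}", "-") for i in range(5)] * 2

print(classify_task(same_query), classify_task(varied_query), classify_task(many_urls))
```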
5. APPLICATION AND FUTURE TRENDS AND CONCLUSION
5.1 APPLICATION
Web-wide tracking – DoubleClick: ‘Web-wide tracking’, i.e. tracking an individual across all
sites he visits, is one of the most intriguing and controversial technologies; it provides an
understanding of an individual’s lifestyle and habits. The value of this technology in
applications such as cyber-threat analysis and homeland defense is quite clear, and it might
be only a matter of time before these organizations are asked to provide this information.
Understanding Web communities – AOL: Applying web mining to the data collected from
community interactions provides AOL with a very good understanding of its communities,
which it has used for targeted marketing through ads and e-mail solicitations. The idea is to
treat the community as a highly specialized focus group, understand its needs and opinions on
new and existing products, and also test strategies for influencing opinions.
Web Caching: Web caching aims to improve the performance of web-based systems by storing and
reusing web objects that are likely to be used in the near future. It has proven to be an
effective technique for reducing network traffic, decreasing access latency and lowering
the server load [18]. Web caching has focused on the use of historic information about web
objects to aid the cache replacement policies.
Web Prefetching: Web prefetching is a technique for reducing web latency based on predicting
the next web objects to be accessed by the user and prefetching them during idle times. The
prefetching technique has two main components: the prediction engine and the prefetching
engine. The prediction engine runs a prediction algorithm to predict the user’s next
request [18].
5.2 FUTURE DIRECTION
Fraud and Threat analysis: The anonymity provided by the Web has led to a significant
increase in attempted fraud, from unauthorized use of individual credit cards to hacking into
credit card databases for blackmail purposes. Yet another example is auction fraud, which has
been increasing on popular sites like eBay. Since all these frauds are being perpetrated
through the Internet, Web mining is a well-suited analysis technique for detecting and
preventing them.
Web mining and Privacy: While there are many benefits to be gained from Web mining, a clear
drawback is the potential for severe violations of privacy. Public attitude towards privacy
seems to be almost schizophrenic, i.e. people say one thing and do quite the opposite. The
research issue generated by this attitude is the need to develop approaches, methodologies
and tools that can be used to verify and validate that a Web service is indeed using an
end-user’s information in a manner consistent with its stated policies.
5.3 CONCLUSION
This paper presented a state-of-the-art review of the current research associated with
these human factors. This review will be important for practitioners who want to develop a
sound understanding of the needs and preferences of users with various characteristics such
as intensity of surfing, interest, gender difference and topic similarity. Our model has
potential use in helping the web site designer to understand the preferences of the site’s
visitors, together with their behaviour and access patterns, which are used to determine
human information behaviour. The model also analyzes the users’ web surfing patterns and
traces terrorist and criminal activities. In this paper we apply N-gram methods to search
log data, and the characteristics of key n-grams can be applied to other data sets. The
extracted key n-grams are used as features of the relevance ranking model for finding the
users’ current task and their access behaviour. This approach is also applicable to
understanding navigation patterns and increasing users’ knowledge of the web’s content, and
it is also applicable in a posterior forensic investigation. The model will also help
designers to develop web-based personalized applications that can accommodate users’
individual differences, and it can be used for detecting and avoiding the terror threats
caused by terrorists all over the world.
REFERENCES
[1] Ford, N., Miller, D., & Moss, N., “Web search strategies and human individual
differences: Cognitive and demographic factors, internet attitudes, and approaches”.
Journal of the American Society for Information Science and Technology, pp. 741–756,
2005.
[2] Graff, M. (2003). “Learning from web-based instructional Systems and cognitive
style”. British Journal of Educational Technology, 34(4), 407–418.
[3] Chi E. H.; Pirolli P.; Chen K.; and Pitkow J. 2001. “Using information scent to model
user information needs and actions and the Web”. In Proceedings of the SIGCHI
conference on Human factors in computing systems, 490-497, Seattle, Washington,
United States: ACM.
[4] Kim K. and Allen B. 2002. Cognitive and task influences on web searching behavior.
Journal of the American Society for Information Science and Technology, 53(2):109-
119: John Wiley & Sons.
[5] Sherry Y. Chen, Robert Macredie, “Web based interaction: A review of three important
human factors”, International Journal of Information Management, journal homepage:
www.elsevier.com/locate/ijinfomgt, pp. 1-9, 2010.
[6] G. Eason, B. Noble, and I. N. Sneddon, “On certain integrals of Lipschitz-Hankel
type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol.
A247, pp. 529–551, April 1955.
[7] White R. W. and Drucker S. M. 2007. Investigating behavioral variability in web
search. In Proceedings of the 16th international conference on World Wide Web, 21-
30, Banff, Alberta, Canada: ACM.
[8] K. R. Suneetha, K. R. Krishnamoorthy, “Identifying User behavior by Analyzing Web
Server Access File”, IJCSNA International Journal of Computer Science and Network
Security, Vol. 9 No. 4, April 2009.
[9] Alaa El-Halees, “Mining Students Data to Analyzing Learning Behavior: a Case
Study”, http://eref.uqu.edu.sa/files/eref2/folder6/f158.pdf
[10] R. Cooley, B. Mobasher, and J. Srivastava, “Web mining: Information and Pattern
Discovery on the World Wide Web”, Proc. IEEE Intl. Conf. Tools with AI, Newport
Beach, CA, pp.558-56, 1997.
[11] Mahesh thyloreramkrishna, LathaKomalGowdar, LalatessSomashekarHavanur, “Web
mining: Key Accomplishments, Application, and Future Directions”, International
conference on Data Storage and Data Engineering, pp. 186-191, 2010.
[12] Jinhyuk Choi, Jeongseok Seo, Geehyuk Lee, “Analysis of web usage pattern using
various contextual factors”, Association of advancement of artificial intelligence, pp. 1-
9, 2009.
[13] R. Cooley, B. Mobasher, J. Srivastava, “Web Mining Information and Pattern
Discovery on the World Wide Web”, In Proceedings of the 9th IEEE International
Conference on Tools With Artificial Intelligence, Newport Beach, CA, 1997.
[14] J. Srivastava, R. Cooley, M. Deshpande and P.-N. Tan, “Web Usage Mining:
Discovery and Applications of usage pattern From Web Data”, SIGKDD
Explorations, Vol. 1, Issue 2, 2000.
[15] Cooley, R., Mobasher, B., & Srivastava, J. (1999). “Data preparation for mining world
wide web browsing patterns”, Journal of Knowledge and Information Systems, 1 (1),
5-32.
[16] R. Kosala, H. Blockeel, “Web Mining Research: A Survey”, in SIGKDD Explorations
2(1), ACM, July 2000.
[17] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, “Web
Usage Mining: Discovery and Applications of Usage Patterns from Web
Data”, SIGKDD Explorations, ACM SIGKDD, pp. 1-10, Jan 2000.
[18] Sandhaya Gawade, Hitesh Gupta, “Review of Algorithms for Web Pre-fetching
and Caching”, International Journal of Advanced Research in Computer and
Communication Engineering, Vol. 1, Issue 2, pp. 1-4, April 2012.
[19] Rozita Jamili Osfouei, “Behaviour mining of female students by analysing log files”, In
Proceeding of IEEE fifth international Conferences on Digital
Information Management ICDM 2010, Canada, pp. 5-8, July 2010.
[20] T. Anand, S. Padmapriya, E. kirubakram, “Terror Tracking Using Advanced Web
Mining Perspective”, In Proceeding of IEEE Fourth international Conferences on
Intelligent agent and multimedia, pp. 1-4, 2009.
[21] José Borges and Mark Levene, “Data Mining of User Navigation Patterns”,
Department of Computer Science, University College London, Gower Street, London,
pp. 1-19, April 2000.
[22] Chen Wan, Keping Bi, Yunhua Hu, “Extracting Search-Focused Key N-Grams for
Relevance Ranking in Web Search”, WSDM’12, February 8–12, 2012, Seattle,
Washington, USA, ACM, pp. 1-10, 2012.
[23] Prof. Sindhu P Menon and Dr. Nagaratna P Hegde, “Research on Classification
Algorithms and its Impact on Web Mining”, International Journal of Computer
Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 495 - 504,
ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[24] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, “Recent Research in Web
Page Classification – A Review”, International Journal of Computer Engineering &
Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112 - 122, ISSN Print: 0976 –
6367, ISSN Online: 0976 – 6375.
[25] Suresh Subramanian and Dr. Sivaprakasam, “Genetic Algorithm with a Ranking
Based Objective Function and Inverse Index Representation for Web Data Mining”,
International Journal of Computer Engineering & Technology (IJCET), Volume 4,
Issue 5, 2013, pp. 84 - 90, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[26] Purvi Dubey and Asst. Prof. Sourabh Dave, “Effective Web Mining Technique for
Retrieval Information on the World Wide Web”, International Journal of Computer
Engineering & Technology (IJCET), Volume 4, Issue 6, 2013, pp. 156 - 160, ISSN
Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[27] Hemprasad Badgujar and Dr. R.C.Thool, “His: Human Identification Schemes on
Web”, International Journal of Computer Engineering & Technology (IJCET),
Volume 4, Issue 2, 2013, pp. 198 - 212, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.

Contenu connexe

Tendances

Discovering knowledge using web structure mining
Discovering knowledge using web structure miningDiscovering knowledge using web structure mining
Discovering knowledge using web structure miningAtul Khanna
 
Netnography in online dating services
Netnography in online dating servicesNetnography in online dating services
Netnography in online dating servicesDanish Ilyas
 
3 iaetsd semantic web page recommender system
3 iaetsd semantic web page recommender system3 iaetsd semantic web page recommender system
3 iaetsd semantic web page recommender systemIaetsd Iaetsd
 
An Efficient Hybrid Successive Markov Model for Predicting Web User Usage Beh...
An Efficient Hybrid Successive Markov Model for Predicting Web User Usage Beh...An Efficient Hybrid Successive Markov Model for Predicting Web User Usage Beh...
An Efficient Hybrid Successive Markov Model for Predicting Web User Usage Beh...Waqas Tariq
 
A Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataA Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataIOSR Journals
 
Web personalization using clustering of web usage data
Web personalization using clustering of web usage dataWeb personalization using clustering of web usage data
Web personalization using clustering of web usage dataijfcstjournal
 
Gateway to Oklahoma History Case Study: Structured Data and Metadata Evaluati...
Gateway to Oklahoma History Case Study: Structured Data and Metadata Evaluati...Gateway to Oklahoma History Case Study: Structured Data and Metadata Evaluati...
Gateway to Oklahoma History Case Study: Structured Data and Metadata Evaluati...Emily Kolvitz
 
Recent research in web page classification – a review
Recent research in web page classification – a reviewRecent research in web page classification – a review
Recent research in web page classification – a reviewIAEME Publication
 
Recent research in web page classification – a review
Recent research in web page classification – a reviewRecent research in web page classification – a review
Recent research in web page classification – a reviewiaemedu
 

Tendances (20)

50320140501002

  • 1. International Journal of Information Technology & Management Information System (IJITMIS), ISSN 0976 – 6405(Print), ISSN 0976 – 6413(Online), Volume 5, Issue 1, January - April (2014), © IAEME 12 WEB USAGE MINING CONTEXTUAL FACTOR: HUMAN INFORMATION BEHAVIOR Ms. Ravita Mishra Information Technology Dept, Ramrao Adik Institute of Technology, Nerul Navi Mumbai, India ABSTRACT With the rapid development of information technology, the World Wide Web has been widely used in various applications, such as search engines, online learning and electronic commerce. These applications are used by a diverse population of users with heterogeneous backgrounds, in terms of their knowledge, skills, and needs. Therefore, human factors are key issues for the development of web-based application and research. This paper first identifies reviews from different authorsand then examines the three important human factors: gender differences, prior knowledge, and cognitive styles. The review result is not significantly correct; a new model is proposed that will access the data (log data) and show the human access behavior. The proposed model has two stages: web intelligence and navigation pattern. Stage 1(web intelligence system) captures data from different server and converts in the form of table (data store). 
Stage 2 uses the N-gram algorithm, which assumes that the last N pages browsed affect the probability of the next page to be visited; user navigation sessions are modelled as a hypertext probabilistic grammar whose higher-probability strings correspond to the user's preferred trails. Web caching and pre-fetching are two important approaches used to reduce the response time perceived by users. The model improves the navigation pattern of users and identifies user behavior (gender difference and user type). These findings can be used by site designers and researchers, and also for detecting and avoiding terror threats. The paper is organized as follows: the first part contains the introduction, the second the different types of web mining, the third usage mining on the web, the fourth the analysis of human factors and the evaluation technique, the fifth the proposed methodology, and the last part the applications, limitations, conclusion and further work.
Keywords: Pattern Discovery, Contextual factor, Information Retrieval, N-gram, Gender difference, Cognitive style and Prior experience.
INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & MANAGEMENT INFORMATION SYSTEM (IJITMIS), ISSN 0976 – 6405(Print), ISSN 0976 – 6413(Online), Volume 5, Issue 1, January - April (2014), pp. 12-29. © IAEME: http://www.iaeme.com/IJITMIS.asp. Journal Impact Factor (2013): 5.2372 (Calculated by GISI), www.jifactor.com
  • 2. 1. INTRODUCTION
Web mining is an active research topic that combines two research areas: data mining and the World Wide Web. Web mining research relates to several research communities, such as databases, information retrieval and artificial intelligence. Web mining is categorized into three areas: Web content mining, Web structure mining and Web usage mining. Web content mining focuses on the discovery and retrieval of useful information from web contents, data and documents, while Web structure mining focuses on how to model the underlying link structures of the web [14, 16]. Web usage mining is a relatively independent, but not isolated, category, which mainly describes the techniques that discover users' usage patterns and try to predict their behavior. Web mining is the application of data mining techniques to automatically discover and extract useful information from World Wide Web documents and services [16]. Here, human factors are increasingly seen as important issues, as reflected in the substantial number of existing studies in the area. Among various human factors, gender differences (e.g., Roy, Taylor, & Chi, 2003), prior knowledge (e.g., Calisir & Gurel, 2003) and cognitive styles (e.g., Chen & Macredie, 2004) have significant impacts on web-based interaction. Furthermore, these three human factors have certain inter-relations. For example, females tend to behave similarly to novices in terms of the extent to which they experience disorientation problems; males and experts seem to have similar preferences in their interaction patterns, with studies reporting that they enjoy non-linear interaction (Ford & Chen, 2000). 
Despite the growing number of studies looking at these three human factors, there is a lack of an integrated review that synthesizes their effects.
2. WEB DATA MINING
2.1 Overview: Today, with the tremendous growth of the data sources available on the Web and the dramatic popularity of e-commerce in the business community, Web mining has become the focus of quite a few research projects and papers [13, 14, 15]. Previous research suggested decomposing web mining into the following subtasks:
Resource Discovery: retrieving the intended information from the web.
Information Extraction: automatically selecting and pre-processing specific information from the retrieved web resources.
Generalization: automatically discovering general patterns both at individual web sites and across multiple sites.
Analysis: analyzing the mined patterns.
The authors of [10] claim that the web involves three types of data: data on the web (content), web log data (usage) and web structure data. Other authors classify the data types as content data, structure data, usage data, and user profile data.
2.1.1 Web Content Mining: Web content mining describes the automatic search of information resources available online and involves mining web data contents. A web document usually contains several types of data, such as text, image, audio, video, metadata and hyperlinks. Some of this data is semi-structured, such as HTML documents, or more structured, like the data in tables or database-generated HTML pages, but most of it is unstructured text [14]. The technologies normally used in web content mining are natural language processing (NLP) and information retrieval (IR).
2.1.2 Web Structure Mining: Technically, web content mining mainly focuses on the structure within a document, while web structure mining tries to discover the link structure of
  • 3. the hyperlinks at the inter-document level. Based on the topology of the hyperlinks, web structure mining categorizes the web pages and generates information such as the similarity and relationship between different web sites. Web structure mining can also take another direction: discovering the structure of the web document itself. This type of structure mining can be used to reveal the structure (schema) of web pages, which is useful for navigation and makes it possible to compare and integrate web page schemas. The structural information generated by web structure mining includes the following: the frequency of local links in the web tuples in a web table; the frequency of web tuples in a web table containing links that are interior, that is, links within the same document; the frequency of web tuples in a web table containing links that are global, that is, links that span different web sites; and the frequency of identical web tuples that appear in a web table or among web tables [15, 20]. In general, if a web page is linked to another web page directly, or the web pages are neighbors, we would like to discover the relationships among those pages. The pages may be related by synonyms or ontology, they may have similar contents, or both may sit on the same web server and therefore have been created by the same person [13, 14].
2.1.3 Web Usage Mining: Analyzing the web access logs of different web sites can help in understanding user behaviour and the web structure, thereby improving the design of this colossal collection of resources. 
There are two main tendencies in web usage mining, driven by the applications of the discoveries: General Access Pattern Tracking and Customized Usage Tracking. General access pattern tracking analyzes the web logs to understand access patterns and trends. These analyses can be used for better structure and grouping of resource providers. Applying data mining techniques to access logs unveils interesting access patterns that can be used to restructure sites into a more efficient grouping, pinpoint effective advertising locations, and target specific users with specific ads. Customized usage tracking analyzes individual trends. Its purpose is to customize web sites to users. The information displayed, the depth of the site structure and the format of the resources can all be dynamically customized for each user over time based on their access patterns.
2.2 STEPS IN WEB MINING
Web usage mining consists of three steps: 1. Pre-processing, 2. Pattern discovery, 3. Pattern analysis.
2.2.1 Pre-processing: Pre-processing is divided into three types: content pre-processing, structure pre-processing and usage pre-processing. Content pre-processing is the process of converting text, images, scripts and other files into forms that can be used by usage mining. For the content of static page views, the pre-processing can be done easily by parsing the HTML and reformatting the information or running additional algorithms as desired [15]. Structure pre-processing can be treated similarly to content pre-processing. However, each server session may have to construct a different site structure than others [13, 15]. The inputs of the pre-processing phase may include the Web server logs, referral logs, registration files, index server logs, and optionally usage statistics from a previous analysis. The outputs are the user session file, transaction file, site topology, and page classifications. 
It is always necessary to adopt a data cleaning technique to eliminate the impact of irrelevant items on the analysis result. Without sufficient data, it is very difficult to identify the users [14]. Session identification is also a part of usage pre-processing. The goal of
  • 4. it is to divide the page accesses of each user, who is likely to visit the Web site more than once, into individual sessions. The simplest way to do this is to use a timeout to break a user's click-stream into sessions. Another problem is path completion, which means determining whether any important accesses are missing from the access log. The methods used for user identification can also be used for path completion. The final pre-processing step is formatting, a preparation module that properly formats the sessions or transactions.
2.2.2 Pattern Discovery
Pattern discovery brings together algorithms and techniques from several research areas, such as data mining, machine learning, statistics, and pattern recognition. Pattern discovery falls into the following categories: Statistical Analysis, Association Rules, Clustering, Classification, Sequential Pattern and Dependency Modeling. Statistical techniques are the most powerful tools for extracting knowledge about visitors to a Web site. Analysts may perform different kinds of descriptive statistical analyses based on different variables when analyzing the session file [13]. The statistical information contained in the periodic web system report is potentially useful for improving system performance and enhancing the security of the system. Association rule mining techniques can be used to discover unordered correlations between items found in a database of transactions [13]. Here, association rules refer to sets of pages that are accessed together with a support value exceeding some specified threshold. Web designers can restructure their web sites efficiently based on the presence or absence of such association rules. 
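The timeout-based session identification described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `(user_id, timestamp, url)` record layout, the function name `sessionize` and the 30-minute default timeout are all assumptions made for the example.

```python
from collections import defaultdict


def sessionize(clicks, timeout=1800):
    """Split each user's click-stream into sessions.

    clicks:  iterable of (user_id, timestamp_seconds, url) tuples
             (hypothetical record layout, assumed for this sketch).
    timeout: gap in seconds that starts a new session
             (30 minutes is an assumed default, not taken from the paper).
    Returns a dict mapping user_id to a list of sessions (lists of URLs).
    """
    # Group page accesses by user.
    by_user = defaultdict(list)
    for user, ts, url in clicks:
        by_user[user].append((ts, url))

    sessions = {}
    for user, hits in by_user.items():
        hits.sort()  # order each user's accesses by time
        user_sessions = []
        last_ts = None
        for ts, url in hits:
            # Start a new session on the first hit or after a long gap.
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])
            user_sessions[-1].append(url)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions
```

For example, two accesses 100 seconds apart stay in one session, while an access more than `timeout` seconds after the previous one opens a new session for that user.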
Clustering analysis is a technique for grouping together users or data items with similar characteristics. Clustering of user information or pages can facilitate the development and execution of future marketing strategies [13]. Clustering of users helps to discover groups of users who have similar navigation patterns. It is very useful for inferring user demographics in order to perform market segmentation in e-commerce applications or to provide personalized web content to individual users. Classification is a supervised inductive learning technique that maps a data item into one of several predefined classes. In the web domain, a Web master or marketer uses this technique to establish a profile of users belonging to a particular class or category. This requires extraction and selection of the features that best describe the properties of a given class or category [13]. Sequential pattern discovery finds inter-session patterns, such that a set of items follows the presence of another in a time-ordered set of sessions. It also includes other types of temporal analysis, such as trend analysis, change point detection, or similarity analysis. It is very useful for the web marketer in predicting future trends, which helps to place advertisements aimed at certain user groups [13]. Dependency modeling represents significant dependencies among the various variables in the web domain [13]. The modeling technique provides a theoretical framework for analyzing the behavior of users, and is potentially useful for predicting future web resource consumption.
2.3 PATTERN ANALYSIS
The goal of this process is to eliminate irrelevant rules or patterns and to extract the interesting rules or patterns from the output of the pattern discovery process. The output of the algorithms is not in a form suitable for direct human consumption, and thus needs to be transformed into a format that can be assimilated easily [13]. There are two common approaches to pattern analysis. 
One is to use a knowledge query mechanism such as SQL, while the other is to construct a multi-dimensional data cube and then perform OLAP operations.
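As an illustration of the query-based approach, the sketch below loads session data into an in-memory SQLite database and ranks pages by the number of distinct sessions that visited them. The table name `page_views`, its columns and the helper `top_pages` are hypothetical, chosen only for this example; the paper does not specify a schema.

```python
import sqlite3


def top_pages(rows, limit=3):
    """Rank URLs by the number of distinct sessions that visited them.

    rows: iterable of (session_id, url) pairs (assumed layout).
    Returns a list of (url, session_count) tuples, most visited first.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for the pre-processed session file.
    conn.execute("CREATE TABLE page_views (session_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT url, COUNT(DISTINCT session_id) AS sessions "
        "FROM page_views GROUP BY url "
        "ORDER BY sessions DESC, url ASC LIMIT ?",
        (limit,),
    ).fetchall()
    conn.close()
    return result
```

The same grouping, taken over several dimensions at once (for example page, time period and user group), is what a data cube precomputes for OLAP-style analysis.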
  • 5. 3. ANALYSIS OF CONTEXTUAL FACTOR (HUMAN INFORMATION BEHAVIOR)
In the given framework, the contextual parameter human information behaviour is analysed; it includes information exploration, seeking, filtering, use and communication. Based on the framework, various contextual factors (user interest, difficulty, time taken, credibility and browser dependence), their influential factors (physical, cognitive, affective, economic, social, and political) and their implications were investigated [12]. In the previous model, the user dimension is considered to be influenced by the particular task, information need, knowledge state, cognitive style, affective state and so on. The authors measured users' cognitive styles and affective states before a user study, applied a process-tracing technique while users were conducting information-seeking tasks, and found various types of relationships among the elements of the dimensions. In (Rieh 2002), the authors found that users judge cognitive authority and information quality by two types of judgment, predictive judgment and evaluative judgment, and they also identified the main facets and keywords of the judgments through a user study.
Review process: Due to the massive growth of e-commerce, privacy has become a sensitive topic and has attracted more and more attention recently. The basic goal of web mining is to extract information from data sets for business needs, which means its application is highly customer-related. In business-related customer data, the human factor is a fertile research area that demonstrates complete human information behavior based on experimental datasets. Analysis of this factor is based on four points: 1. Gender difference, 2. Cognitive style, 3. Prior experience and 4. Web based interaction. Although there are quite a few commercial analysis applications available and many more are free, a lot of work still needs to be done by both researchers and developers to develop efficient, flexible and powerful tools. Figure 4.1 illustrates the review process, which consists of four stages.
As shown in the figure, there are four major stages. Stage one searches electronic journals and search engines; these resources were selected because they were known to include empirical studies related to gender differences, prior knowledge and cognitive styles. The search terms for these electronic resources included four groups: (1) Internet and WWW;
  • 6. (2) Gender, females/males, boys/girls, and men/women; (3) Prior knowledge, system experience, novices/experts, domain expertise, domain knowledge, computer experience, previous experience, Internet experience; and (4) Cognitive styles, learning styles, field dependence. Stage two analyzes the search results based on timeline. Stage three selects studies for analysis based on titles, elements and keywords. Stage four assesses the behavior based on credibility.
3.1 GENDER DIFFERENCES
Gender difference is an important variable that influences computing skills and helps to characterize human information behavior and emotions. As the web has become a popular platform for various applications, such as search engines, online learning and electronic commerce, a growing body of studies has examined gender differences in the use of the web. This literature suggests that the major differences between males and females lie within navigation patterns, attitudes and perceptions [8, 9]. Previous research includes a number of theoretical surveys, and the literature suggests that males report lower levels of computer anxiety than their female counterparts; in addition, it also seems that males achieve much better outcomes than females in the use of computers (Karavidas, Lim, & Katsikas, 2004). Gender difference is analyzed here through Navigation Pattern and through Attitudes and Perceptions.
Navigation pattern is defined as the way users access web pages. Without good navigation, a site becomes useless to visitors: they cannot find the information they need, and seek out competing sites instead. It is vital for a successful designer that sites be easy to navigate. There are certain navigation patterns that work on virtually all sites. 
The first pattern is tabbed navigation, the second is header navigation, and the third covers blogs, informational and reference sites, corporate sites, etc. Large et al. (2002) examined how boys and girls behaved differently when retrieving information from the web. 53 students, comprising 23 boys and 30 girls from two grade-six classes, were the subjects of their study. Overall, the boys explored more hypertext links per minute, tended to perform more page jumps per minute, entered more searches in search engines, and gathered and saved information more often than the girls, while the boys spent less time viewing pages than the girls [8, 9]. Furthermore, Ford, Miller and Moss (2001) investigated individual differences in internet searching using a sample of 64 Master's students with 20 males and 44 females. The above-mentioned studies suggest that females and males show different approaches to navigation, reflected in the navigation patterns that they exhibit, but also that there are contradictory findings. Table 1 summarizes how male and female students explore web pages.
Table 1: Gender Difference
Author/Year | Male | Female
Large et al./2002 (23 boys and 30 girls) | Explore more hyperlinks | Explore fewer hyperlinks
Roy et al./2003 (equal no. of boys and girls) | More page jumps | Fewer page jumps
Lorigo/2006 (23 boys and 30 girls) | Linear | Non-linear
Lio, Huang/2008 (equal no. of boys and girls) | Non-linear | Linear
Ford, Miller/1996 (24 boys and 44 girls) | More effective | Less effective
  • 7. Attitudes and Perceptions: Perception can determine attitude; it defines how you perceive the world. Attitude is what the individual thinks about the perception, and perception is the human subjective experience of information provided by the senses. A number of studies suggest that there are gender differences in attitudes towards web-based interaction and in perceptions. In the first survey, 630 Anglo-American undergraduates completed the Student Computer and Internet Survey, the results of which indicated that females reported more computer anxiety and less computer self-efficacy than males. Schumacher and Morahan-Martin (2001) conducted a survey to identify gender differences in attitudes towards computers and the Internet. The survey was completed by 619 students, and its results likewise indicated that females reported more computer anxiety and less computer self-efficacy than males. Similar results were also found in the study by Koohang (2004), which investigated 154 students of an undergraduate management program; the results indicated that males had significantly more positive perceptions than females toward using the digital library [5]. The studies reviewed so far in this section indicate that females tend to have more negative attitudes towards the use of the web than males and that they feel less able when using the web than their male peers. 
Table 2: Attitude and Perception
Author/Year | Male | Female
Jackson, Ervin/2001 (630 students) | Less computer anxiety | More computer anxiety
Koohang/2004 (245 students) | Positive perception | Negative perception
Koohang, Durante/2003 (125 students) | No significant difference | ---
Hong/2002 (24 students) | Asynchronous learning | Synchronous learning
3.2 PRIOR KNOWLEDGE
A user's prior knowledge includes system experience and domain knowledge, and also refers to the user's understanding of the content area (Lazonder, 2000). Prior knowledge, or domain knowledge, affects web-based instruction, text structure, navigation facilities and internet searching. A growing body of research suggests that low prior knowledge users and high prior knowledge users show different levels of familiarity and have different requirements. In the first survey, 200 students participated in a web-based course, and the authors found that the participants with more experience in the use of internet tools used less time to organize their work and visited fewer pages in each session [5]. The results also showed that experts issued longer queries than non-experts and used many more technical query terms than non-experts [8]. Prior knowledge is examined under the following categories: Web-based instruction, Text structure, Navigation facilities and Internet Searching.
Web-based instruction: Some research has suggested that individuals with different levels of prior knowledge show preferences for different types of text structure and different kinds of navigation facilities.
  • 8. Text structure: Three types of text structure have been identified: hierarchical, non-linear, and mixed (hierarchical structure with cross-referential links). A number of studies have examined how text structure interacts with users' prior knowledge; the findings suggest that experts and novices differ in their performance depending on the text structure used in Web-based instruction. Survey 1: McDonald and Stevenson (1998) examined the effects of text structure and prior knowledge on navigation performance [8, 9]. The results showed that the performance of knowledgeable participants was better than that of non-knowledgeable participants, as they had a better conception of the subject matter. Survey 2: Calisir and Gurel (2003) also investigated the interaction of three types of text structure (linear, hierarchical and mixed) in relation to the prior knowledge of users. However, in contrast to the study by McDonald and Stevenson (1998), they examined the influence of text structure and prior knowledge on learning performance, rather than on navigation performance. Survey 3: Amadieu, Tricot, and MarinéDo (2005) obtained similar results. Three types of structure were provided: hierarchical, network, and linear. The results indicated that low prior knowledge learners demonstrated better performance in the hierarchical structure, whereas the hierarchical structure seemed to obstruct the domain representation for high prior knowledge learners. The findings suggest that a hierarchical structure is most appropriate for non-knowledgeable subjects. 
The summary of the text structure analysis is given below:
Table 3: Text Structure
Author/Year | Knowledgeable participants | Non-knowledgeable participants
McDonald and Stevenson (1998) (three structures: non-linear, hierarchical and mixed) | Better understanding of subject matter | Less understanding of subject matter
Calisir and Gurel (2003) (three types of text structure: linear, hierarchical and mixed) | Linear and mixed structure | Hierarchical structure
Amadieu, Tricot, and MarinéDo (2005) (three types of structure: hierarchical, network and linear) | Non-linear structure | Hierarchical structure
Mitchell, Chen, and Macredie (2005) (how students reacted to Web-based instruction, with 74 undergraduate students) | Non-linear | Linear
  • 9. Navigation facilities: When considering the relationships between learning strategies and navigation facilities, students' prior knowledge is an important factor in determining whether a particular navigation facility is likely to be useful. Most current Web-based instruction applications provide a range of navigation facilities to allow users to employ multiple approaches to support their learning. Hierarchical maps and alphabetical indices are most commonly used in Web-based instruction; each provides different functions in relation to information access. The characteristics of the different navigation facilities may influence how users develop their learning strategies, making navigation support a critical issue. Farrell and Moore (2001) investigated whether the use of different navigation facilities (linear, main menu and search engine) influences users' achievement and attitude [2, 3]. 200 students were placed into three groups based on their knowledge levels (low, middle, and high), with the results indicating that high-knowledge users commonly tended to use search engines to locate specific topics. Conversely, low-knowledge users seem to benefit from hierarchical maps, which can facilitate the integration of individual topics [4].
Internet Searching: The goal of each fact-finding task was to find one specific answer to a simple question, while the broader tasks required the participants to find several documents that would satisfy the task. The results indicated that no significant differences were noted between experts and novices regarding the fact-finding tasks. Several studies also argue that prior knowledge plays a substantial role in internet searching, which covers three aspects: search strategies, search performance, and search perception. 
Regarding search strategies, Tabatabai and Luconi (1998) investigated the different strategies used by three experts and three novices. The results showed that experts used more keywords, while novices used the 'Back' key more often, used fewer search engines, and missed some highly relevant sites [5].
Table 4: Internet searching
Author/Year | Expert | Novices
Tabatabai and Luconi/1998 | More keywords | Back key
2006 | One specific answer | Broader answer
Thatcher/2008 | Web experience | Cognitive search
3.3 COGNITIVE STYLES
Cognitive style also plays an essential role in web-based instruction, learning preference, learning performance and internet searching. Field Dependence is the degree to which a user's perception or comprehension of information is influenced by the surrounding perceptual or contextual field.
Web-based instruction: Studies of Web-based instruction examine the relationships between the degree of Field Dependence and students' learning performance and learning preferences.
Learning performance: Students' cognitive styles are determined using Cognitive Styles Analysis (Riding, 1991), and their learning performance is compared in breadth-first and depth-first
  • 10. versions. Ford and Chen (2000) found that Field Dependent learners in the breadth-first version performed better than those in the depth-first version. Conversely, Field Independent students performed better in the depth-first version than those in the breadth-first version [5]. Graff (2003) determined individuals' cognitive styles and examined the relationship between cognitive style and performance in two versions of the system: long-page and short-page [4]. The study's findings indicated that Field Independent students achieved superior scores in the long-page condition, whereas Field Dependent students were superior in the short-page condition [5].
Learning preferences: Learning preferences are the choices that learners show in certain types of learning environments and activities, such as the selection of certain navigation paths or facilities. Studies state that field independent and field dependent students show different learning preferences. Lee, Cheng, Rai, and Depickere (2005) investigated students' learning preferences in WebCT. The study's findings indicate that field dependent students were accustomed to linear learning, whereas field independent students tended to have a preference for non-linear learning.
Internet searching: In this analysis, the GEFT was used to identify the participants' cognitive styles, and participants were asked to find answers from the Web for two search questions. The results showed a statistically significant correlation between GEFT scores and both the time spent searching and the number of URLs visited. The participants with higher GEFT scores conducted longer search sessions and visited more URLs. 
In contrast, participants with lower GEFT scores had shorter search sessions. Kim, Yun, and Kim (2004) compared the search strategies of different cognitive style groups; the results showed that the Field Dependent group made significantly more repeated search attempts and used search operators more often [4,5].

4. PROPOSED MODEL

4.1 WEB INTELLIGENCE ARCHITECTURE
The proposed model addresses the problems discussed above, provides an easier technique for identifying user behaviour, and increases the reliability of the system. The model is divided into two parts. In the first part, a web intelligence system records the web logs from the server or the client through the ISP. The second part uses the N-gram technique to combine content and usage mining. The framework should enable the collection of online data from various Internet Service Providers (ISPs), optionally analyzing the data in real time, and transmitting the relevant data for cleaning.

Previous review results had some limitations:

Inconsistent results: The results reported in existing studies are not fully consistent. There are contradictory findings as to whether gender differences influence users' attitudes and perceptions towards Web-based interaction and whether cognitive styles affect users' learning performance. In future work, we plan to develop a standard template for the questionnaires so that the accuracy of the results can be improved.

Lack of mixed methods and limited application: The survey suggests that quantitative methods are favoured when seeking to find the overall effectiveness of the systems. It is clear that quantitative and qualitative methods have different strengths and weaknesses. However, few existing studies mix quantitative and qualitative methods.

Fig.2. Proposed Architecture.

As illustrated in Fig. 1, individual surfers' activities are managed by various ISPs and are recorded by each ISP. The data is cleaned and filtered according to requirements. The filtered data is transmitted to a relay and then propagated to a persistent data store, where it can be analyzed further by big-data analysis tools.
[Fig. 2 Web intelligent System. The diagram is divided into Stage-1 and Stage-2; box labels: Web Page, Log File, Pre-Processing, Relay (Real time Analysis), Persistent Data Store, Big Data Analytics, EOI Parameter (Behavioral Parameter), N-gram Feature Generation and Extraction, Classification/Prediction, Contextual Factor (Human Behavior), Classification of Web Pages.]

Stage-1
Data sets consisting of web log records for 5063 users are collected from the DePaul University website. A web log is an unprocessed text file recorded by the IIS Web Server, e.g. the log file of DePaul University (www.cdm.maya.depaul.edu). The recorded log file of DePaul University (or any other log file) will be used for analysis. The pattern of the log file is shown below:

<Date><Time><C-ip><Cs-Username><S-Sitename><S-Computername><S-Ip><S-Port><Cs-Method><Cs-Uri-Stem><Cs-Uri-Query><Sc-Status><Time-Taken><Cs-Version>
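To make the record layout concrete, the following is a minimal Python sketch of splitting one log line into the fourteen fields of the pattern above. It assumes that the fields are separated by whitespace and appear in exactly that order, and that date and time are written as YYYY-MM-DD and HH:MM:SS; the paper does not state the delimiter or the date format, so both are assumptions. The function name parse_log_line and the sample line are hypothetical and are not taken from the DePaul data.

```python
from datetime import datetime

# Field order copied from the log pattern given in the text.
FIELDS = [
    "date", "time", "c-ip", "cs-username", "s-sitename", "s-computername",
    "s-ip", "s-port", "cs-method", "cs-uri-stem", "cs-uri-query",
    "sc-status", "time-taken", "cs-version",
]


def parse_log_line(line):
    """Return a dict for one log record, or None for comments/malformed lines."""
    if not line.strip() or line.startswith("#"):
        return None  # skip blank lines and directive lines such as "#Fields:"
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None  # assumption: a well-formed record has exactly 14 fields
    record = dict(zip(FIELDS, parts))
    # Assumed date/time layout; adjust the format string for other logs.
    record["timestamp"] = datetime.strptime(
        record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S"
    )
    record["sc-status"] = int(record["sc-status"])
    record["time-taken"] = int(record["time-taken"])
    return record


# Invented sample line, for illustration only.
sample = ("2014-01-15 10:32:07 192.0.2.10 - W3SVC1 WEBSRV 192.0.2.1 80 "
          "GET /courses/index.html - 200 187 HTTP/1.1")
record = parse_log_line(sample)
print(record["c-ip"], record["cs-uri-stem"], record["sc-status"])
```

Keeping each record as a dictionary keyed by field name lets the later steps refer to Cs-Uri-Stem, Cs-Uri-Query, Sc-Status and Time-Taken by name rather than by position.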
The structure of the log file: Here we suggest a few parameters that indicate the active involvement of the subject in an EOI. While each parameter by itself may have limited predictive value, the combination of these parameters may yield an accurate prediction or evidence.

A. Intensity of surfing/accessing
It measures the intensity of the user's Internet surfing activity; the browsing intensity is measured by the number of pages the user visits in a given time. When a user shows increased interest in a given event, we can assume that he will visit related web pages more intensively than usual. Consequently, historical data on the user's surfing intensity should be used when searching for anomalies. We measure the browsing intensity of users using the Cs-Uri-Stem and Cs-Version fields of the log file.

B. Frequency of revisiting/refreshing a given page
It measures the number of revisit/refresh operations performed by the user on each page. Through this information the system may detect stressful behavior, where the user strives for immediate updates on his topic of interest. He may repeatedly and frequently revisit the same page, or simply press the 'refresh' button of the browser. Significant peaks in this parameter may be observed in real time; it is calculated from the Cs-Uri-Stem and Time-Taken fields of the log file.

C. Irregular/Unusual hours of activity
It measures irregular surfing hours and irregular lengths of surfing sessions. Examination of a user's historical data may reveal a regular pattern in his surfing hours. This parameter requires analyzing the user's historical data to learn the regular surfing hours and session lengths.
The irregular hours are calculated from the Time-Taken field of the log file, and deviations from such patterns can be found by anomaly detection methods.

D. Interaction level (Passive (low)/Active (high))
It measures the level of the user's interaction, ranging from 'low' (passive only) to 'high' (mostly active). In passive surfing the user is content with reading pages, whereas in active surfing he may chat, write email, post responses or talkbacks, do Internet shopping,
and so on. It is calculated from the Sc-Status and Cs-Method fields of the log file. Regarding our 'terrorist' scenario, we hypothesize that, as the deadline comes closer, the subject will lower his or her active profile and focus on passive consumption of relevant information.

E. Diversity of interest topics/content topics
It measures the user's range of interest topics. Surfers are often attracted to diverse topics such as news, sports, music, gaming or finance. When the subject is focused on an urgent issue, we assume that it will affect his or her surfing pattern, restricting the range of visited sites to a specific topic. The diversity measure can be learned from the user's historical data using clustering methods, and it is calculated from the S-Sitename, Cs-Uri-Stem and Cs-Uri-Query fields of the log file. Significant deviations show up as anomalies or outliers.

F. Classification of webpage
Web pages are classified as index pages or content pages. An index page is a page used by the user for navigating the web site; it normally contains little information other than links. A content page is a page containing information the user would be interested in; its content offers something other than links.

Algorithm step
• Define two thresholds: count_threshold and link_threshold
• Set χ = 1/(mean reference length of all pages)
• Set t = -ln(1 - γ)/χ, where γ, the assumed fraction of index pages, follows the reference-length method of [15]
• For each page P
  • If P's file type is not HTML or P's end-of-session count > count_threshold, mark P as a content page
  • Else if P's number of links > link_threshold, mark P as an index page
  • Else if P's reference length < t, mark P as an index page
  • Else mark P as a content page

Correlation with EOI timing
We assume that our five behavioral parameters are correlated with the timing of the EOI.
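The index/content page classification steps listed under item F can be sketched in Python as follows. This is only an illustration under stated assumptions: the symbol inside the logarithm is not legible in the source, so the sketch takes it to be γ, an assumed fraction of index pages supplied by the analyst; the PageStats record, the threshold values and the sample pages are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PageStats:
    """Per-page statistics assumed to be pre-computed from the log (hypothetical)."""
    url: str
    is_html: bool
    end_of_session_count: int   # how often the page was the last one in a session
    link_count: int             # number of links on the page
    reference_length: float     # mean viewing time of the page, in seconds


def reference_length_cutoff(pages, gamma):
    """t = -ln(1 - gamma) / chi, with chi = 1 / (mean reference length)."""
    mean_length = sum(p.reference_length for p in pages) / len(pages)
    chi = 1.0 / mean_length
    return -math.log(1.0 - gamma) / chi


def classify_page(page, cutoff, count_threshold, link_threshold):
    """Apply the three tests in the order given in the algorithm steps."""
    if not page.is_html or page.end_of_session_count > count_threshold:
        return "content"
    if page.link_count > link_threshold:
        return "index"
    return "index" if page.reference_length < cutoff else "content"


# Invented sample pages, for illustration only.
pages = [
    PageStats("/index.html", True, 2, 40, 5.0),
    PageStats("/courses/list.html", True, 1, 8, 10.0),
    PageStats("/papers/report.pdf", False, 0, 0, 60.0),
    PageStats("/news/article.html", True, 1, 3, 45.0),
]
cutoff = reference_length_cutoff(pages, gamma=0.5)   # gamma is an assumed value
labels = {
    p.url: classify_page(p, cutoff, count_threshold=10, link_threshold=20)
    for p in pages
}
for url, label in labels.items():
    print(url, label)
```

With these sample values the mean reference length is 30 seconds, so the cutoff is 30·ln 2, roughly 20.8 seconds; pages viewed for less time than that, and not already caught by the earlier tests, are treated as index pages.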
When the timing of the EOI is known to the investigator, as in forensic investigations, such correlations can provide supportive evidence in a rather straightforward manner. However, when the timing of the EOI is unknown to the investigator, as in pre-emptive investigations, the behavioural parameters can still be used for prediction.

4.2 IMPROVED NAVIGATION PATTERN
Here we use the N-gram model, which assumes that the last N pages browsed affect the probability of the next page to be visited. The model is based on the theory of probabilistic grammars, which provides it with a sound theoretical foundation for future enhancements. We propose a new model for mining log data that directly captures the semantics of user navigation sessions. We model the user navigation records, inferred from log data, as a hypertext probabilistic grammar whose higher-probability generated strings correspond to the user's preferred trails. There are two contexts in which such a model is potentially useful. On the one hand, it can help the service provider to understand the users' needs and, as a result, improve the quality of its service. The quality of
service can be improved by providing adaptive pages suited to the individual user and by building dynamic pages in advance to reduce waiting time. On the other hand, such a model can be useful to the individual web user by acting as a personal assistant integrated with his/her web browser. The model has the advantage of being compact, self-contained, coherent, and based on the well-established work on probabilistic grammars. In fact, the size of the model depends only on the size of the web site being analysed and the amount of data collected. Extensive experiments with both real and random data were conducted, and the results show that, in practice, the algorithm runs in linear time in the size of the grammar. Our model has potential use both in helping the web site designer to understand the preferences of the site's visitors and in helping individual users to better understand their own navigation patterns and increase their knowledge of the web's content.

Our approach has the following characteristics:
1) Extracting search-focused information from web pages.
2) Taking key n-grams as the representations of search-focused information.
3) Employing data mining to learn the extraction model from search log data.
4) Employing learning to rank with search-focused key n-grams as features.

4.2.1 KEY N-GRAM EXTRACTION
The extraction step requires data pre-processing, training data generation, N-gram feature generation, and N-gram extraction with task classification.

Pre-processing: We assume that the objects to be searched and ranked by the search engine are web pages. During pre-processing, a web page in HTML format is parsed and represented as a sequence of tags/words.
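As an illustration of this representation, the sketch below turns an HTML page into a flat sequence of tags and words using Python's standard html.parser module. The paper does not specify a parser or a tokenisation rule, so the choice of library, the whitespace-based word splitting, and the names TagWordSequencer and page_to_sequence are assumptions made for the example.

```python
from html.parser import HTMLParser


class TagWordSequencer(HTMLParser):
    """Collects start tags, end tags and words in document order."""

    def __init__(self):
        super().__init__()
        self.sequence = []

    def handle_starttag(self, tag, attrs):
        self.sequence.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.sequence.append("</" + tag + ">")

    def handle_data(self, data):
        # Assumption: words are separated by whitespace.
        self.sequence.extend(data.split())


def page_to_sequence(html_text):
    """Return the page as a list of tag and word tokens."""
    parser = TagWordSequencer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.sequence


# Invented sample page, for illustration only.
page = "<html><title>Web Mining</title><body><h1>Usage logs</h1></body></html>"
sequence = page_to_sequence(page)
print(sequence)
```

Keeping the tags in the sequence preserves the formatting context of each word, which the frequency and appearance features of the extraction model rely on.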
Algorithm step
• Read records in Logtable
• For each record in Logtable
  • Read fields (Sc_code, Sc_method)
  • If Sc_code = '**' and Sc_method = '**' Then
    • Get IP_address and URL_link
    • If suffix.URL_link = {*.gif, *.jpg, *.css} Then
      • Delete suffix.URL_link
    • Save IP_address and URL_link
    • End if
  • Else, read next record
• End

Training Data Generation: We can consider automatically extracting queries from the page. Head pages generally have a number of associated queries in the search log data. Such data can naturally be used as training data for the automatic extraction of queries, particularly for tail pages. We treat the n-grams in each of the document's queries as its labelled key n-grams. For example, when a document "ABDC" is associated with the query "ABC", we consider the unigrams "A", "B", "C" and the bigram "AB" to be key n-grams, under the assumption that the extraction model should rank them higher than the unigram "D" and the bigrams "BD" and "DC".

N-gram Features Generation: Web pages contain rich formatting information compared with plain text. We utilize both textual and formatting information to create features in the extraction model in order to extract key n-grams accurately. Feature generation is based on two kinds of features: 1. Frequency features 2. Appearance features.
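The labelling rule of the training data generation step can be reproduced on the "ABDC"/"ABC" example with a short Python sketch. It treats each character as one token and considers unigrams and bigrams only, which matches the example but is otherwise an assumption; the helper names ngrams and label_key_ngrams are hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def label_key_ngrams(doc_tokens, query_tokens, max_n=2):
    """Split the document's n-grams into key (also in the query) and other."""
    key, other = [], []
    for n in range(1, max_n + 1):
        query_grams = set(ngrams(query_tokens, n))
        for gram in ngrams(doc_tokens, n):
            target = key if gram in query_grams else other
            if gram not in target:   # keep each n-gram once
                target.append(gram)
    return key, other


# The example from the text: document "ABDC", associated query "ABC".
key, other = label_key_ngrams(list("ABDC"), list("ABC"))
print(["".join(g) for g in key])    # ['A', 'B', 'C', 'AB']
print(["".join(g) for g in other])  # ['D', 'BD', 'DC']
```

The query bigram "BC" does not occur in the document, so it produces no label; only n-grams that the document itself contains can become training examples for the extraction model.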
1. Frequency Features
The original/normalized term frequencies of an n-gram within several fields, tags and attributes are utilized.
• Frequency in Fields: the frequencies of the n-gram in four fields of the page: URL, page title, meta-keyword and meta-description.
• Frequency within Structure Tags: the frequencies of the n-gram in texts within a header, table or list indicated by HTML tags including <h1>, ..., <h6>, <table>, <li> and <dd>.
• Frequency within Highlight Tags: the frequencies of the n-gram in texts highlighted or emphasized by HTML tags including <a>, <b>, <i>, <em> and <strong>.
• Frequency within Attributes of Tags: the frequencies of the n-gram in tag attributes, which are hidden texts not visible to users. Specifically, the title, alt, href and src attributes are used.
• Frequencies in other Contexts: the frequencies of the n-gram in the page headers, page meta-data, page body and the whole HTML file.

2. Appearance Features
The appearances of n-grams, namely their position, coverage and distribution, are also important indicators of their importance. Position indicates where the n-gram first appears in the title, a paragraph and the document; coverage indicates how much of the title or a header the n-gram covers; and distribution indicates how the n-gram is spread across different parts of a page.

N-Gram Extraction and Task Classification: Once the features of each n-gram have been extracted, an extraction model is trained. Key n-gram extraction is formalized as a learning-to-rank problem. In learning, a ranking model is trained to rank the n-grams, and the user's current task is then determined. The main aim of the task classification algorithm is to find the user's task, which is classified into two main groups: casual user and careful user. In careful searching the user wants to find precise and credible information.

Algorithm step
• Use frequently visited URLs (Cs-Uri-Stem field) as indicators for task type classification.
• Set the web task threshold (t = 5 ms).
• Store all frequently visited URLs and count the occurrences of each.
• If the number of frequently visited URLs is greater than or equal to 5, set the user task to careful user; otherwise set it to casual user.
• If a frequently visited URL has a query (Cs-Uri-Query) and that query is the same, set the user task to casual user; otherwise set it to careful user.
• The total number of URLs in casual searching was higher than the total number of URLs in careful searching.

5. APPLICATION, FUTURE TRENDS AND CONCLUSION

5.1 APPLICATION
Web-wide tracking – DoubleClick: 'Web-wide tracking', i.e. tracking an individual across all the sites he visits, is one of the most intriguing and controversial technologies. It provides an understanding of an individual's lifestyle and habits. The value of this technology in applications such as cyber-threat analysis and homeland defense is quite clear, and it might
be only a matter of time before these organizations are asked to provide this information.

Understanding Web communities – AOL: Applying web mining to the data collected from community interactions provides AOL with a very good understanding of its communities, which it has used for targeted marketing through ads and e-mail solicitations. The idea is to treat the community as a highly specialized focus group, understand its needs and opinions on new and existing products, and test strategies for influencing opinions.

Web Caching: Web caching aims to improve the performance of web-based systems by storing and reusing web objects that are likely to be used in the near future. It has proven to be an effective technique for reducing network traffic, decreasing access latency and lowering the server load [18]. Web caching has focused on the use of historical information about web objects to aid the cache replacement policies.

Web Prefetching: Web prefetching is a technique for reducing web latency based on predicting the next web objects to be accessed by the user and prefetching them during idle times. The prefetching technique has two main components: the prediction engine and the prefetching engine. The prediction engine runs a prediction algorithm to predict the user's next request [18].

5.2 FUTURE DIRECTION
Fraud and Threat analysis: The anonymity provided by the Web has led to a significant increase in attempted fraud, from unauthorized use of individual credit cards to hacking into credit card databases for blackmail purposes. Yet another example is auction fraud, which has been increasing on popular sites like eBay.
Since all these frauds are being perpetrated through the Internet, Web mining is the perfect analysis technique for detecting and preventing them.

Web mining and Privacy: While there are many benefits to be gained from Web mining, a clear drawback is the potential for severe violations of privacy. Public attitude towards privacy seems to be almost schizophrenic, i.e. people say one thing and do quite the opposite. The research issue generated by this attitude is the need to develop approaches, methodologies and tools that can be used to verify and validate that a Web service is indeed using an end-user's information in a manner consistent with its stated policies.

5.3 CONCLUSION
This paper presented a state-of-the-art review of current research on these human factors. The review will be important for practitioners who want to develop a sound understanding of the needs and preferences of users with various characteristics such as intensity of surfing, interest, gender difference and topic similarity. Our model has potential use in helping the web site designer to understand the preferences of the site's visitors and their behaviour and access patterns, which can be used to characterise human information behaviour. The model also analyzes users' web surfing patterns and traces the activities of terrorists and criminals. In this paper we use the N-gram method on search log data, and the characteristics of key n-grams can be applied to other data sets. The extracted key n-grams are used as features of the relevance ranking model for finding users' current task and their access behaviour. The approach is also applicable to understanding navigation patterns and increasing users' knowledge of the web's content, and it is applicable in a posteriori forensic investigation. The model will also help designers to develop web-based personalized applications that can accommodate users' individual differences, and it can be used for detecting and avoiding the terror threats caused by terrorists all over the world.
REFERENCES
[1] Ford, N., Miller, D., & Moss, N., "Web search strategies and human individual differences: Cognitive and demographic factors, internet attitudes, and approaches", Journal of the American Society for Information Science and Technology, pp. 741–756, 2005.
[2] Graff, M. (2003). "Learning from web-based instructional systems and cognitive style". British Journal of Educational Technology, 34(4), 407–418.
[3] Chi E. H.; Pirolli P.; Chen K.; and Pitkow J. 2001. "Using information scent to model user information needs and actions and the Web". In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 490–497, Seattle, Washington, United States: ACM (22/11/2007).
[4] Kim K. and Allen B. 2002. Cognitive and task influences on web searching behavior. Journal of the American Society for Information Science and Technology, 53(2):109–119: John Wiley & Sons.
[5] Sherry Y. Chen, Robert Macredie, "Web based interaction: A review of three important human factors", International Journal of Information Management, journal homepage: www.elsevier.com/locate/ijinfomgt, pp. 1–9, 2010.
[6] G. Eason, B. Noble, and I. N. Sneddon, "On certain integrals of Lipschitz-Hankel type involving products of Bessel functions," Phil. Trans. Roy. Soc. London, vol. A247, pp. 529–551, April 1955.
[7] White R. W. and Drucker S. M. 2007. Investigating behavioral variability in web search. In Proceedings of the 16th International Conference on World Wide Web, 21–30, Banff, Alberta, Canada: ACM.
[8] K. R. Suneetha, K. R. Krishnamoorthy, "Identifying User Behavior by Analyzing Web Server Access File", IJCSNA International Journal of Computer Science and Network Security, Vol. 9 No. 4, April 2009.
[9] Alaa El-Halees, "Mining Students Data to Analyzing Learning Behavior: a Case Study", http://eref.uqu.edu.sa/files/eref2/folder6/f158.pdf
[10] R. Cooley, B. Mobasher, and J. Srivastava, "Web mining: Information and Pattern Discovery on the World Wide Web", Proc. IEEE Intl. Conf. Tools with AI, Newport Beach, CA, pp. 558-56, 1997.
[11] Mahesh Thylore Ramkrishna, Latha Komal Gowdar, Lalatess Somashekar Havanur, "Web mining: Key Accomplishments, Application, and Future Directions", International Conference on Data Storage and Data Engineering, pp. 186–191, 2010.
[12] Jinhyuk Choi, Jeongseok Seo, Geehyuk Lee, "Analysis of web usage pattern using various contextual factors", Association of Advancement of Artificial Intelligence, pp. 1–9, 2009.
[13] R. Cooley, B. Mobasher, J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web", In Proceedings of the 9th IEEE International Conference on Tools With Artificial Intelligence, Newport Beach, CA, 1997.
[14] J. Srivastava, R. Cooley, M. Deshpande and P.-N. Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, Vol. 1, Issue 2, 2000.
[15] Cooley, R., Mobasher, B., & Srivastava, J. (1999). "Data preparation for mining world wide web browsing patterns". Journal of Knowledge and Information Systems, 1(1), 5–32.
[16] R. Kosala, H. Blockeel, "Web Mining Research: A Survey", in SIGKDD Explorations 2(1), ACM, July 2000.
[17] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", SIGKDD Explorations, ACM SIGKDD, pp. 1–10, Jan 2000.
[18] Sandhaya Gawade, Hitesh Gupta, "Review of Algorithms for Web Pre-fetching and Caching", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 1, Issue 2, pp. 1–4, April 2012.
[19] Rozita Jamili Osfouei, "Behaviour mining of female students by analysing log files", In Proceedings of the IEEE Fifth International Conference on Digital Information Management ICDM 2010, Canada, pp. 5–8, July 2010.
[20] T. Anand, S. Padmapriya, E. Kirubakram, "Terror Tracking Using Advanced Web Mining Perspective", In Proceedings of the IEEE Fourth International Conference on Intelligent Agent and Multimedia, pp. 1–4, 2009.
[21] José Borges and Mark Levene, "Data Mining of User Navigation Patterns", Department of Computer Science, University College London, Gower Street, London, pp. 1–19, April 2000.
[22] Chen Wang, Keping Bi, Yunhua Hu, "Extracting Search-Focused Key N-Grams for Relevance Ranking in Web Search", WSDM'12, February 8–12, 2012, Seattle, Washington, USA, ACM, pp. 1–10, 2012.
[23] Prof. Sindhu P Menon and Dr. Nagaratna P Hegde, "Research on Classification Algorithms and its Impact on Web Mining", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 495–504, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[24] Alamelu Mangai J, Santhosh Kumar V and Sugumaran V, "Recent Research in Web Page Classification – A Review", International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 112–122, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[25] Suresh Subramanian and Dr. Sivaprakasam, "Genetic Algorithm with a Ranking Based Objective Function and Inverse Index Representation for Web Data Mining", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 84–90, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[26] Purvi Dubey and Asst. Prof. Sourabh Dave, "Effective Web Mining Technique for Retrieval Information on the World Wide Web", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 6, 2013, pp. 156–160, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[27] Hemprasad Badgujar and Dr. R. C. Thool, "His: Human Identification Schemes on Web", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 198–212, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.