Applying web mining application for user behavior understanding
Dr. Zakaria Suliman Zubi
Associate Professor
Computer Science Department
Faculty Of Science
Sirte University, Libya
Abstract
Web usage mining (WUM) focuses on discovering potential knowledge from the
browsing patterns of users, which leads us to find the correlation between pages in the
analysis stage.
The primary data source used in web usage mining is the server log-file (web log).
A user browsing web pages leaves a lot of information in the log-file. Analyzing log-file information helps us to understand the behavior of the user.
The web log is essential for web mining, both to extract usage patterns and to study the
visiting characteristics of users.
Our paper focuses on the use of web mining techniques to classify web page types according
to user visits.
This classification helps us to understand user behavior.
We also use classification and association rule techniques to discover
potential knowledge from the browsing patterns.
INTRODUCTION
The Internet offers a huge, global information center for
news, advertising, consumer information, financial management,
education, government, and e-commerce.
The aim of using web mining techniques for understanding user
behavior is to profile user characteristics.
Web mining can be organized into three main categories: web
content mining, web structure mining, and web usage mining.
Figure: Web Mining divided into Web Structure Mining, Web Content Mining, and Web Usage Mining
1- Web content mining analyzes web content such as text,
multimedia data, and structured data (within web pages or linked
across web pages).
2- Web structure mining is the process of using graph and
network mining theory and methods to analyze the nodes and
connection structures on the Web.
3- Web usage mining is a special type of web mining that
discovers the knowledge hidden in browsing patterns and
analyzes the visiting characteristics of users.
The Primary Data of Web Usage Mining
1- Web server logs.
2- Data about visitors of the sites.
3- Registration forms.
Fig 2: Portion of a typical server log
A standard log-file has the following format:
remotehost; logname; username; date; request; status; bytes
where:
remotehost: is the remote hostname or its IP address;
logname: is the remote log name of the user;
username: is the username with which the user has authenticated;
date: is the date and time of the request;
request: is the exact request line as it came from the client;
status: is the HTTP status code returned to the client; and
bytes: is the content-length of the document transferred.
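The seven fields listed above can be pulled out of a log line with a short parser. The sketch below is only an illustration in Python, not the application described in these slides. It assumes the log is written in the NCSA Common Log Format (space-separated fields, date in square brackets, request in double quotes); the pattern name, the function name parse_log_line and the sample line are made up for the example.

```python
import re
from datetime import datetime

# Assumption: NCSA Common Log Format, e.g.
# host logname user [10/Oct/2000:13:55:36 -0700] "GET /x HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<logname>\S+) (?P<username>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Return a dict with the seven fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no content was transferred.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

# Made-up sample line, for illustration only.
sample = ('192.168.1.10 - alice [10/Oct/2000:13:55:36 -0700] '
          '"GET /sport/index.html HTTP/1.0" 200 2326')
entry = parse_log_line(sample)
```

Lines that do not follow the assumed layout return None, so they can be counted or discarded before the preprocessing stage.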
THE PHASES OF WEB USAGE MINING
Web usage mining is a complete process that
includes the stages of the data mining cycle:
Data Preprocessing, Pattern Discovery, and Pattern
Analysis.
Initially, at the data preprocessing stage, the web log is
cleaned, integrated, and transformed into a
common log.
In pattern discovery, data mining techniques
are applied to discover the interesting characteristics
in the hidden patterns.
Pattern analysis is the final stage of web usage
mining. It validates the interesting patterns from the
output of pattern discovery so that they can be used to predict
user behavior.
Data Preprocessing Process
Data Cleaning:
The log-file is first examined to remove
irrelevant entries, such as those that represent
multimedia data and scripts, and uninteresting
entries, such as those that belong to
top/bottom frames.
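A minimal sketch of this cleaning step in Python is given below. It assumes that irrelevant entries can be recognized by the file extension of the requested resource. The extension list, the function names and the sample request lines are hypothetical, and frame pages would need an additional site-specific rule that is not shown.

```python
import os
from urllib.parse import urlparse

# Hypothetical list of extensions treated as multimedia data or scripts.
IRRELEVANT_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}

def requested_path(request_line):
    """Extract the URL from a request line such as 'GET /a.html HTTP/1.0'."""
    parts = request_line.split()
    return parts[1] if len(parts) >= 2 else ""

def is_relevant(request_line):
    """Keep an entry unless its resource has an irrelevant extension."""
    path = urlparse(requested_path(request_line)).path  # drops any query string
    extension = os.path.splitext(path)[1].lower()
    return extension not in IRRELEVANT_EXTENSIONS

# Made-up request lines, for illustration only.
requests = [
    "GET /news/index.html HTTP/1.0",
    "GET /images/logo.gif HTTP/1.0",
    "GET /scripts/menu.js?v=2 HTTP/1.0",
]
cleaned = [r for r in requests if is_relevant(r)]
```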
Pageview Identification:
Identification of pageviews depends heavily
on the intra-page structure of the
site, as well as on the page contents and the
underlying site domain knowledge. Each
pageview can be viewed as a collection of
Web objects or resources representing a
specific "user event".
Figure: preprocessing steps (Data Cleaning, Pageview Identification, User Identification, Session Identification)
User Identification:
Since several users may share a single
machine name, certain heuristics are
used to identify users. We use the
phrase "user activity record" to refer to the
sequence of logged activities belonging
to the same user.
Session Identification:
Aims to split the page accesses of each
user into separate sessions. It records
the number of times the user has
accessed a web page, and a timeout
sets a time limit for the access of a
particular web page: if it exceeds 30
minutes, the activity is
divided into more than one session.
Sample of user and session identification
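The timeout rule can be sketched as follows in Python. This is an illustration under stated assumptions, not the procedure used in the implemented application: each user is assumed to be already identified by a key (here the remote host), and the 30-minute limit is applied to the gap between two consecutive requests of the same user. The function name split_sessions and the sample entries are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def split_sessions(entries, timeout=SESSION_TIMEOUT):
    """entries: iterable of (user, timestamp, page).
    Returns {user: [[page, ...], ...]}, one inner list per session."""
    by_user = defaultdict(list)
    for user, timestamp, page in entries:
        by_user[user].append((timestamp, page))

    sessions = {}
    for user, visits in by_user.items():
        visits.sort(key=lambda visit: visit[0])
        user_sessions = []
        previous = None
        for timestamp, page in visits:
            # Start a new session on the first visit or after a long gap.
            if previous is None or timestamp - previous > timeout:
                user_sessions.append([])
            user_sessions[-1].append(page)
            previous = timestamp
        sessions[user] = user_sessions
    return sessions

# Made-up entries, for illustration only.
entries = [
    ("10.0.0.1", datetime(2024, 1, 1, 9, 0), "/sport/index.html"),
    ("10.0.0.2", datetime(2024, 1, 1, 9, 5), "/news/today.html"),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 10), "/news/today.html"),
    ("10.0.0.1", datetime(2024, 1, 1, 10, 0), "/social/feed.html"),
]
sessions = split_sessions(entries)
```

In this sample the third request of user 10.0.0.1 arrives 50 minutes after the second, so it opens a second session.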
Pattern Discovery Process:
Discovering user access patterns from the user access log files is the main
purpose of web usage mining.
Association Rule Mining:
Association rule mining and statistical correlation analysis can
find groups of web page types that are commonly accessed together
(association rule mining can be used to discover correlations between page
types found in a web log). This technique is applied to the identified users and
sessions, where each session consists of items and every item represents a page type. We
also use the Apriori algorithm to find the correlation between pages based on
the support and confidence measures.
Which sets of page types are frequently accessed together by web users?
e.g.
(Sport, News, Social)
Which page type will be fetched next?
e.g.
Entertainment
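To make the two questions above concrete, here is a compact level-wise Apriori sketch in Python. It is only an illustration of the general algorithm, not the code of the implemented application. Support is kept as a count of sessions to avoid rounding issues, confidence is the count of the whole itemset divided by the count of the antecedent, and the transactions, thresholds and function names are made up.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least
    min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each rule that
    reaches min_confidence."""
    found = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found

# Made-up sessions expressed as sets of page types.
transactions = [
    {"Sport", "News", "Social"},
    {"Sport", "News"},
    {"Sport", "News", "Social", "Entertainment"},
    {"News", "Entertainment"},
    {"Sport", "Social"},
]
frequent = apriori(transactions, min_count=3)
rules = association_rules(frequent, min_confidence=0.8)
```

With these sample sessions the only rule reaching a confidence of 0.8 is Social => Sport, because every session that contains Social also contains Sport.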
Classification
Classification techniques play an important role in Web analytics
applications for modeling users according to various predefined
metrics.
In the Web domain, we are interested in developing a profile of users
belonging to a particular class or category. This requires extraction and
selection of the features that best describe the properties of a given class or
category.
We also focus on k-nearest neighbor (K-NN), which is
considered a predictive technique for classification models. Here
k represents the number of similar cases, that is, the number of items in the
group.
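The role of k can be illustrated with a small K-NN sketch in Python. The slides do not say which features are used, so the two numeric features, the labels and the function name knn_classify below are purely hypothetical. The sketch uses Euclidean distance and a majority vote among the k nearest training cases.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """training: list of (feature_vector, label).
    Returns the majority label among the k training cases nearest to query."""
    nearest = sorted(training, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training cases with two hypothetical numeric features each.
training = [
    ((1.0, 1.0), "Sport"),
    ((1.2, 0.8), "Sport"),
    ((0.9, 1.1), "Sport"),
    ((5.0, 5.0), "News"),
    ((5.2, 4.9), "News"),
    ((4.8, 5.1), "News"),
]
label_a = knn_classify(training, (1.1, 1.0), k=3)
label_b = knn_classify(training, (5.0, 4.8), k=3)
```

An odd k is a common choice with two classes, since it avoids tied votes.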
Pattern Analysis Process:
In this stage the discovered patterns are further
processed and filtered, possibly resulting in aggregate user models
that can be used as visualization tools. The next figure
summarizes the whole process:
RESULTS OF USING ASSOCIATION RULES
The log-file in a flat-file format.
Importing the log-file database into our implemented
application.
Extract the transactional database of the
web server log for every user, where
every transaction represents a session.
Find the association rules of user
behavior by applying the Apriori
algorithm to the transactional database of
the user.
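One way to picture the extraction step is the Python sketch below, which turns each session into a transaction of page types. It is an assumption-laden illustration: the slides do not state how a URL is mapped to a page type, so the rule used here (the first path segment names the type, anything else is "Other") and the names page_type and to_transactions are hypothetical.

```python
def page_type(url):
    """Hypothetical rule: '/sport/index.html' -> 'Sport'.
    URLs without a directory segment fall into 'Other'."""
    segments = [s for s in url.split("/") if s]
    return segments[0].capitalize() if len(segments) > 1 else "Other"

def to_transactions(sessions):
    """sessions: {user: [[url, ...], ...]}.
    Returns {user: [set_of_page_types, ...]}, one transaction per session."""
    return {
        user: [{page_type(url) for url in session} for session in user_sessions]
        for user, user_sessions in sessions.items()
    }

# Made-up sessions, for illustration only.
sessions = {
    "10.0.0.1": [
        ["/sport/index.html", "/news/today.html", "/sport/scores.html"],
        ["/social/feed.html"],
    ],
    "10.0.0.2": [["/index.html"]],
}
transactions = to_transactions(sessions)
```

Each user's list of transactions can then be handed to an Apriori routine to search for rules in that user's behavior.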
CONCLUSION
We used web data that contains the information a user leaves behind when
accessing web pages. This data is called web logs (or server logs).
Statistical methods such as classification, association rule mining,
and statistical correlation analysis, which can find groups of web page types
that are commonly accessed together, were applied as well.
Classification is used to map a data item into one of several predefined
classes. Each class belongs to one category, such as sport, politics, or
education. We also used the k-nearest neighbor (K-NN) algorithm as a
common classification method to select the best class.
Association rule mining was used to discover correlations between site types
found in a web log.
The implemented application was written in the C# programming
language.