This document discusses web usage mining. It begins by defining web mining and its three categories: web content mining, web structure mining, and web usage mining. The main focus is on web usage mining, which involves discovering user navigation patterns and predicting user behavior. The key processes of web usage mining are preprocessing raw data, pattern discovery using algorithms, and pattern analysis. Pattern discovery techniques discussed include statistical analysis, clustering, classification, association rules, and sequential patterns. Potential applications are personalized recommendations, system improvements, and business intelligence. The document concludes by discussing future research directions such as usage mining on the semantic web and analyzing discovered patterns.
2. Outline
Brief overview of Web mining
Web usage mining
Application areas of Web usage mining
Future research directions
3. Web Mining
Web mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services.
4. Web Mining Categories
Web Content Mining: extracting knowledge from the content of the Web
Web Structure Mining: discovering the model underlying the link structures of the Web
Web Usage Mining: discovering users’ navigation patterns and predicting user behavior
5. Web Usage Mining Processes
Preprocessing: conversion of the raw data into the data abstractions (users, sessions, episodes, clickstreams, and pageviews) needed for applying data mining algorithms.
Pattern Discovery: the key component of Web usage mining (WUM), bringing together algorithms and techniques from research fields such as data mining, machine learning, statistics, and pattern recognition.
Pattern Analysis: validation and interpretation of the mined patterns.
7. Web Usage Mining - Preprocessing
Data Cleaning: remove outliers and/or irrelevant data
User Identification: associate each page reference with the user who made it
Session Identification: divide all pages accessed by a user into sessions
Path Completion: add important page access records that are missing from the access log because of browser and proxy server caching
Formatting: format the sessions according to the type of data mining to be performed.
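The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the LogRecord type and all function names are hypothetical, log lines are assumed to be parsed already, a user is approximated by an (IP address, user agent) pair, and a new session starts after 30 minutes of inactivity, a commonly used heuristic. Path completion is left out because it needs the site's link structure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LogRecord:
    """Hypothetical, already-parsed access-log entry."""
    ip: str
    user_agent: str
    timestamp: datetime
    url: str
    status: int

# Assumption: requests for embedded resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Assumption: 30 minutes of inactivity ends a session.
SESSION_TIMEOUT = timedelta(minutes=30)

def clean(records):
    """Data cleaning: keep successful (2xx/3xx) requests for pages only."""
    return [r for r in records
            if 200 <= r.status < 400
            and not r.url.lower().endswith(IGNORED_SUFFIXES)]

def identify_users(records):
    """User identification: group time-ordered records by (IP, user agent)."""
    users = {}
    for r in sorted(records, key=lambda rec: rec.timestamp):
        users.setdefault((r.ip, r.user_agent), []).append(r)
    return users

def sessionize(user_records, timeout=SESSION_TIMEOUT):
    """Session identification: start a new session after a gap longer than timeout."""
    sessions, current = [], []
    for r in user_records:
        if current and r.timestamp - current[-1].timestamp > timeout:
            sessions.append(current)
            current = []
        current.append(r)
    if current:
        sessions.append(current)
    return sessions

def preprocess(records):
    """Formatting: return every session as an ordered list of URLs."""
    return [[r.url for r in session]
            for user_records in identify_users(clean(records)).values()
            for session in sessionize(user_records)]

t0 = datetime(2024, 1, 1, 9, 0)
logs = [
    LogRecord("10.0.0.1", "Mozilla/5.0", t0, "/index.html", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(seconds=5), "/logo.png", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=2), "/products.html", 200),
    LogRecord("10.0.0.1", "Mozilla/5.0", t0 + timedelta(hours=2), "/index.html", 200),
    LogRecord("10.0.0.2", "Mozilla/5.0", t0, "/contact.html", 404),
]
print(preprocess(logs))  # [['/index.html', '/products.html'], ['/index.html']]
```

The image request and the failed request are dropped during cleaning, and the two-hour gap splits the first user's visits into two sessions. Real systems often need more elaborate user identification (cookies, login data) than the IP and user-agent heuristic assumed here.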
8. Web Usage Mining - Pattern Discovery Tasks
Statistical Analysis: frequency analysis, mean,
median, etc.
◦ Improve system performance
◦ Provide support for marketing decisions
◦ Simplify site modification tasks
Clustering:
◦ Clustering users helps discover groups of users with similar navigation patterns => provide personalized Web content
◦ Clustering pages helps discover groups of pages with related content => useful for search engines
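As a rough illustration of statistical analysis and user clustering, the sketch below computes page-view frequencies and session-length statistics, then groups sessions by the Jaccard similarity of the pages they contain. The sample sessions, the 0.3 threshold, and the function names are hypothetical; the single-pass "leader" clustering is used only because it is short, and k-means or hierarchical clustering are common alternatives.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sessions, e.g. the output of a preprocessing step.
sessions = [
    ["/index", "/laptops", "/laptops/x1", "/cart"],
    ["/index", "/laptops", "/laptops/x2"],
    ["/index", "/support", "/support/faq"],
    ["/support", "/support/faq", "/support/contact"],
    ["/laptops", "/laptops/x1", "/cart", "/checkout"],
]

# Statistical analysis: most-viewed pages and session-length summary.
page_counts = Counter(page for s in sessions for page in s)
lengths = [len(s) for s in sessions]
print(page_counts.most_common(2))      # [('/index', 3), ('/laptops', 3)]
print(mean(lengths), median(lengths))  # 3.4 3

def jaccard(a, b):
    """Similarity of two page sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def leader_cluster(sessions, threshold=0.3):
    """Put each session in the first cluster whose leader is similar enough;
    otherwise the session becomes the leader of a new cluster."""
    leaders, clusters = [], []
    for i, s in enumerate(sessions):
        pages = set(s)
        for leader, members in zip(leaders, clusters):
            if jaccard(pages, leader) >= threshold:
                members.append(i)
                break
        else:
            leaders.append(pages)
            clusters.append([i])
    return clusters

print(leader_cluster(sessions))  # [[0, 1, 4], [2, 3]]
```

The two resulting groups correspond to "shopping for laptops" and "looking for support" sessions. Leader clustering depends on the order in which sessions arrive, so results on real data should be checked against a more robust method.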
9. Web Usage Mining - Pattern Discovery Tasks (Cont.)
Classification: mapping a data item into one of several predefined classes
◦ Develop profiles of users belonging to a particular class or category
Association Rules: discover correlations among pages accessed together by a client
◦ Help restructure the Web site
◦ Page prefetching
◦ Develop e-commerce marketing strategies
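A minimal sketch of association-rule mining over sessions, restricted to rules between single pages (the pair-level step of Apriori). Each session is treated as a set of pages; support is the fraction of sessions containing both pages, and confidence is support(A and B) / support(A). The sample data, thresholds, and the function name pair_rules are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Mine rules A -> B between pages that occur in the same session."""
    n = len(sessions)
    baskets = [set(s) for s in sessions]
    item_counts = Counter(page for basket in baskets for page in basket)
    # Apriori property: a pair can only be frequent if both of its pages are.
    frequent = {p for p, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(basket & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

sessions = [
    ["/index", "/laptops", "/cart"],
    ["/index", "/laptops"],
    ["/laptops", "/cart"],
    ["/index", "/support"],
    ["/laptops", "/cart", "/checkout"],
]
for lhs, rhs, support, confidence in pair_rules(sessions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample sessions three rules survive: /cart -> /laptops, /index -> /laptops, and /laptops -> /cart, while /laptops -> /index is dropped because its confidence is only 0.5. A rule such as /cart -> /laptops could inform page prefetching or where to place links when restructuring the site.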
10. Web Usage Mining - Pattern Discovery Tasks (Cont.)
Sequential Patterns: extract frequently occurring inter-session patterns in which the presence of a set of items is followed by another item in time order
◦ Predict future user visit patterns => place ads or recommendations
◦ Page prefetching
Dependency Modeling: determine whether there are significant dependencies among the variables in the Web domain
◦ Predict future Web resource consumption
◦ Develop business strategies to increase sales
◦ Improve navigational convenience for users
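The sketch below illustrates both tasks on toy data. For sequential patterns it counts ordered page pairs (a visited before b, not necessarily adjacent) and keeps those above a minimum support; for dependency modeling it uses a first-order Markov model of page-to-page transitions, one simple kind of dependency model, to pick a prefetch candidate. Treating each session as one sequence of single-page events is a simplification, and the sample sessions, threshold, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def frequent_2_sequences(sessions, min_support=0.5):
    """Support of ordered pairs <a, b>: a is visited before b in the same session
    (not necessarily adjacent); each session counts at most once per pair."""
    n = len(sessions)
    counts = Counter()
    for s in sessions:
        pairs = {(a, b) for i, a in enumerate(s) for b in s[i + 1:] if a != b}
        counts.update(pairs)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def next_page_model(sessions):
    """First-order Markov model: count direct transitions page -> next page."""
    transitions = defaultdict(Counter)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            transitions[a][b] += 1
    return transitions

def predict_next(transitions, page):
    """Most frequent successor of page (a prefetch candidate), or None."""
    successors = transitions.get(page)
    if not successors:
        return None
    return successors.most_common(1)[0][0]

sessions = [
    ["/index", "/laptops", "/laptops/x1", "/cart"],
    ["/index", "/laptops", "/cart"],
    ["/laptops", "/laptops/x1", "/cart"],
    ["/index", "/support"],
]
for (a, b), support in sorted(frequent_2_sequences(sessions).items()):
    print(f"<{a}, {b}> support={support:.2f}")

model = next_page_model(sessions)
print(predict_next(model, "/laptops"))  # /laptops/x1
print(predict_next(model, "/cart"))     # None
```

Here <"/laptops", "/cart"> appears in three of four sessions, and after "/laptops" the most likely next page is "/laptops/x1", which a server or browser could prefetch. Full sequential-pattern algorithms extend this idea to longer sequences and to sequences of itemsets.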
11. Web Usage Mining - Pattern Analysis
Pattern Analysis is the final stage of WUM and involves the validation and interpretation of the mined patterns
Validation: eliminate irrelevant rules or patterns and extract the interesting ones from the output of the pattern discovery process
Interpretation: the output of mining algorithms is mostly in mathematical form and not suitable for direct human interpretation, so it must be translated into a form analysts can understand
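A minimal sketch of the two pattern-analysis steps applied to association rules. Validation here keeps rules that meet a confidence threshold and have lift (confidence divided by the support of the right-hand page) above a threshold; lift is one common interestingness measure, chosen as an assumption since the slide does not name one. Interpretation turns each surviving rule into a plain-English sentence. The Rule type, thresholds, function names, and rule values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical mined rule lhs -> rhs with precomputed statistics."""
    lhs: str
    rhs: str
    support: float      # fraction of sessions containing both pages
    confidence: float   # fraction of lhs sessions that also contain rhs
    rhs_support: float  # fraction of all sessions containing rhs

    @property
    def lift(self):
        # > 1 means lhs makes rhs more likely than its overall rate.
        return self.confidence / self.rhs_support

def validate(rules, min_confidence=0.6, min_lift=1.1):
    """Validation: drop weak rules and rules no better than chance."""
    return [r for r in rules
            if r.confidence >= min_confidence and r.lift >= min_lift]

def interpret(rule):
    """Interpretation: phrase a numeric rule as a sentence for analysts."""
    return (f"{rule.confidence:.0%} of sessions that visit {rule.lhs} "
            f"also visit {rule.rhs} ({rule.lift:.1f}x the overall rate for "
            f"{rule.rhs}; both pages appear in {rule.support:.0%} of sessions)")

rules = [
    Rule("/cart", "/checkout", support=0.30, confidence=0.75, rhs_support=0.35),
    Rule("/index", "/laptops", support=0.40, confidence=0.67, rhs_support=0.80),
    Rule("/faq", "/contact", support=0.05, confidence=0.50, rhs_support=0.10),
]
for rule in validate(rules):
    print(interpret(rule))
```

The /index -> /laptops rule is discarded despite its confidence because /laptops is visited in most sessions anyway (lift below 1), and /faq -> /contact fails the confidence threshold; only /cart -> /checkout is reported.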
12. Web Usage Mining - Pattern Analysis Methodologies and Tools
Visualization: helps people understand both real and abstract concepts
◦ WebViz: the Web is visualized as a directed graph
Query mechanism: allows analysts to extract only relevant and useful patterns by specifying constraints
◦ WEBMINER
On-Line Analytical Processing (OLAP): enables analysts to perform ad hoc analysis of data in multiple dimensions for decision-making
◦ WebLogMiner
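To illustrate the OLAP idea of analyzing usage data along several dimensions, the sketch below performs roll-up (aggregating page views over a chosen set of dimensions) and slice (fixing one dimension's value) on a tiny in-memory fact table. It is a toy stand-in for the data cubes OLAP tools build and does not reflect how any particular tool, including WebLogMiner, works; the table, dimension names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical fact table: one row per page view, with three dimensions.
views = [
    {"page": "/index",   "day": "Mon", "country": "DE"},
    {"page": "/laptops", "day": "Mon", "country": "DE"},
    {"page": "/index",   "day": "Tue", "country": "US"},
    {"page": "/laptops", "day": "Tue", "country": "US"},
    {"page": "/laptops", "day": "Tue", "country": "DE"},
]

def roll_up(rows, dims):
    """Count page views grouped by the given dimensions (like GROUP BY ... COUNT)."""
    return Counter(tuple(row[d] for d in dims) for row in rows)

def slice_(rows, **fixed):
    """Slice: keep only rows whose dimensions match the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

print(roll_up(views, ["page"]))                       # views per page
print(roll_up(views, ["page", "day"]))                # views per page and day
print(roll_up(slice_(views, country="DE"), ["day"]))  # German views per day
```

Adding or removing a name from the dims list moves between coarser and finer views of the same data, which is the kind of ad hoc exploration the slide attributes to OLAP.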
13. Application Areas for Web Usage Mining
Personalized: discover the preferences and needs of individual Web users in order to provide a personalized Web site for certain types of users
Impersonalized: examine general user navigation patterns in order to understand how users in general use the site
◦ System Improvement
◦ Site Modification
◦ Business Intelligence
◦ Web Characterization
14. Future Research Directions
Usage Mining on the Semantic Web
◦ Help build the Semantic Web
◦ With the Semantic Web, WUM can be improved
Multimedia Web Data Mining
◦ Representing, solving problems with, and learning from multimedia data remain challenging
15. Future Research Directions (Cont.)
Analysis of Discovered Patterns
◦ Research on efficient, flexible and
powerful analysis tools
More Applications
◦ Temporal evolution of usage behavior
◦ Improving Web services
◦ Credit card fraud detection
◦ Privacy issues
16. Conclusion
Mining Web usage data for patterns is a growing area as Web-based applications continue to grow
Web usage data can be used to better understand how the Web is used and to apply this knowledge to better serve users
Web usage patterns and data mining can be the basis for a great deal of future research