Data mining refers to analyzing data from different perspectives to extract useful information. There are various types of data mining including business, scientific, and internet data mining. Web mining is a main application of data mining that involves the automated discovery of useful information from web documents and services. It has three domains: web content mining which extracts patterns from online information, web structure mining which describes how content is organized, and web usage mining which analyzes web access logs to understand user behavior. Common web mining techniques include clustering, classification, association rules, path analysis, and sequential patterns. Web mining tools like Mozenda can routinely extract, store, and publish web data to be used in various applications.
3. Introduction
Data Mining refers to the process of analysing
the data from different perspectives and
summarizing it into useful information.
◊ Analyze data.
◊ Categorize data.
◊ Summarize relationships.
◊ Describe structural patterns.
4. Definition: Data Mining
Data mining is the process of finding
correlation or patterns among fields in large
relational databases.
Types of data mining:
Business Data Mining.
Scientific Data Mining.
Internet Data Mining.
5. Major elements of Data Mining
Extract, transform, and load (ETL) the data.
Store and manage the data in a multidimensional database.
Provide access to the data.
Analyze the data.
Present the data.
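The extract, transform, load, and analyze steps listed above can be sketched as follows. This is a minimal illustration, not a production pipeline: the sample records, the `sales` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the multidimensional store the slide mentions.

```python
import sqlite3

# Hypothetical raw transaction records (illustrative data only).
RAW_ROWS = [
    {"customer": " alice ", "item": "Book", "amount": "12.50"},
    {"customer": "BOB", "item": "pen", "amount": "1.20"},
    {"customer": "Alice", "item": "pen", "amount": "1.20"},
]


def extract():
    """Extract: read the raw records from their source."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: normalize names and convert amounts to numbers."""
    return [
        (r["customer"].strip().lower(), r["item"].lower(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: store the cleaned records in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, item TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_customer(conn):
    """Analyze: summarize spending per customer."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
for customer, total in total_by_customer(conn):
    # Present: print one summary line per customer.
    print(customer, round(total, 2))
```

The final loop plays the role of the presentation step; a real system would more likely render a table or chart.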
6. What is Web Mining?
Main application of data mining.
Broadly defined as the automated discovery and
analysis of useful information from Web documents
and services using data mining.
7. Need For Web Mining.
The World Wide Web is a popular and interactive
medium, ideal for publishing information.
It is huge, diverse, and dynamic, and thus raises
the issues of scalability, multimedia data, and
temporal data, respectively.
10. 1. Web Content Mining.
An automatic process that extracts patterns from
online information, such as HTML files, images,
or e-mails. It goes beyond keyword extraction or
simple statistics of words and phrases in documents.
The process of information or resource discovery from
millions of sources across the WWW.
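As a small illustration of extracting information from an HTML file, the sketch below collects the text and hyperlinks of a page and reports its most frequent words. It uses only Python's standard `html.parser` module; the class name, the `mine_content` helper, and the sample page are hypothetical, and word counting is the simplest kind of pattern, well short of what real content mining does.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects text (outside <script>/<style>) and link targets."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def mine_content(html, top_n=3):
    """Return the top_n most frequent words and the list of links."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    return Counter(words).most_common(top_n), parser.links


# Hypothetical sample page.
SAMPLE_HTML = """<html><head><title>Web mining</title>
<style>p {color: red}</style></head>
<body><p>Web mining applies data mining to Web data.</p>
<a href="/usage">Usage mining</a></body></html>"""

top_words, links = mine_content(SAMPLE_HTML)
print(top_words)
print(links)
```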
11. a. Agent-based approaches.
Artificial intelligence systems that can “act
autonomously or semi-autonomously on behalf
of a particular user, to discover and organize
Web-based information.”
b. Database approaches.
“Integrating and organizing the heterogeneous
and semi-structured data on the Web into more
structured and high level collections of resources.”
12. 2. Web Structure Mining.
Describes the organization of content.
Intra-page structure information includes the
arrangement of various tags.
Example: HTML or XML tags.
The <html> tag becomes the root of the tree.
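The intra-page tag tree described above can be sketched with Python's standard `html.parser` module. The `TagTreeBuilder` class, the dictionary node layout, and the sample page are illustrative assumptions; a real structure miner would also handle malformed markup and inter-page links.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they cannot have children.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTreeBuilder(HTMLParser):
    """Builds a nested dict tree of tags; the first tag seen is the root."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open tags, outermost first

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        elif self.root is None:
            self.root = node
        if tag not in VOID_TAGS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                del self._stack[i:]
                break


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one tag per line."""
    lines = ["  " * depth + node["tag"]]
    for child in node["children"]:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical sample page.
SAMPLE_PAGE = (
    "<html><head><title>T</title></head>"
    "<body><p>Hi<br>there</p><a href='/x'>x</a></body></html>"
)

tree = build_tag_tree(SAMPLE_PAGE)
print("\n".join(outline(tree)))
```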
13. 3. Web Usage Mining.
Web servers record and accumulate data.
Analyses the web access logs to understand
user behaviour and the Web structure.
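A first step in usage mining is turning raw access-log lines into records that can be counted and grouped. The sketch below assumes the logs are in the Common Log Format; the slides do not name a log format, so that choice, the sample lines, and the helper names are all assumptions.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical log excerpt.
SAMPLE_LOG = """\
10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 1500
10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /products.html HTTP/1.0" 200 1500
10.0.0.1 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 210
"""


def parse_log(text):
    """Parse log text into a list of dicts, skipping malformed lines."""
    entries = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:
            entries.append(match.groupdict())
    return entries


def page_hits(entries):
    """Count successful (status 200) requests per page."""
    return Counter(e["path"] for e in entries if e["status"] == "200")


def pages_by_host(entries):
    """Group requested pages by client host, in log order."""
    visits = {}
    for e in entries:
        visits.setdefault(e["host"], []).append(e["path"])
    return visits


entries = parse_log(SAMPLE_LOG)
print(page_hits(entries).most_common())
print(pages_by_host(entries))
```

Grouping by host is only a rough stand-in for identifying users, since several people can share one address.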
14. Web Mining Techniques.
i. Clustering / Classification
Used to develop profiles of items with
similar characteristics.
This ability enhances the discovery of
relationships.
E.g.: Classification of Web access logs.
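One simple way to group users with similar characteristics is to cluster them by the sets of pages they visited. The sketch below uses Jaccard similarity and a greedy single-pass grouping; both choices, the threshold, and the sample sessions are assumptions made for illustration, not the technique the slides prescribe.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_sessions(sessions, threshold=0.5):
    """Greedy single-pass clustering.

    Each user joins the first cluster whose first member (the seed) has
    visited a similar enough set of pages; otherwise the user starts a
    new cluster.
    """
    clusters = []  # list of lists of user ids
    for user, pages in sessions.items():
        for cluster in clusters:
            seed_pages = sessions[cluster[0]]
            if jaccard(pages, seed_pages) >= threshold:
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters


# Hypothetical sessions: user id -> set of visited pages.
SESSIONS = {
    "u1": {"/home", "/shoes", "/cart"},
    "u2": {"/home", "/shoes"},
    "u3": {"/blog", "/about"},
    "u4": {"/blog", "/about", "/contact"},
}

clusters = cluster_sessions(SESSIONS)
print(clusters)
```

The result depends on the order in which users are processed, which is one reason production systems tend to prefer methods such as k-means or hierarchical clustering.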
15. ii. Association Rules.
Rules that govern databases of transactions.
Used to predict the correlation of items.
The presence of one set of items in a transaction
implies the presence of other items.
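The strength of a rule such as "sessions that include A also include B" is commonly measured by support and confidence. The sketch below computes both for page pairs; the thresholds, the sample transactions, and the function names are illustrative assumptions, and it checks single-item rules only rather than running a full algorithm such as Apriori.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (a, b, support, confidence) for each rule a -> b that passes."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, {a, b})
        c = confidence(transactions, {a}, {b})
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical transactions: the pages requested in each session.
TRANSACTIONS = [
    {"/home", "/shoes", "/cart"},
    {"/home", "/shoes"},
    {"/home", "/blog"},
    {"/shoes", "/cart"},
]

rules = pair_rules(TRANSACTIONS)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```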
16. iii. Path Analysis.
Generation of a graph that “represents
relation[s] defined on Web pages.”
Physical layout of a Web site.
Sitemap.
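A graph for path analysis can be built from the page-to-page transitions observed in user visits. In this sketch, nodes are pages and each directed edge is weighted by how often users moved directly from one page to the next; the sample paths and helper names are hypothetical.

```python
from collections import Counter, defaultdict


def build_link_graph(paths):
    """Count direct transitions: edges[(src, dst)] = number of traversals."""
    edges = Counter()
    for path in paths:
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += 1
    return edges


def adjacency(edges):
    """Convert edge counts into {src: {dst: count}} form."""
    graph = defaultdict(dict)
    for (src, dst), count in edges.items():
        graph[src][dst] = count
    return dict(graph)


def most_likely_next(edges, page):
    """Return the page most often visited right after `page`, or None."""
    candidates = {dst: n for (src, dst), n in edges.items() if src == page}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical navigation paths, one list per visit.
PATHS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/about"],
    ["/products", "/cart"],
]

edges = build_link_graph(PATHS)
print(adjacency(edges))
print(most_likely_next(edges, "/home"))
```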
17. iv. Sequential Patterns.
Applied to Web server access (transaction) logs.
Discovers sequential patterns.
Example: user visit patterns over a certain
period.
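A minimal sketch of sequential pattern discovery: given time-stamped visits per session, find ordered page pairs (A visited some time before B) that occur in a large enough share of sessions. The data layout, the support threshold, and the restriction to pairs are assumptions made to keep the example short.

```python
from itertools import combinations


def sequential_pairs(sessions, min_support=0.5):
    """Return {(a, b): support} for pairs where a precedes b in a session.

    Each session is a list of (timestamp, page) tuples. A pair is counted
    at most once per session, and the two pages need not be adjacent.
    """
    counts = {}
    for visits in sessions:
        ordered = [page for _, page in sorted(visits)]  # order by timestamp
        seen = set()
        for a, b in combinations(ordered, 2):  # keeps a before b
            if a != b:
                seen.add((a, b))
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    total = len(sessions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical sessions with integer timestamps.
VISITS = [
    [(1, "/home"), (2, "/products"), (3, "/cart")],
    [(5, "/products"), (4, "/home"), (9, "/cart")],
    [(1, "/home"), (2, "/about")],
]

patterns = sequential_pairs(VISITS)
for (a, b), s in sorted(patterns.items()):
    print(f"{a} then {b}: support={s:.2f}")
```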
19. Web mining as a tool:
A promising tool for effective search engines.
Discovers information from mounds of data.
Predicts user visit habits.
Designers get more reliable information.
E.g.: Web sites with paths help to save time.
20. Current research:
As many researchers believe, it was Etzioni who first coined
the term Web mining in his paper. He raised a question: is it
practical to mine Web data? He also suggested dividing Web
mining into three processes. The paper opened up a new, active
research field.
A growing number of researchers are working in this field and
have surveyed data mining on the Web. By 1999, Web mining had
been clearly categorized as Web content mining, Web structure
mining, and Web usage mining, and research work has been well
classified since then.
There has been some work on content mining and structure
mining, based on research in data mining, information
retrieval, information extraction, and artificial intelligence.
21. In the usage mining research area, several groups have done
distinguished work. R. Cooley et al. at the University of Minnesota
did in-depth research on the whole usage mining procedure. They
proposed a mining prototype, WebMiner, and derived a system,
WebSIFT, to perform usage mining, which is relatively
practical. O. Zaiane et al. [15] proposed how to apply
the OLAP technique to Web mining.
Their work on multimedia data also provided a valuable
solution for content mining. M. Spiliopoulou et al. focused on
applications of usage mining. Their work on navigation
pattern discovery and Web site personalization is especially
relevant to e-commerce and Web marketplace allocation,
and will be very helpful for both Web users and
administrators. The Web Utilization Miner system is an innovative
sequential mining system.
J. Borges et al. have explored algorithms for mining user
navigation patterns in [2] and other papers. They proposed a data
mining model for efficient mining, which captures user
navigation behavior patterns using an N-grammar approach.
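To give a feel for N-gram style navigation modelling, the sketch below counts which page follows each run of n-1 consecutive pages and uses those counts to predict the next page. It is a generic N-gram counter written for illustration, not a reproduction of the model proposed by Borges et al.; the sample sessions and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram_model(sessions, n=2):
    """Map each run of n-1 consecutive pages to counts of the next page."""
    model = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - n + 1):
            history = tuple(pages[i:i + n - 1])
            model[history][pages[i + n - 1]] += 1
    return model


def predict_next(model, history):
    """Return the most frequent follower of `history`, or None if unseen."""
    followers = model.get(tuple(history))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical navigation sessions.
NAV_SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/products", "/cart"],
]

trigram_model = train_ngram_model(NAV_SESSIONS, n=3)
print(predict_next(trigram_model, ["/home", "/products"]))
```

A larger n captures longer navigation histories but needs more log data before the counts become reliable.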
22. Mining Tool: Mozenda
Mozenda is a Software as a Service (SaaS) company
that enables users of all types to easily and affordably
extract and manage web data. With Mozenda, users
can set up agents that routinely extract data, store
data, and publish data to multiple destinations. Once
information is in the Mozenda systems, users can
format, repurpose, and mash up the data to be used
in other online/offline applications or as intelligence.
All data in the Mozenda system is secure and is
hosted in class A data warehouses, but it can be
accessed securely over the web via the Mozenda Web
Console. With the addition of a fully featured REST
API, companies can now seamlessly integrate their
data automation with the Mozenda application.
23. Conclusion:
Data mining is a useful tool with multiple
algorithms that can be tuned for specific
tasks.
It benefits business, medicine, and science.
24. References:
www.datawarehousingonline.com
Data base System – Elmasri, Navathe.
Data Mining Technologies – Arun K Pujari.
http://www.cse.aucegypt.edu/~rafea/CSCE564/sldes/WebMiningOverview.pdf
http://www.mozenda.com/web-mining-software