UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects – Spatial Data Mining – Multimedia Data Mining – Text Mining – Mining the World Wide Web.
The document outlines the key steps involved in image recognition:
1. Capturing an image digitally and representing it as pixels.
2. Conditioning the image by suppressing unwanted variations to highlight important patterns.
3. Labeling important patterns by detecting edges and applying thresholding to identify significant edges.
4. Grouping edges into lines and objects and extracting features like centroids to represent the image logically.
5. Matching objects in the image to stored models to recognize and identify the objects.
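The steps above can be sketched end-to-end on a toy image. Nothing here is from the original slides: the box filter, the gradient threshold of 100, and the 6x6 test image are all illustrative choices.

```python
def smooth(img):
    """Condition: 3x3 box filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

def edge_magnitude(img):
    """Label: central-difference gradient magnitude at each interior pixel."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

def centroid(mask):
    """Group: centroid of all marked pixels as one logical feature."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

# Capture: a 6x6 image containing a bright 2x2 square.
img = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        img[y][x] = 255

conditioned = smooth(img)
edges = [[v > 100 for v in row] for row in edge_magnitude(conditioned)]
cx, cy = centroid(edges)
print(cx, cy)   # centre of the square
```

The matching step (comparing extracted features such as this centroid against stored object models) would follow the same pattern but is omitted here.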
Data processing involves several key steps:
1) Data capture and collection
2) Data storage
3) Data conversion to a usable format
4) Data cleaning and validation
5) Data analysis including patterns, relationships, and groupings
There are different types of data processing including scientific, commercial, automatic/manual, batch, and real-time. The type used depends on factors like the data domain and needed speed and accuracy of analysis. Proper data processing is important for obtaining reliable results.
The KDD process involves several steps: data cleaning to remove noise, data integration of multiple sources, data selection of relevant data, data transformation into appropriate forms for mining, applying data mining techniques to extract patterns, evaluating patterns for interestingness, and representing mined knowledge visually. The KDD process aims to discover useful knowledge from various data types including databases, data warehouses, transactional data, time series, sequences, streams, spatial, multimedia, graphs, engineering designs, and web data.
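The cleaning, selection, transformation, mining, and evaluation steps of the KDD process can be illustrated with a toy transaction set. The records, field names, and 0.5 support threshold below are invented for the example:

```python
from itertools import combinations
from collections import Counter

raw = [
    {"id": 1, "items": ["Bread", "MILK", "eggs"]},
    {"id": 2, "items": ["bread", "milk"]},
    {"id": 3, "items": None},                      # noise: dropped in cleaning
    {"id": 4, "items": ["milk", "Eggs"]},
    {"id": 5, "items": ["bread", "milk", "eggs"]},
]

# 1. Data cleaning: remove noisy/incomplete records.
cleaned = [r for r in raw if r["items"]]

# 2-3. Selection and transformation: keep only the item lists, lowercased.
baskets = [sorted({i.lower() for i in r["items"]}) for r in cleaned]

# 4. Mining: count co-occurring item pairs.
pairs = Counter(p for b in baskets for p in combinations(b, 2))

# 5. Evaluation: keep pairs whose support (fraction of baskets) >= 0.5.
n = len(baskets)
frequent = {p: c / n for p, c in pairs.items() if c / n >= 0.5}
print(frequent)
```

The final KDD step, knowledge representation, would present `frequent` visually rather than as a raw dictionary.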
This document discusses various shape features and descriptors that can be used to represent shapes. It covers properties shape features should have like being invariant to translation, rotation, and scale. Common shape descriptors are classified as either contour-based or region-based. Simple geometric features like center of gravity, circularity ratio, and rectangularity are described first. One-dimensional functions for shape representation include the centroid distance function, area function, and chord length function. Other shape features mentioned are basic and differential chain codes, chain code histograms, and shape matrices. The conclusion discusses how shape signatures can be made invariant and robust to noise.
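Two of the descriptors named above, the centroid distance function and a circularity ratio, are easy to sketch. The formula 4*pi*A/P^2 equals 1 for a perfect circle; the square contour below is an illustrative input, not taken from the document:

```python
import math

def centroid_distance(boundary):
    """Distance of each boundary point from the centroid; dividing by the
    mean distance makes the signature scale-invariant."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    d = [math.hypot(x - cx, y - cy) for x, y in boundary]
    mean = sum(d) / len(d)
    return [v / mean for v in d]

def circularity(area, perimeter):
    return 4 * math.pi * area / perimeter ** 2

square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # corner points of a 2x2 square
sig = centroid_distance(square)
print([round(v, 6) for v in sig])           # all ~1.0: corners are equidistant
print(round(circularity(area=4, perimeter=8), 3))
```

Because the signature depends only on distances from the centroid, it is also invariant to translation and rotation, which is the property the document asks of a good shape feature.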
This document provides an introduction to data mining. It defines data mining as extracting useful information from large datasets. Key domains that benefit include market analysis, risk management, and fraud detection. Common data mining techniques are discussed, such as association, classification, clustering, prediction, and decision trees. Both open-source tools like RapidMiner, WEKA, and R, as well as commercial tools like SQL Server, IBM Cognos, and Dundas BI, are introduced for performing data mining.
A look back at how the practice of data science has evolved over the years, modern trends, and where it might be headed in the future. Starting from before anyone had the title "data scientist" on their resume, to the dawn of the cloud and big data, and the new tools and companies trying to push the state of the art forward. Finally, some wild speculation on where data science might be headed.
Presentation given to Seattle Data Science Meetup on Friday July 24th 2015.
Organizations need business intelligence systems to provide useful information to users in a timely manner by processing, storing and analyzing patterns in customer, supplier, partner and employee data. There are three main types of business intelligence tools: reporting tools that generate structured reports, data mining tools that find patterns and relationships in data, and knowledge management tools that store and share employee knowledge. Data warehouses and data marts standardize and prepare operational and third party data for analysis to address inconsistent and missing data problems. Reporting and data mining applications then use business intelligence tools to analyze this structured data and deliver insights.
Pattern recognition is the branch of machine learning and computer science that deals with regularities and patterns in data, which can then be used to classify and categorize the data with the help of a pattern recognition system.
“The assignment of a physical object or event to one of several pre-specified categories”-- Duda & Hart
A pattern recognition system is responsible for finding patterns and similarities in a given problem or data space, which can then be used to solve complex problems effectively and efficiently.
Certain problems that humans can solve can, through this process, also be solved by machines.
Chapter 6, Data Mining: Concepts and Techniques, 2nd Ed. slides (Han & Kamber)
The document describes Chapter 6 of the book "Data Mining: Concepts and Techniques" which covers the topics of classification and prediction. It defines classification and prediction and discusses key issues in classification such as data preparation, evaluating methods, and decision tree induction. Decision tree induction creates a tree model by recursively splitting the training data on attributes and their values to make predictions. The chapter also covers other classification methods like Bayesian classification, rule-based classification, and support vector machines. It describes the process of model construction from training data and then using the model to classify new, unlabeled data.
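The recursive splitting behind decision tree induction is driven by a criterion such as entropy-based information gain, which a few lines make concrete. The attribute and label names below are made up for illustration, not taken from the book:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, label="buys"):
    """Information gain of splitting `rows` on attribute `attr`."""
    base = entropy([r[label] for r in rows])
    split = {}
    for r in rows:
        split.setdefault(r[attr], []).append(r[label])
    remainder = sum(len(v) / len(rows) * entropy(v) for v in split.values())
    return base - remainder

# Toy training data: the split on "income" separates the classes perfectly.
rows = [
    {"income": "high", "buys": "no"},
    {"income": "high", "buys": "no"},
    {"income": "low",  "buys": "yes"},
    {"income": "low",  "buys": "yes"},
]
print(info_gain(rows, "income"))
```

Tree induction picks the attribute with the highest gain, splits the data on it, and recurses on each subset until the leaves are pure or another stopping rule fires.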
This document defines and describes pattern recognition. It discusses what patterns are, pattern recognition systems, the pattern recognition procedure, approaches to pattern recognition including statistical, syntactic/structural, and neural network approaches. It also covers the components of a pattern recognition system including sensing, segmentation/grouping, feature extraction, classification, and post-processing. Finally, it discusses the design cycle for pattern recognition systems including data collection, feature/model choice, training, evaluation, and computational complexity.
Types of data compression, including lossy compression and lossless compression, and how data is compressed. A little more extensive than the CIE O Level syllabus.
This document discusses steganography, which is hiding messages within seemingly harmless carriers or covers so that no one apart from the intended recipient knows a message has been sent. It provides examples of steganography in text, images, and audio, as well as methods used for each. These include techniques like least significant bit insertion and temporal sampling rates. The document also covers steganalysis, which aims to detect hidden communications by analyzing changes in the statistical properties of covers.
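Least-significant-bit insertion, mentioned above for images, can be sketched on a list of pretend pixel values: each cover value's lowest bit is replaced by one message bit, so no value changes by more than 1. The helper names are illustrative, not from the document:

```python
def embed(cover, message):
    """Hide `message` (bytes) in the LSBs of `cover` (list of ints)."""
    bits = [(b >> i) & 1 for b in message for i in range(7, -1, -1)]
    assert len(bits) <= len(cover), "cover too small"
    stego = cover[:]
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract(stego, n_bytes):
    """Read `n_bytes` back out of the LSBs."""
    bits = [v & 1 for v in stego[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k: k + 8]))
        for k in range(0, len(bits), 8)
    )

cover = list(range(100, 180))          # pretend pixel values
stego = embed(cover, b"Hi")
print(extract(stego, 2))
print(max(abs(a - b) for a, b in zip(cover, stego)))   # at most 1
```

Steganalysis exploits exactly this mechanism: embedding flattens the statistics of the LSB plane, which detectors can measure.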
A presentation on recent data mining techniques and future research directions, drawn from recent research papers, prepared in the pre-master's program at Cairo University under the supervision of Dr. Rabie.
This document provides an overview of various image enhancement techniques. It begins with an introduction to image enhancement and its objectives. It then outlines and describes several categories of enhancement methods, including spatial-frequency domain methods, point operations, histogram operations, spatial operations, and transform operations. Specific techniques discussed in detail include contrast stretching, clipping, thresholding, median filtering, unsharp masking, and principal component analysis for multispectral images. The document also covers color image enhancement and techniques for pseudocoloring.
This document provides an overview of key concepts in digital image processing, including:
1. It discusses fundamental steps like image acquisition, enhancement, color image processing, and wavelets and multiresolution processing.
2. Image enhancement techniques process images to make them more suitable for specific applications.
3. Color image processing has increased in importance due to more digital images on the internet. Wavelets allow images to be represented at various resolution levels.
What is watermarking?
The watermark describes information that can be used for proofing of ownership or tamper proofing.
Digital Watermarking describes methods and technologies that hide information, for example, a number or text, in digital media, such as images, video.
Why watermarking?
Real-world datasets can tolerate a small amount of error without degrading their usability
Effective means for proofing of ownership
Effective means for tamper proofing
Types of watermarking:
Visible watermarking
Invisible Watermarking
Applications:
Source Tracking.
Broadcast Monitoring.
Content protection for audio and video content.
Forensics and piracy deterrence.
Communication of ownership and copyright.
Document and image security.
Copy prevention or control
Requirements of watermarking:
Imperceptibility
Robustness
Security
Complexity
Verification
This lecture gives various definitions of data mining and explains why data mining is required. Examples of classification, clustering, and association rules are provided.
A computer has many input devices; common examples are the mouse, keyboard, and webcam. Beyond these, devices such as OMR readers, MICR readers, smart cards, and magnetic stripe readers are also connected to computers for specific functions. These slides take you through the uses, advantages, and mechanisms of these input devices.
This document provides an introduction to image segmentation. It discusses how image segmentation partitions an image into meaningful regions based on measurements like greyscale, color, texture, depth, or motion. Segmentation is often an initial step in image understanding and has applications in identifying objects, guiding robots, and video compression. The document describes thresholding and clustering as two common segmentation techniques and provides examples of segmentation based on greyscale, texture, motion, depth, and optical flow. It also discusses region-growing, edge-based, and active contour model approaches to segmentation.
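Clustering-based segmentation, one of the two common techniques named above, reduces in its simplest form to k-means on greyscale values: pixels are split into intensity classes around learned centres. The pixel list and two-cluster setup below are illustrative assumptions:

```python
def kmeans_1d(values, k=2, iters=20):
    """1-D k-means; crude initialization assumes k == 2."""
    centers = [min(values), max(values)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Pretend greyscale values: dark background pixels and a bright object.
pixels = [12, 15, 10, 200, 210, 13, 205, 11, 198]
centers = kmeans_1d(pixels)
labels = [min(range(2), key=lambda i: abs(v - centers[i])) for v in pixels]
print(centers, labels)
```

Thresholding is the degenerate case of this: a single fixed cut-off instead of learned cluster centres.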
Image registration is a process that aligns pixels in two images to correspond to the same point in a scene. It allows images to be combined or focused in a way that improves information extraction. Some applications of image registration include stereo imaging, remote sensing, comparing images over time, and finding where a template matches an image. Template matching is used to find the best match between a template and image by measuring similarity or mismatch between them. Cross-correlation is commonly used as a similarity measure for template matching.
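A minimal version of template matching with cross-correlation can be shown in one dimension: slide the template across the signal and keep the offset with the highest correlation score. (A normalized variant would divide by the window energies so bright regions are not favoured.) The signal and template here are invented:

```python
def cross_corr(a, b):
    """Cross-correlation of two equal-length windows: their dot product."""
    return sum(x * y for x, y in zip(a, b))

def best_match(signal, template):
    """Score every offset of `template` inside `signal`; return the best."""
    scores = [cross_corr(signal[i: i + len(template)], template)
              for i in range(len(signal) - len(template) + 1)]
    return max(range(len(scores)), key=scores.__getitem__), scores

signal = [0, 1, 0, 5, 9, 5, 0, 1, 0]
template = [5, 9, 5]
pos, scores = best_match(signal, template)
print(pos, scores[pos])   # the template sits at offset 3
```

In 2-D image registration the idea is identical, with the template slid over both axes of the image.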
This document discusses data preprocessing techniques. It defines data preprocessing as transforming raw data into an understandable format. Major tasks in data preprocessing are described as data cleaning, integration, transformation, and reduction. Data cleaning involves handling missing data, noisy data, and inconsistencies. Data integration combines data from multiple sources. Data transformation techniques include smoothing, aggregation, generalization, and normalization. The goal of data reduction is to reduce the volume of data while maintaining analytical results.
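Two of the transformation techniques named, normalization and smoothing, can be sketched directly: min-max normalization maps values to [0, 1], and binning replaces each value with its bin mean. The price data is a stock illustrative example, not taken from this document:

```python
def min_max(xs):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def smooth_by_bin_means(xs, width):
    """Sort, partition into equal-width bins, replace each bin by its mean."""
    xs = sorted(xs)
    out = []
    for i in range(0, len(xs), width):
        bin_ = xs[i: i + width]
        out.extend([sum(bin_) / len(bin_)] * len(bin_))
    return out

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(min_max([200, 300, 400]))
print(smooth_by_bin_means(prices, 3))
```

Both are transformations in the preprocessing sense: the data's content is preserved while its scale or noise level is adjusted for the mining step.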
The document discusses data mining and its processes. It states that data mining involves extracting useful information and patterns from large amounts of data through processes like data cleaning, integration, transformation, mining, and presentation. This extracted knowledge can then be applied to various domains such as fraud detection, market analysis, and science exploration.
This document discusses data mining and its applications. It begins by defining data mining as the process of discovering patterns in large amounts of data through techniques from artificial intelligence, machine learning, statistics, and databases. The overall goal is to extract useful information from data. It then describes common data mining tasks like classification, clustering, association rule learning, and regression. Examples of data mining applications given include using it in healthcare to improve patient outcomes and reduce costs, and in education to help institutions better understand and support their students.
This document provides an overview of big data analytics and data visualization. It discusses key concepts like data wrangling, exploring patterns, drawing conclusions, and communicating findings. Common techniques are also summarized, including classification, clustering, association rules, and predictive analytics. Specific algorithms like decision trees, k-means clustering, and hierarchical clustering are explained. The CRISP-DM process model and applications of analytics in areas like customer understanding and process optimization are also covered at a high level. Visualization is presented as an important part of the overall analytics process.
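Hierarchical clustering, mentioned above alongside k-means, can be sketched as agglomerative merging with single linkage: start with singleton clusters and repeatedly merge the closest pair until the requested number remains. The 1-D points below are illustrative:

```python
def single_link_dist(a, b):
    """Single linkage: distance between the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, k):
    """Merge clusters bottom-up until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

print(agglomerate([1, 2, 9, 10, 25], k=2))
```

Unlike k-means, the full merge history forms a dendrogram, so one run yields clusterings at every granularity.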
The document defines metadata as data about data that provides a summary and roadmap for a data warehouse. It discusses three main types of metadata: business metadata which contains ownership and definition information; technical metadata which includes database structure and attributes; and operational metadata which tracks data currency and lineage. Finally, the document outlines the key roles of metadata as a directory to locate data warehouse content and map data transformations, and notes that correctly defining stored metadata presents a challenge.
This document provides an overview of a computer vision course, including administrative details, topics, and expectations. The instructor is Guodong Guo from UW-Madison. Key topics covered include computer vision fundamentals and applications, publications in top journals and conferences, and course requirements such as homework, exams, and a final project. Meeting times are on Mondays from 5-7:30pm and the instructor's office hours are Tuesdays and Thursdays from 1-2pm.
Image Restoration And Reconstruction
Mean Filters
Order-Statistic Filters
Spatial Filtering: Mean Filters
Adaptive Filters
Adaptive Mean Filters
Adaptive Median Filters
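The order-statistic filters listed above are typified by the median filter, sketched here in one dimension on invented data; it suppresses impulse noise while preserving step edges, which a mean filter would blur:

```python
def median_filter(signal, size=3):
    """Replace each sample by the median of its neighbourhood."""
    half = size // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out

# Two impulse spikes (255 and 0) on either side of a step edge.
noisy = [10, 10, 255, 10, 10, 80, 80, 0, 80, 80]
print(median_filter(noisy))
```

An adaptive median filter extends this by growing the window at each position until the median is no longer an impulse, giving better detail preservation at high noise densities.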
All types of mining and trends in data mining (Rupal Kharya)
This document discusses trends in data mining, including the need for scalable and interactive methods to handle vast amounts of data, tighter integration with database and data warehouse systems, and expanding applications to new domains like biology, software engineering, and web mining. It also outlines ongoing challenges in mining complex data types like text, multimedia, spatial, and streaming data, as well as emerging areas like real-time and distributed data mining.
A brief description of the three mining techniques, the differences and similarities between them, and finally the techniques they share.
This document provides an overview of data mining, covering its importance and applications. The key techniques discussed include classification, prediction, clustering, association, and summarization. Examples of data mining applications are given in healthcare, banking/finance, retail, and web mining. The document concludes by discussing future trends in data mining, involving new algorithms and data types as well as computing resources like cloud computing.
This document provides an introduction to data mining techniques. It discusses how data mining emerged due to the problem of data explosion and the need to extract knowledge from large datasets. It describes data mining as an interdisciplinary field that involves methods from artificial intelligence, machine learning, statistics, and databases. It also summarizes some common data mining frameworks and processes like KDD, CRISP-DM and SEMMA.
This document provides an overview of data mining including:
- Data mining techniques like classification, prediction, clustering which are used to analyze patterns in data.
- The importance of data mining for applications in fields like banking, retail, and healthcare to discover useful knowledge from large datasets.
- Issues with data mining like security, performance, and methodology challenges as well as future trends like using more advanced algorithms and computing resources to handle diverse and large datasets.
This document outlines the learning objectives and resources for a course on data mining and analytics. The course aims to:
1) Familiarize students with key concepts in data mining like association rule mining and classification algorithms.
2) Teach students to apply techniques like association rule mining, classification, cluster analysis, and outlier analysis.
3) Help students understand the importance of applying data mining concepts across different domains.
The primary textbook listed is "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber. Topics that will be covered include introduction to data mining, preprocessing, association rules, classification algorithms, cluster analysis, and applications.
1. Data mining involves extracting useful patterns and knowledge from large amounts of data. It can help uncover hidden patterns and relationships to help organizations make better decisions.
2. The document discusses various data mining techniques like classification, clustering, association rule mining and describes how each technique can be applied.
3. It also covers important aspects of data mining like the steps in the knowledge discovery process, different types of databases, visualization techniques, and major issues in data mining.
The document discusses the field of data mining. It begins by defining data mining and describing its branches including classification, clustering, and association rule mining. It then discusses the growth of data in various domains that has created opportunities for data mining applications. The document outlines the history and development of data mining from empirical science to computational science to data science. It provides examples of data mining applications in various domains like healthcare, energy, climate science, and agriculture. Finally, it discusses future directions and challenges for the field of data mining.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document introduces data mining. It defines data mining as the process of extracting useful information from large databases. It discusses technologies used in data mining like statistics and machine learning. It also covers data mining models and tasks such as classification, regression, clustering, and forecasting. Finally, it provides an overview of the data mining process and examples of data mining tools.
The document provides an overview of key concepts in data science and big data including:
1) It defines data science, data scientists, and their roles in extracting insights from structured, semi-structured, and unstructured data.
2) It explains different data types like structured, semi-structured, unstructured and their characteristics from a data analytics perspective.
3) It describes the data value chain involving data acquisition, analysis, curation, storage, and usage to generate value from data.
4) It introduces concepts in big data like the 3V's of volume, velocity and variety, and technologies like Hadoop and its ecosystem that are used for distributed processing of large datasets.
This document provides an introduction to data mining and data warehousing. It defines data mining as the process of extracting knowledge from large amounts of data. The evolution of database technologies is discussed, from early file processing systems to today's data warehousing and data mining capabilities. Key aspects of data mining systems and processes are described, including the typical architecture of a data mining system with components like data sources, data mining engines, and knowledge bases. The document also discusses the types of data that data mining can be applied to, such as relational databases and data warehouses.
Data Mining System and Applications: A Reviewijdpsjournal
In the Information Technology era information plays vital role in every sphere of the human life. It is very important to gather data from different data sources, store and maintain the data, generate information, generate knowledge and disseminate data, information and knowledge to every stakeholder. Due to vast use of computers and electronics devices and tremendous growth in computing power and storage capacity, there is explosive growth in data collection. The storing of the data in data warehouse enables entire enterprise to access a reliable current database. To analyze this vast amount of data and drawing fruitful conclusions and inferences it needs the special tools called data mining tools. This paper gives overview of the data mining systems and some of its applications.
This document provides an overview of key concepts in data science and big data, including:
- Data science involves extracting knowledge and insights from structured, semi-structured, and unstructured data.
- The data value chain describes the process of acquiring data, analyzing it, curating it for storage, and using it.
- Big data is characterized by its volume, velocity, variety, and veracity. Hadoop is an open-source framework that allows distributed processing of large datasets across computer clusters.
A Novel Data mining Technique to Discover Patterns from Huge Text CorpusIJMER
Today, we have far more information than we can handle: from business transactions and scientific
data, to satellite pictures, text reports and military intelligence. Information retrieval is simply not enough
anymore for decision-making. Confronted with huge collections of data, we have now created new needs to
help us make better managerial choices. These needs are automatic summarization of data, extraction of the
"essence" of information stored, and the discovery of patterns in raw data. With this, Data mining with
inventory pattern came into existence and got popularized. Data mining finds these patterns and relationships
using data analysis tools and techniques to build models.
1. The document discusses various advanced data analytics techniques including data mining, online analytical processing (OLAP), pivot tables, power pivot, power view in Excel, and different types of data mining techniques like classification, clustering, regression, association rules, outlier detection, sequential patterns, and prediction.
2. It provides details on each technique including definitions, applications, and examples.
3. The key data analytics techniques covered are data mining, OLAP, pivot tables, power pivot and power view in Excel, and various classification methods for advanced data analysis.
1. DATA WAREHOUSING AND
DATA MINING
UNIT – 5
Prepared by
Mr. P. Nandakumar
Assistant Professor,
Department of IT, SVCET
2. UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data:
Multidimensional Analysis and Descriptive Mining of Complex
Data Objects – Spatial Data Mining – Multimedia Data Mining –
Text Mining – Mining the World Wide Web.
3. Mining Object, Spatial, Multimedia,
Text, and Web Data
• In general terms, “mining” is the process of extraction. In computer science, data mining is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
• Besides structured records, there are other kinds of data, semi-structured or unstructured, including spatial data, multimedia data, text data, and web data, each of which requires its own data mining methodology.
• Mining Multimedia Data: Multimedia data objects include image, video, and audio data, as well as website hyperlinks and linkages. Multimedia data mining tries to find interesting patterns in multimedia databases.
• This involves processing the digital data and performing tasks such as image processing, image classification, video and audio mining, and pattern recognition.
4. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Multimedia data mining is becoming a very active research area because data from social media platforms such as Twitter and Facebook can be analyzed this way to derive interesting trends and patterns.
• Mining Web Data: Web mining is essential for discovering crucial patterns and knowledge from the Web. Web content mining analyzes the data of many websites, including the web pages themselves and the multimedia data, such as images, that they contain.
• Web mining is done to understand the content of web pages, the unique users and unique hypertext links of a website, web page relevance and ranking, web page content summaries, the time users spend on a particular website, and user search patterns. Web mining can also compare search engines and the ranking algorithms they use, which helps improve search efficiency and guides users to the best search engine for their needs.
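The page-relevance and ranking idea above can be sketched with a PageRank-style iteration over a link graph. This is a minimal illustration, not a real search engine's algorithm; the three-page link graph is invented for the example.

```python
# Minimal PageRank-style sketch: rank pages by hyperlink structure.
# The link graph below is a made-up example, not real web data.

def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score each page from the pages that link to it."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Sum the rank flowing in from every page that links to p.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
scores = pagerank(links)   # "home" gets the highest score: every page links to it
```

Here "home" ends up ranked highest because both other pages link to it, which is exactly the intuition behind link-based page ranking.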
5. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Text Data: Text mining is a subfield of data mining, machine learning, natural language processing, and statistics. Much of the information in our daily lives is stored as text, such as news articles, technical papers, books, email messages, and blogs.
• Text mining helps us retrieve high-quality information from text through tasks such as sentiment analysis, document summarization, text categorization, and text clustering. We apply machine learning models and NLP techniques to derive useful information from the text.
• This is done by finding hidden patterns and trends by means such as statistical pattern learning and statistical language modeling.
• To perform text mining, we first preprocess the text, applying techniques such as stemming and lemmatization, and then convert the textual data into data vectors.
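The preprocessing step above can be sketched in a few lines: a deliberately crude suffix-stripping "stemmer" (a real system would use Porter stemming or a lemmatizer) followed by bag-of-words vectorization. The suffix list and sample documents are illustrative only.

```python
# Toy text-preprocessing sketch: a crude suffix-stripping "stemmer" and a
# bag-of-words vectorizer. The suffix list is illustrative, not Porter's rules.
import re

def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def vectorize(docs):
    """Turn each document into a term-frequency vector over a shared vocabulary."""
    tokenized = [[crude_stem(w) for w in re.findall(r"[a-z]+", d.lower())]
                 for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    return vocab, [[doc.count(t) for t in vocab] for doc in tokenized]

docs = ["Mining text data", "Text mining finds patterns in text"]
vocab, vectors = vectorize(docs)
```

Once documents are vectors like these, the clustering and categorization tasks mentioned above reduce to ordinary numeric data mining.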
6. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Spatiotemporal Data: Data that is related to both space and time is spatiotemporal data. Spatiotemporal data mining retrieves interesting patterns and knowledge from such data.
• Spatiotemporal data mining helps us, for example, estimate land values, date rocks and precious stones, and predict weather patterns.
• Spatiotemporal data mining has many practical applications, including GPS in mobile phones, timers, Internet-based map services, weather services, satellites, RFID, and sensors.
7. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Data Streams: Stream data changes dynamically; it is noisy and inconsistent and contains multidimensional features of different data types, so it is often stored in NoSQL database systems.
• A NoSQL (“non-SQL” or non-relational) database provides a mechanism for storing and retrieving data other than the tabular relations model used in relational databases. NoSQL databases do not use tables to store data and are generally used for big data and real-time web applications.
• The very high volume of stream data is the main challenge for effective mining. While mining data streams we need to perform tasks such as clustering, outlier analysis, and the online detection of rare events.
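The online outlier/rare-event detection mentioned above can be sketched with a running mean and variance (Welford's algorithm) and a z-score test. The 3-standard-deviation threshold and the sample stream are illustrative choices.

```python
# Online outlier-detection sketch for a data stream: maintain a running mean
# and variance (Welford's algorithm) and flag values far from the mean.
import math

class StreamOutlierDetector:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x looks like an outlier, then fold it into the stats."""
        is_outlier = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford's incremental update: no need to store the whole stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier

detector = StreamOutlierDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 100]
flags = [detector.update(x) for x in stream]   # only the final 100 is flagged
```

The point of the incremental update is exactly the stream-mining constraint above: the detector uses constant memory no matter how long the stream runs.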
8. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• Multidimensional analysis is the analysis of dimension objects organized into meaningful hierarchies. It allows users to observe data from various viewpoints, enabling them to spot trends or exceptions in the data. A hierarchy is an ordered series of related dimensions.
• This ability to view data from different viewpoints is especially critical for business.
• In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis
process that groups data into two categories: data dimensions and measurements.
• The multidimensional data model is composed of logical cubes, measures, dimensions, hierarchies,
levels, and attributes. The simplicity of the model is inherent because it defines objects that represent
real-world business entities.
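The cube model above can be sketched with plain dictionaries: fact rows carry dimension values and a measure, and a roll-up groups the measure by any chosen dimensions. The sales facts and dimension names are invented for illustration.

```python
# Cube-style aggregation sketch: roll up a measure over chosen dimensions.
# The fact table below is invented example data.
from collections import defaultdict

facts = [
    {"region": "north", "product": "tea",    "year": 2023, "sales": 100},
    {"region": "north", "product": "coffee", "year": 2023, "sales": 150},
    {"region": "south", "product": "tea",    "year": 2023, "sales": 80},
    {"region": "south", "product": "tea",    "year": 2024, "sales": 120},
]

def roll_up(rows, dims, measure="sales"):
    """Aggregate the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

by_region = roll_up(facts, ["region"])             # totals per region
by_product_year = roll_up(facts, ["product", "year"])
```

Viewing the same facts by region, by product, or by product and year together is the "different viewpoints" idea of multidimensional analysis in miniature.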
9. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• Complex data types require advanced data mining techniques. Among the complex data types is sequence data, which includes time series, symbolic sequences, and biological sequences. Additional preprocessing steps are needed before mining these complex data types.
1. Time-Series Data Mining:
• In time-series data, values are measured as a long series of numerical or textual data collected at equal time intervals, such as per minute, per hour, or per day. Time-series mining is performed on data from stock markets and on scientific and medical data. In time-series mining it is usually not possible to find data that exactly matches a given query, so we employ the similarity search method, which finds the data sequences that are similar to the given query sequence. In the similarity search method, subsequence matching is performed to find the subsequences that are similar to a given query sequence.
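The subsequence matching just described can be sketched by sliding the query across the series and scoring each window by Euclidean distance; the window with the smallest distance is the best match. The series and query values are invented for illustration.

```python
# Subsequence-matching sketch: slide the query over the series and score each
# window by Euclidean distance; the smallest distance is the best match.
import math

def best_subsequence_match(series, query):
    """Return (start_index, distance) of the window most similar to the query."""
    m = len(query)
    best = (None, float("inf"))
    for i in range(len(series) - m + 1):
        window = series[i : i + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        if dist < best[1]:
            best = (i, dist)
    return best

series = [1.0, 1.1, 5.0, 5.1, 5.2, 1.0, 0.9]
query = [5.0, 5.1, 5.3]
start, dist = best_subsequence_match(series, query)   # best window starts at index 2
```

Production systems index the series (e.g. with lower-bounding distances) rather than scanning every window, but the matching criterion is the same.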
10. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
2. Sequential Pattern Mining in Symbolic Sequences:
• Symbolic sequences are composed of long nominal data sequences, which dynamically change their behavior over time intervals. Examples of symbolic sequences include online customer shopping sequences and sequences of events in experiments. Mining symbolic sequences is called sequential pattern mining. A sequential pattern is a subsequence that occurs frequently in a set of sequences, so mining amounts to finding the most frequent subsequences in the set. Many scalable algorithms have been built to find frequent subsequences, and there are also algorithms that mine multidimensional and multilevel sequential patterns.
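The support-counting step at the heart of sequential pattern mining can be sketched as follows: a pattern's support is the number of sequences containing it as an in-order (not necessarily contiguous) subsequence. The shopping sequences are invented for illustration; real miners (GSP, PrefixSpan) avoid checking every candidate this way.

```python
# Sequential-pattern sketch: support of a pattern = number of sequences that
# contain it as an in-order subsequence (gaps allowed). Example data invented.

def contains_subsequence(sequence, pattern):
    """True if pattern's items appear in order (gaps allowed) within sequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)   # `in` advances the iterator

def support(sequences, pattern):
    return sum(contains_subsequence(s, pattern) for s in sequences)

sequences = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "bag", "mouse"],
    ["mouse", "laptop"],
]
s = support(sequences, ["laptop", "mouse"])   # 2: order matters, so the third misses
```

Note that the third sequence contains both items but in the wrong order, so it does not count: the ordering constraint is what distinguishes sequential patterns from ordinary frequent itemsets.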
11. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
3. Data Mining of Biological Sequences:
• Biological sequences are long sequences of nucleotides, and mining them is required to study features such as those of human DNA. Sequence alignment, which compares biological sequences, is the first step of this kind of data mining. Two species are similar to each other only if their nucleotide (DNA, RNA) and protein sequences are close and similar. During the mining of biological sequences, the degree of similarity between nucleotide sequences is measured; the degree of similarity obtained by sequence alignment is essential for determining the homology between two sequences. Alignment may involve two or more input biological sequences, identifying similar sequences that share long subsequences. Amino acid (protein) sequences are also compared and aligned.
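The alignment step above can be sketched with Needleman-Wunsch global alignment scoring, where a higher score means the sequences are more similar. The scores of +1 match, -1 mismatch, -1 gap and the sample sequences are illustrative choices.

```python
# Global sequence-alignment sketch (Needleman-Wunsch scoring). Higher score
# means more similar. Match/mismatch/gap values are illustrative choices.

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    # dp[i][j] = best score aligning a[:i] against b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap               # a[:i] aligned against all gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                 # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,   # gap in b
                           dp[i][j - 1] + gap)   # gap in a
    return dp[len(a)][len(b)]

score_close = alignment_score("GATTACA", "GATTACA")   # identical sequences: 7
score_far = alignment_score("GATTACA", "GCATGCU")     # lower: fewer shared bases
```

Full tools (e.g. BLAST) add traceback to recover the alignment itself and use biologically derived substitution matrices instead of flat match/mismatch scores.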
12. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
4. Graph Pattern Mining:
• Graph pattern mining can be done using Apriori-based and pattern-growth-based approaches. We can mine the subgraphs of a graph, including the set of closed graphs: a closed graph g is one that has no supergraph with the same support count as g. Graph pattern mining is applied to different types of graphs such as frequent, coherent, and dense graphs, and mining efficiency can be improved by applying user constraints to the graph patterns. Graph patterns are of two types: in homogeneous graphs the nodes and links are of the same type and share similar features, while in heterogeneous graphs the nodes and links are of different types.
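Apriori-style subgraph mining starts from the smallest patterns, so a highly simplified sketch is to count how many graphs in a collection contain each labeled edge (the single-edge base case, before growing larger candidates). The labeled graphs below are invented for illustration.

```python
# Simplified graph-pattern sketch: support counting for single labeled edges,
# the starting point of Apriori-style subgraph mining. Example graphs invented.
from collections import Counter

graphs = [
    {("person", "buys", "book"), ("person", "rates", "book")},
    {("person", "buys", "book"), ("person", "buys", "laptop")},
    {("person", "rates", "movie")},
]

def frequent_edges(graph_set, min_support=2):
    counts = Counter()
    for g in graph_set:
        for edge in g:            # each graph counts an edge at most once
            counts[edge] += 1
    return {e: c for e, c in counts.items() if c >= min_support}

freq = frequent_edges(graphs)     # only ("person", "buys", "book") reaches support 2
```

A real miner would then extend each frequent edge into larger candidate subgraphs and re-count support, pruning candidates whose sub-patterns are already infrequent.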
13. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
5. Statistical Modeling of Networks:
• A network is a collection of nodes, where each node represents a data object and the nodes are linked through edges that represent relationships between the objects. If all the nodes and links are of the same type, the network is homogeneous, such as a friend network or a web-page network. If the nodes and links are of different types, the network is heterogeneous, such as a health-care network linking different parties such as doctors, nurses, patients, and diseases together. Graph pattern mining can then be applied to the network to derive knowledge and useful patterns from it.
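The heterogeneous health-care network above can be sketched by storing a type for every node and a label for every edge, so that queries can filter by node type. All names and relationships below are invented for illustration.

```python
# Heterogeneous-network sketch: nodes carry a type, edges carry a relationship
# label, and queries filter by type. All entities below are invented examples.
nodes = {
    "dr_rao": "doctor",
    "asha": "patient",
    "flu": "disease",
}
edges = [
    ("dr_rao", "treats", "asha"),
    ("asha", "diagnosed_with", "flu"),
]

def neighbors_of_type(node, node_type):
    """Nodes connected to `node` (in either direction) whose type matches."""
    found = set()
    for src, _, dst in edges:
        if src == node and nodes[dst] == node_type:
            found.add(dst)
        if dst == node and nodes[src] == node_type:
            found.add(src)
    return found

patients_of_dr_rao = neighbors_of_type("dr_rao", "patient")
```

In a homogeneous network all entries in `nodes` would share one type and the type filter would be unnecessary; it is the mixed types that make the network heterogeneous.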
14. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
6. Mining Spatial Data:
• Spatial data is the geo space-related data that is stored in large data repositories. The spatial data is
represented in “vector” format and geo-referenced multimedia format. A spatial database is
constructed from large geographic data warehouses by integrating geographical data of multiple
sources of areas. we can construct spatial data cubes that contain information about the spatial
dimensions and measures. It is possible to perform the OLAP operations on the spatial data for
spatial data analysis. Spatial data mining is performed on spatial data warehouses, spatial databases,
and other geospatial data repositories, and it discovers knowledge about geographic areas. Spatial
data mining involves operations such as spatial clustering, spatial classification, spatial modeling,
and outlier detection in spatial data.
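As a small illustration of one of the operations listed above, the following sketch groups 2-D points into spatial clusters by linking any points within a distance threshold. It is a simplified, single-linkage-style stand-in for real spatial clustering algorithms such as DBSCAN; the points and threshold are made up:

```python
import math

def cluster_points(points, eps):
    """Group 2-D points whose chains of neighbours lie within eps of each other,
    using a small union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)   # merge the two groups

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())

# Two tight groups of points far apart from each other.
pts = [(0, 0), (0.5, 0.2), (10, 10), (10.3, 9.8)]
clusters = cluster_points(pts, eps=1.0)
print(clusters)
```

Real spatial clustering on large datasets would use a spatial index (e.g., an R-tree) rather than the quadratic pairwise scan shown here.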
15. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
7. Mining Cyber-Physical System Data:
• Cyber-physical system data can be mined by constructing a graph or network of the data. A
cyber-physical system (CPS) is a heterogeneous network consisting of a large number of interconnected
nodes that store information, for example patient or medical information in a health-care CPS. The
links in the CPS network represent the relationships between the nodes. Cyber-physical systems store
dynamic, inconsistent, and interdependent data that contains spatiotemporal information. Mining
cyber-physical data poses the current situation as a query against a large information base and
involves real-time computation and analysis to produce prompt responses from the CPS. CPS analysis
requires rare-event detection and anomaly analysis in cyber-physical data streams and networks, and
the processing of cyber-physical data involves integrating stream data with real-time automated
control processes.
16. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
8. Mining Multimedia Data:
• Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages.
Multimedia data mining aims to discover interesting patterns from multimedia databases. This
includes processing digital data and performing tasks such as image processing, image
classification, video and audio data mining, and pattern recognition. Multimedia data mining has
become an active research area because data from social media platforms such as Twitter and
Facebook can be analyzed with these techniques to derive interesting trends and patterns.
17. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
9. Mining Web Data:
• Web mining is essential to discover crucial patterns and knowledge from the Web. Web content
mining analyzes the data of websites, including web pages and multimedia data such as images
embedded in them. Web mining is done to understand the content of web pages, the unique users and
hypertext links of a website, web page relevance and ranking, web page content summaries, the time
users spend on a particular website, and user search patterns. Web mining can also compare search
engines and the search algorithms they use, which helps improve search efficiency and identify the
best search engine for users.
18. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
10. Mining Text Data:
• Text mining lies at the intersection of data mining, machine learning, natural language processing
(NLP), and statistics. Much of the information in our daily lives is stored as text, such as news
articles, technical papers, books, email messages, and blogs. Text mining helps us retrieve
high-quality information from text through tasks such as sentiment analysis, document summarization,
text categorization, and text clustering. We apply machine learning models and NLP techniques to
derive useful information from text by finding hidden patterns and trends through means such as
statistical pattern learning and statistical language modeling. To perform text mining, we first
preprocess the text, applying techniques such as stemming and lemmatization, in order to convert the
textual data into data vectors.
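As a rough illustration of this preprocessing step, the sketch below applies a crude suffix-stripping "stemmer" (a stand-in for a real stemmer such as Porter's) and then converts documents into term-count vectors. The suffix list and sample documents are illustrative only:

```python
# Crude suffix-stripping stemmer: NOT a real stemming algorithm,
# just enough to illustrate the text -> vector conversion.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def to_vector(text, vocabulary):
    """Count occurrences of each vocabulary stem in the text."""
    stems = [stem(w) for w in text.lower().split()]
    return [stems.count(term) for term in vocabulary]

docs = ["mining texts", "text mining reveals patterns"]
vocab = sorted({stem(w) for d in docs for w in d.lower().split()})
vectors = [to_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

The resulting vectors can then be fed to standard data mining algorithms such as classifiers or clustering methods.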
19. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
11. Mining Spatiotemporal Data:
• Data related to both space and time is spatiotemporal data. Spatiotemporal data mining retrieves
interesting patterns and knowledge from such data. It helps us, for example, to estimate land values,
determine the age of rocks and precious stones, and predict weather patterns. Spatiotemporal data
mining has many practical applications, including GPS in mobile phones, timers, Internet-based map
services, weather services, satellites, RFID tags, and sensors.
20. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
12. Mining Data Streams:
• Stream data changes dynamically and is noisy and inconsistent, containing multidimensional
features of different data types, so it is typically stored in NoSQL database systems. The volume of
stream data is very high, which is the main challenge for effective stream mining. While mining data
streams we need to perform tasks such as clustering, outlier analysis, and the online detection of
rare events in data streams.
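The online rare-event detection mentioned above can be sketched as a streaming z-score test, using Welford's online algorithm to maintain a running mean and variance so each value is examined exactly once. The threshold and sample stream are illustrative:

```python
class StreamOutlierDetector:
    """Flag values far from the running mean (Welford's online variance)."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x is an outlier w.r.t. the stream seen so far."""
        outlier = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                outlier = True
        # Welford update: incorporate x into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return outlier

det = StreamOutlierDetector(threshold=3.0)
flags = [det.update(v) for v in [10, 11, 9, 10, 12, 11, 10, 50]]
print(flags)
```

Only the final value, 50, is flagged; the detector never stores the stream itself, which is what makes this approach viable for high-volume data.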
21. Spatial Data Mining
A spatial database saves a huge amount of space-related data, including maps, preprocessed remote
sensing or medical imaging records, and VLSI chip design data. Spatial databases have several features
that distinguish them from relational databases.
They carry topological and/or distance information, usually organized by sophisticated,
multidimensional spatial indexing structures that are accessed by spatial data access methods and often
require spatial reasoning, geometric computation, and spatial knowledge representation techniques.
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting
patterns not explicitly stored in spatial databases. Such mining demands the unification of data mining
with spatial database technologies. It can be used for understanding spatial data, discovering
spatial relationships and relationships between spatial and nonspatial data, constructing spatial
knowledge bases, reorganizing spatial databases, and optimizing spatial queries.
22. Spatial Data Mining
It is expected to have broad applications in geographic information systems, marketing, remote sensing, image
database exploration, medical imaging, navigation, traffic control, environmental studies, and many
other areas where spatial data are used.
A central challenge in spatial data mining is the development of efficient spatial data mining
techniques, owing to the huge amount of spatial data and the complexity of spatial data types and
spatial access methods. Statistical spatial data analysis has been a popular approach to analyzing spatial data and
exploring geographic information.
The term geostatistics is often associated with continuous geographic space, whereas the term spatial
statistics is often associated with discrete space. A statistical model for nonspatial data generally
assumes statistical independence among different portions of the data.
23. Spatial Data Mining
No such independence exists among spatially distributed data, because spatial objects are in fact
interrelated, or more exactly spatially co-located: the closer two objects are located, the more
likely they are to share similar properties. For example, natural resources, climate, temperature, and
economic situations are likely to be similar in geographically closely located regions.
Such a property of close interdependency across nearby space leads to the notion of spatial
autocorrelation. Based on this notion, spatial statistical modeling methods have been developed with
success. Spatial data mining will further develop spatial statistical analysis methods and extend
them to large amounts of spatial data, with more emphasis on effectiveness, scalability, cooperation with database
and data warehouse systems, enhanced user interaction, and the discovery of new kinds of knowledge.
24. Multimedia Data Mining
Multimedia mining is a subfield of data mining that is used to find interesting information and
implicit knowledge in multimedia databases. Mining in multimedia is also referred to as automatic
annotation or annotation mining. Mining multimedia data involves two or more data types, such as
text and video, or text, video, and audio.
Multimedia data mining is an interdisciplinary field that integrates image processing and understanding,
computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting
patterns from multimedia databases that store and manage large collections of multimedia objects,
including image data, video data, audio data, sequence data and hypertext data containing text, text
markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity
search, generalization and multidimensional analysis. Multimedia data cubes contain additional
dimensions and measures for multimedia information.
25. Multimedia Data Mining
The framework that manages different types of multimedia data stored, delivered, and utilized in
different ways is known as a multimedia database management system. There are three classes of
multimedia databases: static, dynamic, and dimensional media. The content of the Multimedia Database
management system is as follows:
• Media data: The actual data representing an object.
• Media format data: Information about the format of the media data (such as sampling rate,
resolution, and encoding scheme) after it goes through the acquisition, processing, and encoding phases.
• Media keyword data: Keyword descriptions relating to the generation of the data, also known as
content-descriptive data. Example: the date, time, and place of recording.
• Media feature data: Content-dependent data, such as the distribution of colours, kinds of texture,
and shapes present in the data.
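Media feature data such as colour distributions can be computed directly from pixel values. The sketch below builds a normalised colour histogram, a common image feature for content-based retrieval; the pixel data is a made-up toy image:

```python
def color_histogram(pixels, bins=4):
    """Quantize 8-bit RGB pixels into a bins**3 colour histogram,
    normalised so the feature is independent of image size."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each colour channel to a coarse bin, then flatten to one index.
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

# Hypothetical 2x2 image: three red-ish pixels, one blue-ish pixel.
pixels = [(250, 10, 10), (240, 5, 5), (255, 0, 0), (10, 10, 250)]
hist = color_histogram(pixels, bins=4)
print(max(hist))   # the dominant colour bin holds 3/4 of the pixels
```

Because the histogram is normalised, two images of different sizes but similar colour content produce similar feature vectors, which is what similarity search relies on.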
26. Multimedia Data Mining
Types of multimedia applications based on data management characteristics are:
1. Repository applications: A large amount of multimedia data and metadata (media format data,
media keyword data, media feature data) stored for retrieval purposes, e.g., repositories of
satellite images, engineering drawings, or scanned radiology images.
2. Presentation applications: They involve delivering multimedia data subject to temporal
constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate,
offering quality of service above a certain threshold. Here data is processed as it is delivered.
Examples: annotation of video and audio data, real-time editing and analysis.
3. Collaborative work applications: They involve executing a complex task using multimedia
information, by merging drawings and exchanging notifications. Example: an intelligent health-care network.
27. Multimedia Data Mining
Challenges with Multimedia Database:
1. Modelling: Work in this area draws on both database and information retrieval techniques;
documents constitute a specialized area and deserve special consideration.
2. Design: The conceptual, logical, and physical design of multimedia databases has not yet been
fully addressed, as performance and tuning issues at each level are far more complex; multimedia
data comes in a variety of formats (JPEG, GIF, PNG, MPEG) that are not easy to convert from one
form to another.
3. Storage: Storing a multimedia database on standard disks presents problems of representation,
compression, mapping to device hierarchies, archiving, and buffering during input/output operations.
In a DBMS, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved.
4. Performance: Physical limitations dominate an application involving video playback or audio-video
synchronization. The use of parallel processing may alleviate some problems, but such techniques are
not yet fully developed. Apart from this, a multimedia database consumes a lot of processing time and
bandwidth.
5. Queries and retrieval: For multimedia data such as images, video, and audio, accessing data
through queries raises many issues, such as efficient query formulation, query execution, and
optimization, which still need to be worked out.
28. Multimedia Data Mining
Where is Multimedia Database Applied?
Below are the following areas where a multimedia database is applied, such as:
• Documents and record management: Industries and businesses keep detailed records and various
documents. For example, insurance claim records.
• Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in
terms of providing several resources. For example, electronic books.
• Education and training: Computer-aided learning materials can be designed using multimedia sources
which are nowadays very popular sources of learning. Example: Digital libraries.
• Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of
cities.
• Real-time control and monitoring: With active database technology, multimedia presentation of
information can effectively monitor and control complex tasks. For example, manufacturing operation
control.
30. Multimedia Data Mining
Categories of Multimedia Data Mining
Multimedia mining refers to analyzing a large amount of multimedia information to extract patterns based
on their statistical relationships. Multimedia data mining is classified into two broad categories: static and
dynamic media. Static media contains text (digital library, creating SMS and MMS) and images (photos
and medical images). Dynamic media contains audio (music and MP3 sounds) and video (movies). The
image below shows the categories of multimedia data mining.
34. Text Mining in Data Mining
Text mining is a component of data mining that deals specifically with unstructured text data. It involves
the use of natural language processing (NLP) techniques to extract useful information and insights from
large amounts of unstructured text data. Text mining can be used as a preprocessing step for data mining or
as a standalone process for specific tasks.
By using text mining, the unstructured text data can be transformed into structured data that can be used
for data mining tasks such as classification, clustering, and association rule mining. This allows
organizations to gain insights from a wide range of data sources, such as customer feedback, social media
posts, and news articles.
What is the common usage of Text Mining?
Text mining is widely used in various fields, such as natural language processing, information retrieval,
and social media analysis. It has become an essential tool for organizations to extract insights from
unstructured text data and make data-driven decisions.
35. Text Mining in Data Mining
Conventional Process of Text Mining
• Gathering unstructured information from various sources accessible in various document organizations,
for example, plain text, web pages, PDF records, etc.
• Pre-processing and data cleansing tasks are performed to detect and eliminate inconsistencies in
the data. The data cleansing process makes sure the genuine text is captured; it includes removing
stop words and stemming (identifying the root of a word) before indexing the data.
• Processing and controlling tasks are applied to review and further clean the data set.
• Pattern analysis is implemented in a Management Information System (MIS).
• Information processed in the above steps is utilized to extract important and applicable data for a
powerful and convenient decision-making process and trend analysis.
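The cleansing step above can be sketched as follows; the stop-word list is a small illustrative one, not a standard list:

```python
import re

# Tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def cleanse(raw_text):
    """Lowercase, strip punctuation, and drop stop words,
    mirroring the cleansing step described above."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = cleanse("The mining of text, in short, is the analysis of documents!")
print(tokens)
```

After cleansing, the remaining tokens carry most of the document's content and are ready for stemming and pattern analysis.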
37. Text Mining in Data Mining
Procedures for Analyzing Text Mining
Text Summarization: To automatically extract part of a text's content that reflects its whole content.
Text Categorization: To assign a text to one of a set of categories predefined by users.
Text Clustering: To segment texts into several clusters depending on their substantive relevance.
38. Text Mining in Data Mining
Text Mining Techniques
Information Retrieval: In information retrieval, we process the available documents and text data
into a structured form so that we can apply different pattern recognition and analytical processes.
It is a process of extracting relevant and associated patterns according to a given set of words or
text documents.
Information Extraction: It is a process of extracting meaningful words from documents.
• Feature Extraction – In this process, we try to develop some new features from existing ones. This
objective can be achieved by parsing an existing feature or combining two or more features based on
some mathematical operation.
• Feature Selection – In this process, we try to reduce the dimensionality of the dataset (a
common issue when dealing with text data) by selecting a subset of features from the whole dataset.
Natural Language Processing: Natural Language Processing includes tasks that are accomplished by
using Machine Learning and Deep Learning methodologies. It concerns the automatic processing and
analysis of unstructured text information.
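A common way to carry out the feature extraction described above for text is TF-IDF weighting, which scores a term highly when it is frequent in one document but rare across the collection. A minimal sketch using the unsmoothed textbook formula (production libraries often add smoothing):

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights per document for a list of tokenised docs."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)      # term frequency in this doc
            idf = math.log(n / df[term])         # rarity across the collection
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["web", "mining", "web"], ["text", "mining"]]
w = tf_idf(docs)
print(w[0]["web"])     # "web" occurs only in doc 0, so it gets weight > 0
print(w[0]["mining"])  # "mining" occurs in every doc, so its idf is 0
```

Terms that appear in every document get zero weight, which is exactly the feature-selection effect described above: uninformative terms are suppressed automatically.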
39. Text Mining in Data Mining
Application Area of Text Mining
• Digital Library
• Academic and Research Field
• Life Science
• Social-Media
• Business Intelligence
40. Text Mining in Data Mining
Issues in Text Mining
Numerous issues arise during the text mining process:
1. Ensuring the efficiency and effectiveness of decision-making.
2. Ambiguity can arise at intermediate stages of text mining. In the pre-processing stage, different
rules and guidelines are defined to normalize the text, which makes the text-mining process
efficient. Before applying pattern analysis to a document, the unstructured data must be converted
into an intermediate structured form.
3. The original message or meaning can sometimes be changed by this alteration.
4. Another issue in text mining is multi-language text, which many algorithms and techniques do not
handle well. This may create ambiguity in text meaning and can lead to false-positive results.
5. The use of synonyms, polysemy, and antonyms in document text creates problems for text mining
tools that treat them in the same way; it is difficult to categorize such text correctly.
41. Text Mining in Data Mining
Advantages of Text Mining
1. Large Amounts of Data: Text mining allows organizations to extract insights from large amounts of
unstructured text data. This can include customer feedback, social media posts, and news articles.
2. Variety of Applications: Text mining has a wide range of applications, including sentiment analysis,
named entity recognition, and topic modeling. This makes it a versatile tool for organizations to gain
insights from unstructured text data.
3. Improved Decision Making: Text mining can be used to extract insights from unstructured text data,
which can be used to make data-driven decisions.
4. Cost-effective: Text mining can be a cost-effective way to extract insights from unstructured text
data, as it eliminates the need for manual data entry.
5. Broader benefits: Cost reductions, productivity increases, the creation of novel services, and
new business models are among the wider economic advantages of text mining.
42. Text Mining in Data Mining
Disadvantages of Text Mining
Complexity: Text mining can be a complex process that requires advanced skills in natural language
processing and machine learning.
Quality of Data: The quality of text data can vary, which can affect the accuracy of the insights extracted
from text mining.
High Computational Cost: Text mining requires high computational resources, and it may be difficult for
smaller organizations to afford the technology.
Limited to Text Data: Text mining is limited to extracting insights from unstructured text data and cannot
be used with other data types.
Noise in text mining results: Text mining of documents may produce mistakes: false links may be
found and real ones missed. In most situations, if the noise (error rate) is sufficiently low, the
benefits of automation outweigh the risk of a larger error rate than a human reader would produce.
Lack of transparency: Text mining is frequently viewed as a mysterious process where large corpora of
text documents are input and new information is produced. Text mining is in fact opaque when researchers
lack the technical know-how or expertise to comprehend how it operates, or when they lack access to
corpora or text mining tools.
43. Data Mining- World Wide Web
Over the last few years, the World Wide Web has become a significant source of information and
simultaneously a popular platform for business.
Web mining can be defined as the application of data mining techniques and algorithms to extract
useful information directly from the web, such as web documents and services, hyperlinks, web
content, and server logs.
The World Wide Web contains a large amount of data that provides a rich source to data mining. The
objective of Web mining is to look for patterns in Web data by collecting and examining data in order to
gain insights.
44. Data Mining- World Wide Web
What is Web Mining?
Web mining can broadly be seen as the application of adapted data mining techniques to the web,
whereas data mining is defined as the application of algorithms to discover patterns in mostly
structured data as part of a knowledge discovery process.
A distinctive property of web mining is the variety of data types it involves. The web has multiple
aspects that yield different approaches to mining: web pages consist of text, web pages are linked
via hyperlinks, and user activity can be monitored via web server logs.
46. Data Mining- World Wide Web
1. Web Content Mining:
Web content mining is used to extract useful data, information, and knowledge from web page
content. In web content mining, each web page is considered as an individual document. One can take
advantage of the semi-structured nature of web pages, as HTML provides information about not only
the layout but also the logical structure. The primary task of content mining is data extraction,
where structured data is extracted from unstructured websites. The objective is to facilitate data
aggregation over various websites by using the extracted structured data. Web content mining can be
utilized to distinguish topics on the web. For example, when a user searches for a specific topic
on a search engine, web content mining helps produce the list of suggested results.
47. Data Mining- World Wide Web
2. Web Structure Mining:
Web structure mining can be used to discover the link structure of hyperlinks, that is, how data
items and web pages are linked together into a network. In web structure mining, one considers the
web as a directed graph, with the web pages as vertices connected by hyperlinks. The most important
application in this regard is the Google search engine, which estimates the ranking of its results
primarily with the PageRank algorithm: a page is considered highly relevant when it is frequently
linked to by other highly relevant pages. Structure and content mining methodologies are usually
combined. For example, web structure mining can help organizations analyze the link network between
two commercial sites.
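The PageRank idea described above can be sketched with simple power iteration over a tiny hypothetical link graph:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict page -> list of outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a (1 - damping) share of baseline rank.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:                       # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:                              # share rank among linked pages
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical three-page web: A and C both link to B; B links back to both.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
rank = pagerank(links)
print(max(rank, key=rank.get))   # B is most linked-to, so it ranks highest
```

Page B ends up with the highest rank because both other pages link to it, which is precisely the "frequently linked by relevant pages" intuition above. Google's production algorithm handles billions of pages and many refinements beyond this sketch.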
48. Data Mining- World Wide Web
3. Web Usage Mining:
Web usage mining is used to extract useful data, information, and knowledge from web log records
and assists in recognizing user access patterns for web pages. When mining the usage of web
resources, one considers the records of requests made by visitors to a website, which are often
collected as web server logs. While the content and structure of the collection of web pages
reflect the intentions of the pages' authors, the individual requests show how consumers actually
use these pages. Web usage mining may reveal relationships that were not intended by the creators
of the pages.
49. Data Mining- World Wide Web
Some of the methods to identify and analyze the web usage patterns are given below:
I. Session and visitor analysis:
The analysis of preprocessed data can be accomplished through session analysis, which incorporates
visitor records, days, times, sessions, etc. This data can be used to analyze visitors' behavior.
A report is created after this analysis containing the details of repeatedly visited web pages and
common entry and exit points.
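The session and visitor analysis described above can be sketched as follows; the request records are hypothetical stand-ins for parsed web server log entries:

```python
from collections import Counter

# Hypothetical (visitor, timestamp, page) records, assumed time-ordered,
# standing in for parsed web server log lines.
records = [
    ("u1", "09:00", "/home"), ("u1", "09:01", "/products"),
    ("u2", "09:05", "/home"), ("u2", "09:06", "/cart"),
    ("u3", "09:10", "/products"), ("u3", "09:12", "/home"),
]

# Repeatedly visited pages: count requests per page.
visits = Counter(page for _, _, page in records)

# Common entry pages: the first page each visitor requested.
first_seen = {}
for user, ts, page in records:
    first_seen.setdefault(user, page)
entries = Counter(first_seen.values())

print(visits.most_common(1))    # most visited page overall
print(entries.most_common(1))   # most common entry page
```

A real pipeline would first reconstruct sessions from raw log lines (splitting a visitor's requests on idle gaps) before computing such statistics.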
II. OLAP (Online Analytical Processing):
OLAP performs multidimensional analysis of aggregated data.
OLAP can be performed on various parts of log-related data over a given period.
OLAP tools can be used to derive important business intelligence metrics.
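An OLAP-style roll-up over log-derived data can be sketched as aggregation along chosen dimensions; here requests are counted by (page, hour) and then rolled up to hour alone (the data is hypothetical):

```python
from collections import Counter

# Hypothetical (page, hour) pairs derived from web server logs.
requests = [("/home", 9), ("/home", 9), ("/cart", 9), ("/home", 10)]

# Finest granularity: counts per (page, hour) cell of the "cube".
by_page_hour = Counter(requests)

# Roll-up: aggregate away the page dimension, keeping only hour.
by_hour = Counter(hour for _, hour in requests)

print(by_page_hour[("/home", 9)])
print(by_hour[9])
```

Drill-down is the inverse operation: moving from the hour totals back to the per-page breakdown. Real OLAP engines precompute such aggregates across many dimensions at once.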
50. Data Mining- World Wide Web
Challenges in Web Mining:
The web presents great challenges for resource and knowledge discovery, based on the following
observations:
The complexity of web pages:
Web pages lack a unifying structure and are far more complex than traditional text documents. The
web's digital libraries contain enormous numbers of documents, and these libraries are not arranged
in any particular order.
The web is a dynamic data source:
The data on the internet is quickly updated. For example, news, climate, shopping, financial news,
sports, and so on.
Diversity of client networks:
The client community on the web is expanding rapidly. These clients have different interests,
backgrounds, and usage purposes. There are over a hundred million workstations connected to the
internet, and the number is still increasing tremendously.
51. Data Mining- World Wide Web
Relevancy of data:
A given person is generally interested in only a small portion of the web, while the rest of the web
contains data that is unfamiliar to the user and may lead to unwanted search results.
The web is too broad:
The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data
warehousing and data mining.
Application of Web Mining:
Web mining has extensive applications because of the many uses of the web. Some applications of
web mining are listed below.
• Marketing and conversion tools
• Data analysis of website and application performance
• Audience behavior analysis
• Advertising and campaign performance analysis
• Testing and analysis of a site