UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects – Spatial Data Mining – Multimedia Data Mining – Text Mining – Mining the World Wide Web.
The document outlines the key steps involved in image recognition:
1. Capturing an image digitally and representing it as pixels.
2. Conditioning the image by suppressing unwanted variations to highlight important patterns.
3. Labeling important patterns by detecting edges and applying thresholding to identify significant edges.
4. Grouping edges into lines and objects and extracting features like centroids to represent the image logically.
5. Matching objects in the image to stored models to recognize and identify the objects.
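The steps above can be sketched end-to-end on a toy image. Nothing here is from the original slides: the box filter, the gradient threshold of 100, and the 6x6 test image are all illustrative choices.

```python
def smooth(img):
    """Condition: 3x3 box filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

def edge_magnitude(img):
    """Label: central-difference gradient magnitude at each interior pixel."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

def centroid(mask):
    """Group: centroid of all marked pixels as one logical feature."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

# Capture: a 6x6 image containing a bright 2x2 square.
img = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        img[y][x] = 255

conditioned = smooth(img)
edges = [[v > 100 for v in row] for row in edge_magnitude(conditioned)]
cx, cy = centroid(edges)
print(cx, cy)   # centre of the square
```

The matching step (comparing extracted features such as this centroid against stored object models) would follow the same pattern but is omitted here.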
Data processing involves several key steps:
1) Data capture and collection
2) Data storage
3) Data conversion to a usable format
4) Data cleaning and validation
5) Data analysis including patterns, relationships, and groupings
There are different types of data processing including scientific, commercial, automatic/manual, batch, and real-time. The type used depends on factors like the data domain and needed speed and accuracy of analysis. Proper data processing is important for obtaining reliable results.
The KDD process involves several steps: data cleaning to remove noise, data integration of multiple sources, data selection of relevant data, data transformation into appropriate forms for mining, applying data mining techniques to extract patterns, evaluating patterns for interestingness, and representing mined knowledge visually. The KDD process aims to discover useful knowledge from various data types including databases, data warehouses, transactional data, time series, sequences, streams, spatial, multimedia, graphs, engineering designs, and web data.
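The cleaning, selection, transformation, mining, and evaluation steps of the KDD process can be illustrated with a toy transaction set. The records, field names, and 0.5 support threshold below are invented for the example:

```python
from itertools import combinations
from collections import Counter

raw = [
    {"id": 1, "items": ["Bread", "MILK", "eggs"]},
    {"id": 2, "items": ["bread", "milk"]},
    {"id": 3, "items": None},                      # noise: dropped in cleaning
    {"id": 4, "items": ["milk", "Eggs"]},
    {"id": 5, "items": ["bread", "milk", "eggs"]},
]

# 1. Data cleaning: remove noisy/incomplete records.
cleaned = [r for r in raw if r["items"]]

# 2-3. Selection and transformation: keep only the item lists, lowercased.
baskets = [sorted({i.lower() for i in r["items"]}) for r in cleaned]

# 4. Mining: count co-occurring item pairs.
pairs = Counter(p for b in baskets for p in combinations(b, 2))

# 5. Evaluation: keep pairs whose support (fraction of baskets) >= 0.5.
n = len(baskets)
frequent = {p: c / n for p, c in pairs.items() if c / n >= 0.5}
print(frequent)
```

The final KDD step, knowledge representation, would present `frequent` visually rather than as a raw dictionary.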
This document discusses various shape features and descriptors that can be used to represent shapes. It covers properties shape features should have like being invariant to translation, rotation, and scale. Common shape descriptors are classified as either contour-based or region-based. Simple geometric features like center of gravity, circularity ratio, and rectangularity are described first. One-dimensional functions for shape representation include the centroid distance function, area function, and chord length function. Other shape features mentioned are basic and differential chain codes, chain code histograms, and shape matrices. The conclusion discusses how shape signatures can be made invariant and robust to noise.
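Two of the descriptors named above, the centroid distance function and a circularity ratio, are easy to sketch. The formula 4*pi*A/P^2 equals 1 for a perfect circle; the square contour below is an illustrative input, not taken from the document:

```python
import math

def centroid_distance(boundary):
    """Distance of each boundary point from the centroid; dividing by the
    mean distance makes the signature scale-invariant."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    d = [math.hypot(x - cx, y - cy) for x, y in boundary]
    mean = sum(d) / len(d)
    return [v / mean for v in d]

def circularity(area, perimeter):
    return 4 * math.pi * area / perimeter ** 2

square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # corner points of a 2x2 square
sig = centroid_distance(square)
print([round(v, 6) for v in sig])           # all ~1.0: corners are equidistant
print(round(circularity(area=4, perimeter=8), 3))
```

Because the signature depends only on distances from the centroid, it is also invariant to translation and rotation, which is the property the document asks of a good shape feature.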
This document provides an introduction to data mining. It defines data mining as extracting useful information from large datasets. Key domains that benefit include market analysis, risk management, and fraud detection. Common data mining techniques are discussed, such as association, classification, clustering, prediction, and decision trees. Both open-source tools like RapidMiner, WEKA, and R, as well as commercial tools like SQL Server, IBM Cognos, and Dundas BI, are introduced for performing data mining.
A look back at how the practice of data science has evolved over the years, modern trends, and where it might be headed in the future. Starting from before anyone had the title "data scientist" on their resume, to the dawn of the cloud and big data, and the new tools and companies trying to push the state of the art forward. Finally, some wild speculation on where data science might be headed.
Presentation given to Seattle Data Science Meetup on Friday July 24th 2015.
Organizations need business intelligence systems to provide useful information to users in a timely manner by processing, storing and analyzing patterns in customer, supplier, partner and employee data. There are three main types of business intelligence tools: reporting tools that generate structured reports, data mining tools that find patterns and relationships in data, and knowledge management tools that store and share employee knowledge. Data warehouses and data marts standardize and prepare operational and third party data for analysis to address inconsistent and missing data problems. Reporting and data mining applications then use business intelligence tools to analyze this structured data and deliver insights.
Pattern recognition is the branch of machine learning and computer science that deals with regularities and patterns in data, which can then be used to classify and categorize the data with the help of a pattern recognition system.
“The assignment of a physical object or event to one of several pre-specified categories”-- Duda & Hart
A pattern recognition system is responsible for finding patterns and similarities in a given problem or data space, which can then be used to solve complex problems effectively and efficiently.
Certain problems that humans can solve can, through this process, also be solved by machines.
Chapter 6, Data Mining: Concepts and Techniques, 2nd Ed. slides (Han & Kamber)
The document describes Chapter 6 of the book "Data Mining: Concepts and Techniques" which covers the topics of classification and prediction. It defines classification and prediction and discusses key issues in classification such as data preparation, evaluating methods, and decision tree induction. Decision tree induction creates a tree model by recursively splitting the training data on attributes and their values to make predictions. The chapter also covers other classification methods like Bayesian classification, rule-based classification, and support vector machines. It describes the process of model construction from training data and then using the model to classify new, unlabeled data.
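The recursive splitting behind decision tree induction is driven by a criterion such as entropy-based information gain, which a few lines make concrete. The attribute and label names below are made up for illustration, not taken from the book:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, label="buys"):
    """Information gain of splitting `rows` on attribute `attr`."""
    base = entropy([r[label] for r in rows])
    split = {}
    for r in rows:
        split.setdefault(r[attr], []).append(r[label])
    remainder = sum(len(v) / len(rows) * entropy(v) for v in split.values())
    return base - remainder

# Toy training data: the split on "income" separates the classes perfectly.
rows = [
    {"income": "high", "buys": "no"},
    {"income": "high", "buys": "no"},
    {"income": "low",  "buys": "yes"},
    {"income": "low",  "buys": "yes"},
]
print(info_gain(rows, "income"))
```

Tree induction picks the attribute with the highest gain, splits the data on it, and recurses on each subset until the leaves are pure or another stopping rule fires.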
This document defines and describes pattern recognition. It discusses what patterns are, pattern recognition systems, the pattern recognition procedure, approaches to pattern recognition including statistical, syntactic/structural, and neural network approaches. It also covers the components of a pattern recognition system including sensing, segmentation/grouping, feature extraction, classification, and post-processing. Finally, it discusses the design cycle for pattern recognition systems including data collection, feature/model choice, training, evaluation, and computational complexity.
Types of data compression, including lossy compression and lossless compression, and how data is compressed. A little more extensive than the CIE O Level syllabus.
This document discusses steganography, which is hiding messages within seemingly harmless carriers or covers so that no one apart from the intended recipient knows a message has been sent. It provides examples of steganography in text, images, and audio, as well as methods used for each. These include techniques like least significant bit insertion and temporal sampling rates. The document also covers steganalysis, which aims to detect hidden communications by analyzing changes in the statistical properties of covers.
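Least-significant-bit insertion, mentioned above for images, can be sketched on a list of pretend pixel values: each cover value's lowest bit is replaced by one message bit, so no value changes by more than 1. The helper names are illustrative, not from the document:

```python
def embed(cover, message):
    """Hide `message` (bytes) in the LSBs of `cover` (list of ints)."""
    bits = [(b >> i) & 1 for b in message for i in range(7, -1, -1)]
    assert len(bits) <= len(cover), "cover too small"
    stego = cover[:]
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract(stego, n_bytes):
    """Read `n_bytes` back out of the LSBs."""
    bits = [v & 1 for v in stego[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k: k + 8]))
        for k in range(0, len(bits), 8)
    )

cover = list(range(100, 180))          # pretend pixel values
stego = embed(cover, b"Hi")
print(extract(stego, 2))
print(max(abs(a - b) for a, b in zip(cover, stego)))   # at most 1
```

Steganalysis exploits exactly this mechanism: embedding flattens the statistics of the LSB plane, which detectors can measure.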
A presentation on recent data mining techniques and future research directions, drawn from recent research papers, prepared in the pre-master's program at Cairo University under the supervision of Dr. Rabie.
This document provides an overview of various image enhancement techniques. It begins with an introduction to image enhancement and its objectives. It then outlines and describes several categories of enhancement methods, including spatial-frequency domain methods, point operations, histogram operations, spatial operations, and transform operations. Specific techniques discussed in detail include contrast stretching, clipping, thresholding, median filtering, unsharp masking, and principal component analysis for multispectral images. The document also covers color image enhancement and techniques for pseudocoloring.
This document provides an overview of key concepts in digital image processing, including:
1. It discusses fundamental steps like image acquisition, enhancement, color image processing, and wavelets and multiresolution processing.
2. Image enhancement techniques process images to make them more suitable for specific applications.
3. Color image processing has increased in importance due to more digital images on the internet. Wavelets allow images to be represented at various resolution levels.
What is watermarking?
The watermark describes information that can be used for proofing of ownership or tamper proofing.
Digital Watermarking describes methods and technologies that hide information, for example, a number or text, in digital media, such as images, video.
Why watermarking?
Real-world datasets can tolerate a small amount of error without degrading their usability
Effective means for proofing of ownership
Effective means for tamper proofing
Types of watermarking:
Visible watermarking
Invisible Watermarking
Applications:
Source Tracking.
Broadcast Monitoring.
Content protection for audio and video content.
Forensics and piracy deterrence.
Communication of ownership and copyright.
Document and image security.
Copy prevention or control
Requirements of watermarking:
Imperceptibility
Robustness
Security
Complexity
Verification
This lecture gives various definitions of data mining and explains why data mining is required. Examples of classification, clustering, and association rules are provided.
A computer has many input devices; common examples are the mouse, keyboard, and webcam. Beyond these, devices such as OMR readers, MICR readers, smart cards, and magnetic stripe readers are also connected to computers for specific functions. These slides take you through the uses, advantages, and mechanisms of these input devices.
This document provides an introduction to image segmentation. It discusses how image segmentation partitions an image into meaningful regions based on measurements like greyscale, color, texture, depth, or motion. Segmentation is often an initial step in image understanding and has applications in identifying objects, guiding robots, and video compression. The document describes thresholding and clustering as two common segmentation techniques and provides examples of segmentation based on greyscale, texture, motion, depth, and optical flow. It also discusses region-growing, edge-based, and active contour model approaches to segmentation.
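Clustering-based segmentation, one of the two common techniques named above, reduces in its simplest form to k-means on greyscale values: pixels are split into intensity classes around learned centres. The pixel list and two-cluster setup below are illustrative assumptions:

```python
def kmeans_1d(values, k=2, iters=20):
    """1-D k-means; crude initialization assumes k == 2."""
    centers = [min(values), max(values)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Pretend greyscale values: dark background pixels and a bright object.
pixels = [12, 15, 10, 200, 210, 13, 205, 11, 198]
centers = kmeans_1d(pixels)
labels = [min(range(2), key=lambda i: abs(v - centers[i])) for v in pixels]
print(centers, labels)
```

Thresholding is the degenerate case of this: a single fixed cut-off instead of learned cluster centres.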
Image registration is a process that aligns pixels in two images to correspond to the same point in a scene. It allows images to be combined or focused in a way that improves information extraction. Some applications of image registration include stereo imaging, remote sensing, comparing images over time, and finding where a template matches an image. Template matching is used to find the best match between a template and image by measuring similarity or mismatch between them. Cross-correlation is commonly used as a similarity measure for template matching.
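A minimal version of template matching with cross-correlation can be shown in one dimension: slide the template across the signal and keep the offset with the highest correlation score. (A normalized variant would divide by the window energies so bright regions are not favoured.) The signal and template here are invented:

```python
def cross_corr(a, b):
    """Cross-correlation of two equal-length windows: their dot product."""
    return sum(x * y for x, y in zip(a, b))

def best_match(signal, template):
    """Score every offset of `template` inside `signal`; return the best."""
    scores = [cross_corr(signal[i: i + len(template)], template)
              for i in range(len(signal) - len(template) + 1)]
    return max(range(len(scores)), key=scores.__getitem__), scores

signal = [0, 1, 0, 5, 9, 5, 0, 1, 0]
template = [5, 9, 5]
pos, scores = best_match(signal, template)
print(pos, scores[pos])   # the template sits at offset 3
```

In 2-D image registration the idea is identical, with the template slid over both axes of the image.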
This document discusses data preprocessing techniques. It defines data preprocessing as transforming raw data into an understandable format. Major tasks in data preprocessing are described as data cleaning, integration, transformation, and reduction. Data cleaning involves handling missing data, noisy data, and inconsistencies. Data integration combines data from multiple sources. Data transformation techniques include smoothing, aggregation, generalization, and normalization. The goal of data reduction is to reduce the volume of data while maintaining analytical results.
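Two of the transformation techniques named, normalization and smoothing, can be sketched directly: min-max normalization maps values to [0, 1], and binning replaces each value with its bin mean. The price data is a stock illustrative example, not taken from this document:

```python
def min_max(xs):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def smooth_by_bin_means(xs, width):
    """Sort, partition into equal-width bins, replace each bin by its mean."""
    xs = sorted(xs)
    out = []
    for i in range(0, len(xs), width):
        bin_ = xs[i: i + width]
        out.extend([sum(bin_) / len(bin_)] * len(bin_))
    return out

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(min_max([200, 300, 400]))
print(smooth_by_bin_means(prices, 3))
```

Both are transformations in the preprocessing sense: the data's content is preserved while its scale or noise level is adjusted for the mining step.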
The document discusses data mining and its processes. It states that data mining involves extracting useful information and patterns from large amounts of data through processes like data cleaning, integration, transformation, mining, and presentation. This extracted knowledge can then be applied to various domains such as fraud detection, market analysis, and science exploration.
This document discusses data mining and its applications. It begins by defining data mining as the process of discovering patterns in large amounts of data through techniques from artificial intelligence, machine learning, statistics, and databases. The overall goal is to extract useful information from data. It then describes common data mining tasks like classification, clustering, association rule learning, and regression. Examples of data mining applications given include using it in healthcare to improve patient outcomes and reduce costs, and in education to help institutions better understand and support their students.
This document provides an overview of big data analytics and data visualization. It discusses key concepts like data wrangling, exploring patterns, drawing conclusions, and communicating findings. Common techniques are also summarized, including classification, clustering, association rules, and predictive analytics. Specific algorithms like decision trees, k-means clustering, and hierarchical clustering are explained. The CRISP-DM process model and applications of analytics in areas like customer understanding and process optimization are also covered at a high level. Visualization is presented as an important part of the overall analytics process.
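Hierarchical clustering, mentioned above alongside k-means, can be sketched as agglomerative merging with single linkage: start with singleton clusters and repeatedly merge the closest pair until the requested number remains. The 1-D points below are illustrative:

```python
def single_link_dist(a, b):
    """Single linkage: distance between the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, k):
    """Merge clusters bottom-up until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters with the smallest linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

print(agglomerate([1, 2, 9, 10, 25], k=2))
```

Unlike k-means, the full merge history forms a dendrogram, so one run yields clusterings at every granularity.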
The document defines metadata as data about data that provides a summary and roadmap for a data warehouse. It discusses three main types of metadata: business metadata which contains ownership and definition information; technical metadata which includes database structure and attributes; and operational metadata which tracks data currency and lineage. Finally, the document outlines the key roles of metadata as a directory to locate data warehouse content and map data transformations, and notes that correctly defining stored metadata presents a challenge.
This document provides an overview of a computer vision course, including administrative details, topics, and expectations. The instructor is Guodong Guo from UW-Madison. Key topics covered include computer vision fundamentals and applications, publications in top journals and conferences, and course requirements such as homework, exams, and a final project. Meeting times are on Mondays from 5-7:30pm and the instructor's office hours are Tuesdays and Thursdays from 1-2pm.
Image Restoration And Reconstruction
Mean Filters
Order-Statistic Filters
Spatial Filtering: Mean Filters
Adaptive Filters
Adaptive Mean Filters
Adaptive Median Filters
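The order-statistic filters listed above are typified by the median filter, sketched here in one dimension on invented data; it suppresses impulse noise while preserving step edges, which a mean filter would blur:

```python
def median_filter(signal, size=3):
    """Replace each sample by the median of its neighbourhood."""
    half = size // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sorted(window)[len(window) // 2])
    return out

# Two impulse spikes (255 and 0) on either side of a step edge.
noisy = [10, 10, 255, 10, 10, 80, 80, 0, 80, 80]
print(median_filter(noisy))
```

An adaptive median filter extends this by growing the window at each position until the median is no longer an impulse, giving better detail preservation at high noise densities.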
All types of mining and trends in data mining (Rupal Kharya)
This document discusses trends in data mining, including the need for scalable and interactive methods to handle vast amounts of data, tighter integration with database and data warehouse systems, and expanding applications to new domains like biology, software engineering, and web mining. It also outlines ongoing challenges in mining complex data types like text, multimedia, spatial, and streaming data, as well as emerging areas like real-time and distributed data mining.
A brief description of the three mining techniques, the differences and similarities between them, and finally the techniques they share.
This document provides an overview of data mining, covering its importance and applications. The key techniques discussed include classification, prediction, clustering, association, and summarization. Examples of data mining applications are given in healthcare, banking/finance, retail, and web mining. The document concludes by discussing future trends in data mining, involving new algorithms and data types as well as computing resources like cloud computing.
This document provides an introduction to data mining techniques. It discusses how data mining emerged due to the problem of data explosion and the need to extract knowledge from large datasets. It describes data mining as an interdisciplinary field that involves methods from artificial intelligence, machine learning, statistics, and databases. It also summarizes some common data mining frameworks and processes like KDD, CRISP-DM and SEMMA.
This document provides an overview of data mining including:
- Data mining techniques like classification, prediction, clustering which are used to analyze patterns in data.
- The importance of data mining for applications in fields like banking, retail, and healthcare to discover useful knowledge from large datasets.
- Issues with data mining like security, performance, and methodology challenges as well as future trends like using more advanced algorithms and computing resources to handle diverse and large datasets.
This document outlines the learning objectives and resources for a course on data mining and analytics. The course aims to:
1) Familiarize students with key concepts in data mining like association rule mining and classification algorithms.
2) Teach students to apply techniques like association rule mining, classification, cluster analysis, and outlier analysis.
3) Help students understand the importance of applying data mining concepts across different domains.
The primary textbook listed is "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber. Topics that will be covered include introduction to data mining, preprocessing, association rules, classification algorithms, cluster analysis, and applications.
1. Data mining involves extracting useful patterns and knowledge from large amounts of data. It can help uncover hidden patterns and relationships to help organizations make better decisions.
2. The document discusses various data mining techniques like classification, clustering, association rule mining and describes how each technique can be applied.
3. It also covers important aspects of data mining like the steps in the knowledge discovery process, different types of databases, visualization techniques, and major issues in data mining.
The document discusses the field of data mining. It begins by defining data mining and describing its branches including classification, clustering, and association rule mining. It then discusses the growth of data in various domains that has created opportunities for data mining applications. The document outlines the history and development of data mining from empirical science to computational science to data science. It provides examples of data mining applications in various domains like healthcare, energy, climate science, and agriculture. Finally, it discusses future directions and challenges for the field of data mining.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document introduces data mining. It defines data mining as the process of extracting useful information from large databases. It discusses technologies used in data mining like statistics and machine learning. It also covers data mining models and tasks such as classification, regression, clustering, and forecasting. Finally, it provides an overview of the data mining process and examples of data mining tools.
The document provides an overview of key concepts in data science and big data including:
1) It defines data science, data scientists, and their roles in extracting insights from structured, semi-structured, and unstructured data.
2) It explains different data types like structured, semi-structured, unstructured and their characteristics from a data analytics perspective.
3) It describes the data value chain involving data acquisition, analysis, curation, storage, and usage to generate value from data.
4) It introduces concepts in big data like the 3V's of volume, velocity and variety, and technologies like Hadoop and its ecosystem that are used for distributed processing of large datasets.
This document provides an introduction to data mining and data warehousing. It defines data mining as the process of extracting knowledge from large amounts of data. The evolution of database technologies is discussed, from early file processing systems to today's data warehousing and data mining capabilities. Key aspects of data mining systems and processes are described, including the typical architecture of a data mining system with components like data sources, data mining engines, and knowledge bases. The document also discusses the types of data that data mining can be applied to, such as relational databases and data warehouses.
Data Mining System and Applications: A Reviewijdpsjournal
In the Information Technology era information plays vital role in every sphere of the human life. It is very important to gather data from different data sources, store and maintain the data, generate information, generate knowledge and disseminate data, information and knowledge to every stakeholder. Due to vast use of computers and electronics devices and tremendous growth in computing power and storage capacity, there is explosive growth in data collection. The storing of the data in data warehouse enables entire enterprise to access a reliable current database. To analyze this vast amount of data and drawing fruitful conclusions and inferences it needs the special tools called data mining tools. This paper gives overview of the data mining systems and some of its applications.
This document provides an overview of key concepts in data science and big data, including:
- Data science involves extracting knowledge and insights from structured, semi-structured, and unstructured data.
- The data value chain describes the process of acquiring data, analyzing it, curating it for storage, and using it.
- Big data is characterized by its volume, velocity, variety, and veracity. Hadoop is an open-source framework that allows distributed processing of large datasets across computer clusters.
A Novel Data mining Technique to Discover Patterns from Huge Text CorpusIJMER
Today, we have far more information than we can handle: from business transactions and scientific
data, to satellite pictures, text reports and military intelligence. Information retrieval is simply not enough
anymore for decision-making. Confronted with huge collections of data, we have now created new needs to
help us make better managerial choices. These needs are automatic summarization of data, extraction of the
"essence" of information stored, and the discovery of patterns in raw data. With this, Data mining with
inventory pattern came into existence and got popularized. Data mining finds these patterns and relationships
using data analysis tools and techniques to build models.
1. The document discusses various advanced data analytics techniques including data mining, online analytical processing (OLAP), pivot tables, power pivot, power view in Excel, and different types of data mining techniques like classification, clustering, regression, association rules, outlier detection, sequential patterns, and prediction.
2. It provides details on each technique including definitions, applications, and examples.
3. The key data analytics techniques covered are data mining, OLAP, pivot tables, power pivot and power view in Excel, and various classification methods for advanced data analysis.
1. DATA WAREHOUSING AND
DATA MINING
UNIT – 5
Prepared by
Mr. P. Nandakumar
Assistant Professor,
Department of IT, SVCET
2. UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data:
Multidimensional Analysis and Descriptive Mining of Complex
Data Objects – Spatial Data Mining – Multimedia Data Mining –
Text Mining – Mining the World Wide Web.
3. Mining Object, Spatial, Multimedia,
Text, and Web Data
• In general terms, “mining” is the process of extraction. In computer science, data mining is also referred to as knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.
• Besides structured records, there are other kinds of data, semi-structured or unstructured, including spatial data, multimedia data, text data, and web data, each of which requires its own data mining methodology.
• Mining Multimedia Data: Multimedia data objects include image, video, and audio data, as well as website hyperlinks and linkages. Multimedia data mining tries to find interesting patterns in multimedia databases.
• This involves processing the digital data and performing tasks such as image processing, image classification, video and audio mining, and pattern recognition.
4. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Multimedia data mining is becoming a very active research area because data from social media platforms such as Twitter and Facebook can be analyzed this way to derive interesting trends and patterns.
• Mining Web Data: Web mining is essential for discovering crucial patterns and knowledge from the Web. Web content mining analyzes the data of many websites, including the web pages themselves and the multimedia data, such as images, that they contain.
• Web mining is done to understand the content of web pages, the unique users and unique hypertext links of a website, web page relevance and ranking, web page content summaries, the time users spend on a particular website, and user search patterns. Web mining can also compare search engines and the ranking algorithms they use, which helps improve search efficiency and guides users to the best search engine for their needs.
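The page-relevance and ranking idea above can be sketched with a PageRank-style iteration over a link graph. This is a minimal illustration, not a real search engine's algorithm; the three-page link graph is invented for the example.

```python
# Minimal PageRank-style sketch: rank pages by hyperlink structure.
# The link graph below is a made-up example, not real web data.

def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score each page from the pages that link to it."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Sum the rank flowing in from every page that links to p.
            incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
            new_rank[p] = (1 - damping) / len(pages) + damping * incoming
        rank = new_rank
    return rank

links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
scores = pagerank(links)   # "home" gets the highest score: every page links to it
```

Here "home" ends up ranked highest because both other pages link to it, which is exactly the intuition behind link-based page ranking.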
5. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Text Data: Text mining is a subfield of data mining, machine learning, natural language processing, and statistics. Much of the information in our daily lives is stored as text, such as news articles, technical papers, books, email messages, and blogs.
• Text mining helps us retrieve high-quality information from text through tasks such as sentiment analysis, document summarization, text categorization, and text clustering. We apply machine learning models and NLP techniques to derive useful information from the text.
• This is done by finding hidden patterns and trends by means such as statistical pattern learning and statistical language modeling.
• To perform text mining, we first preprocess the text, applying techniques such as stemming and lemmatization, and then convert the textual data into data vectors.
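The preprocessing step above can be sketched in a few lines: a deliberately crude suffix-stripping "stemmer" (a real system would use Porter stemming or a lemmatizer) followed by bag-of-words vectorization. The suffix list and sample documents are illustrative only.

```python
# Toy text-preprocessing sketch: a crude suffix-stripping "stemmer" and a
# bag-of-words vectorizer. The suffix list is illustrative, not Porter's rules.
import re

def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def vectorize(docs):
    """Turn each document into a term-frequency vector over a shared vocabulary."""
    tokenized = [[crude_stem(w) for w in re.findall(r"[a-z]+", d.lower())]
                 for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    return vocab, [[doc.count(t) for t in vocab] for doc in tokenized]

docs = ["Mining text data", "Text mining finds patterns in text"]
vocab, vectors = vectorize(docs)
```

Once documents are vectors like these, the clustering and categorization tasks mentioned above reduce to ordinary numeric data mining.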
6. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Spatiotemporal Data: Data that is related to both space and time is spatiotemporal data. Spatiotemporal data mining retrieves interesting patterns and knowledge from such data.
• Spatiotemporal data mining helps us, for example, estimate land values, date rocks and precious stones, and predict weather patterns.
• Spatiotemporal data mining has many practical applications, including GPS in mobile phones, timers, Internet-based map services, weather services, satellites, RFID, and sensors.
7. Mining Object, Spatial, Multimedia,
Text, and Web Data
• Mining Data Streams: Stream data changes dynamically; it is noisy and inconsistent and contains multidimensional features of different data types, so it is often stored in NoSQL database systems.
• A NoSQL (“non-SQL” or non-relational) database provides a mechanism for storing and retrieving data other than the tabular relations model used in relational databases. NoSQL databases do not use tables to store data and are generally used for big data and real-time web applications.
• The very high volume of stream data is the main challenge for effective mining. While mining data streams we need to perform tasks such as clustering, outlier analysis, and the online detection of rare events.
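The online outlier/rare-event detection mentioned above can be sketched with a running mean and variance (Welford's algorithm) and a z-score test. The 3-standard-deviation threshold and the sample stream are illustrative choices.

```python
# Online outlier-detection sketch for a data stream: maintain a running mean
# and variance (Welford's algorithm) and flag values far from the mean.
import math

class StreamOutlierDetector:
    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x looks like an outlier, then fold it into the stats."""
        is_outlier = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford's incremental update: no need to store the whole stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier

detector = StreamOutlierDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 100]
flags = [detector.update(x) for x in stream]   # only the final 100 is flagged
```

The point of the incremental update is exactly the stream-mining constraint above: the detector uses constant memory no matter how long the stream runs.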
8. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• Multidimensional analysis is the analysis of dimension objects organized into meaningful hierarchies. It allows users to observe data from various viewpoints, enabling them to spot trends or exceptions in the data. A hierarchy is an ordered series of related dimensions.
• This ability to view data from different viewpoints is especially critical for business.
• In statistics, econometrics and related fields, multidimensional analysis (MDA) is a data analysis
process that groups data into two categories: data dimensions and measurements.
• The multidimensional data model is composed of logical cubes, measures, dimensions, hierarchies,
levels, and attributes. The simplicity of the model is inherent because it defines objects that represent
real-world business entities.
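The cube model above can be sketched with plain dictionaries: fact rows carry dimension values and a measure, and a roll-up groups the measure by any chosen dimensions. The sales facts and dimension names are invented for illustration.

```python
# Cube-style aggregation sketch: roll up a measure over chosen dimensions.
# The fact table below is invented example data.
from collections import defaultdict

facts = [
    {"region": "north", "product": "tea",    "year": 2023, "sales": 100},
    {"region": "north", "product": "coffee", "year": 2023, "sales": 150},
    {"region": "south", "product": "tea",    "year": 2023, "sales": 80},
    {"region": "south", "product": "tea",    "year": 2024, "sales": 120},
]

def roll_up(rows, dims, measure="sales"):
    """Aggregate the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

by_region = roll_up(facts, ["region"])             # totals per region
by_product_year = roll_up(facts, ["product", "year"])
```

Viewing the same facts by region, by product, or by product and year together is the "different viewpoints" idea of multidimensional analysis in miniature.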
9. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
• Complex data types require advanced data mining techniques. Among the complex data types is sequence data, which includes time series, symbolic sequences, and biological sequences. Additional preprocessing steps are needed before mining these complex data types.
1. Time-Series Data Mining:
• In time-series data, values are measured as a long series of numerical or textual data collected at equal time intervals, such as per minute, per hour, or per day. Time-series mining is performed on data from stock markets and on scientific and medical data. In time-series mining it is usually not possible to find data that exactly matches a given query, so we employ the similarity search method, which finds the data sequences that are similar to the given query sequence. In the similarity search method, subsequence matching is performed to find the subsequences that are similar to a given query sequence.
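The subsequence matching just described can be sketched by sliding the query across the series and scoring each window by Euclidean distance; the window with the smallest distance is the best match. The series and query values are invented for illustration.

```python
# Subsequence-matching sketch: slide the query over the series and score each
# window by Euclidean distance; the smallest distance is the best match.
import math

def best_subsequence_match(series, query):
    """Return (start_index, distance) of the window most similar to the query."""
    m = len(query)
    best = (None, float("inf"))
    for i in range(len(series) - m + 1):
        window = series[i : i + m]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        if dist < best[1]:
            best = (i, dist)
    return best

series = [1.0, 1.1, 5.0, 5.1, 5.2, 1.0, 0.9]
query = [5.0, 5.1, 5.3]
start, dist = best_subsequence_match(series, query)   # best window starts at index 2
```

Production systems index the series (e.g. with lower-bounding distances) rather than scanning every window, but the matching criterion is the same.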
10. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
2. Sequential Pattern Mining in Symbolic Sequences:
• Symbolic sequences are composed of long nominal data sequences, which dynamically change their behavior over time intervals. Examples of symbolic sequences include online customer shopping sequences and sequences of events in experiments. Mining symbolic sequences is called sequential pattern mining. A sequential pattern is a subsequence that occurs frequently in a set of sequences, so mining amounts to finding the most frequent subsequences in the set. Many scalable algorithms have been built to find frequent subsequences, and there are also algorithms that mine multidimensional and multilevel sequential patterns.
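The support-counting step at the heart of sequential pattern mining can be sketched as follows: a pattern's support is the number of sequences containing it as an in-order (not necessarily contiguous) subsequence. The shopping sequences are invented for illustration; real miners (GSP, PrefixSpan) avoid checking every candidate this way.

```python
# Sequential-pattern sketch: support of a pattern = number of sequences that
# contain it as an in-order subsequence (gaps allowed). Example data invented.

def contains_subsequence(sequence, pattern):
    """True if pattern's items appear in order (gaps allowed) within sequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)   # `in` advances the iterator

def support(sequences, pattern):
    return sum(contains_subsequence(s, pattern) for s in sequences)

sequences = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "bag", "mouse"],
    ["mouse", "laptop"],
]
s = support(sequences, ["laptop", "mouse"])   # 2: order matters, so the third misses
```

Note that the third sequence contains both items but in the wrong order, so it does not count: the ordering constraint is what distinguishes sequential patterns from ordinary frequent itemsets.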
11. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
3. Data Mining of Biological Sequences:
• Biological sequences are long sequences of nucleotides, and mining them is required to study features such as those of human DNA. Sequence alignment, which compares biological sequences, is the first step of this kind of data mining. Two species are similar to each other only if their nucleotide (DNA, RNA) and protein sequences are close and similar. During the mining of biological sequences, the degree of similarity between nucleotide sequences is measured; the degree of similarity obtained by sequence alignment is essential for determining the homology between two sequences. Alignment may involve two or more input biological sequences, identifying similar sequences that share long subsequences. Amino acid (protein) sequences are also compared and aligned.
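The alignment step above can be sketched with Needleman-Wunsch global alignment scoring, where a higher score means the sequences are more similar. The scores of +1 match, -1 mismatch, -1 gap and the sample sequences are illustrative choices.

```python
# Global sequence-alignment sketch (Needleman-Wunsch scoring). Higher score
# means more similar. Match/mismatch/gap values are illustrative choices.

def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    # dp[i][j] = best score aligning a[:i] against b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * gap               # a[:i] aligned against all gaps
    for j in range(1, len(b) + 1):
        dp[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag,                 # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,   # gap in b
                           dp[i][j - 1] + gap)   # gap in a
    return dp[len(a)][len(b)]

score_close = alignment_score("GATTACA", "GATTACA")   # identical sequences: 7
score_far = alignment_score("GATTACA", "GCATGCU")     # lower: fewer shared bases
```

Full tools (e.g. BLAST) add traceback to recover the alignment itself and use biologically derived substitution matrices instead of flat match/mismatch scores.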
12. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
4. Graph Pattern Mining:
• Graph pattern mining can be done using Apriori-based and pattern-growth-based approaches. We can mine the subgraphs of a graph, including the set of closed graphs: a closed graph g is one that has no supergraph with the same support count as g. Graph pattern mining is applied to different types of graphs such as frequent, coherent, and dense graphs, and mining efficiency can be improved by applying user constraints to the graph patterns. Graph patterns are of two types: in homogeneous graphs the nodes and links are of the same type and share similar features, while in heterogeneous graphs the nodes and links are of different types.
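Apriori-style subgraph mining starts from the smallest patterns, so a highly simplified sketch is to count how many graphs in a collection contain each labeled edge (the single-edge base case, before growing larger candidates). The labeled graphs below are invented for illustration.

```python
# Simplified graph-pattern sketch: support counting for single labeled edges,
# the starting point of Apriori-style subgraph mining. Example graphs invented.
from collections import Counter

graphs = [
    {("person", "buys", "book"), ("person", "rates", "book")},
    {("person", "buys", "book"), ("person", "buys", "laptop")},
    {("person", "rates", "movie")},
]

def frequent_edges(graph_set, min_support=2):
    counts = Counter()
    for g in graph_set:
        for edge in g:            # each graph counts an edge at most once
            counts[edge] += 1
    return {e: c for e, c in counts.items() if c >= min_support}

freq = frequent_edges(graphs)     # only ("person", "buys", "book") reaches support 2
```

A real miner would then extend each frequent edge into larger candidate subgraphs and re-count support, pruning candidates whose sub-patterns are already infrequent.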
13. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
5. Statistical Modeling of Networks:
• A network is a collection of nodes, where each node represents a data object and the nodes are linked through edges that represent relationships between the objects. If all the nodes and links are of the same type, the network is homogeneous, such as a friend network or a web-page network. If the nodes and links are of different types, the network is heterogeneous, such as a health-care network linking different parties such as doctors, nurses, patients, and diseases together. Graph pattern mining can then be applied to the network to derive knowledge and useful patterns from it.
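The heterogeneous health-care network above can be sketched by storing a type for every node and a label for every edge, so that queries can filter by node type. All names and relationships below are invented for illustration.

```python
# Heterogeneous-network sketch: nodes carry a type, edges carry a relationship
# label, and queries filter by type. All entities below are invented examples.
nodes = {
    "dr_rao": "doctor",
    "asha": "patient",
    "flu": "disease",
}
edges = [
    ("dr_rao", "treats", "asha"),
    ("asha", "diagnosed_with", "flu"),
]

def neighbors_of_type(node, node_type):
    """Nodes connected to `node` (in either direction) whose type matches."""
    found = set()
    for src, _, dst in edges:
        if src == node and nodes[dst] == node_type:
            found.add(dst)
        if dst == node and nodes[src] == node_type:
            found.add(src)
    return found

patients_of_dr_rao = neighbors_of_type("dr_rao", "patient")
```

In a homogeneous network all entries in `nodes` would share one type and the type filter would be unnecessary; it is the mixed types that make the network heterogeneous.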
14. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
6. Mining Spatial Data:
• Spatial data is the geo space-related data that is stored in large data repositories. The spatial data is
represented in “vector” format and geo-referenced multimedia format. A spatial database is
constructed from large geographic data warehouses by integrating geographical data of multiple
sources of areas. we can construct spatial data cubes that contain information about the spatial
dimensions and measures. It is possible to perform the OLAP operations on the spatial data for
spatial data analysis. Spatial data mining is performed on spatial data warehouses, spatial databases,
and other geospatial data repositories, and it discovers knowledge about geographic areas. Spatial
data mining involves operations such as spatial clustering, spatial classification, spatial modeling,
and outlier detection in spatial data.
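As a small illustration of one of the operations listed above, the following sketch groups 2-D points into spatial clusters by linking any points within a distance threshold. It is a simplified, single-linkage-style stand-in for real spatial clustering algorithms such as DBSCAN; the points and threshold are made up:

```python
import math

def cluster_points(points, eps):
    """Group 2-D points whose chains of neighbours lie within eps of each other,
    using a small union-find structure."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)   # merge the two groups

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values())

# Two tight groups of points far apart from each other.
pts = [(0, 0), (0.5, 0.2), (10, 10), (10.3, 9.8)]
clusters = cluster_points(pts, eps=1.0)
print(clusters)
```

Real spatial clustering on large datasets would use a spatial index (e.g., an R-tree) rather than the quadratic pairwise scan shown here.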
15. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
7. Mining Cyber-Physical System Data:
• Cyber-physical system data can be mined by constructing a graph or network of the data. A
cyber-physical system (CPS) is a heterogeneous network consisting of a large number of interconnected
nodes that store information, for example patient or medical information in a health-care CPS. The
links in the CPS network represent the relationships between the nodes. Cyber-physical systems store
dynamic, inconsistent, and interdependent data that contains spatiotemporal information. Mining
cyber-physical data poses the current situation as a query against a large information base and
involves real-time computation and analysis to produce prompt responses from the CPS. CPS analysis
requires rare-event detection and anomaly analysis in cyber-physical data streams and networks, and
the processing of cyber-physical data involves integrating stream data with real-time automated
control processes.
16. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
8. Mining Multimedia Data:
• Multimedia data objects include image data, video data, audio data, website hyperlinks, and linkages.
Multimedia data mining aims to discover interesting patterns from multimedia databases. This
includes processing digital data and performing tasks such as image processing, image
classification, video and audio data mining, and pattern recognition. Multimedia data mining has
become an active research area because data from social media platforms such as Twitter and
Facebook can be analyzed with these techniques to derive interesting trends and patterns.
17. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
9. Mining Web Data:
• Web mining is essential to discover crucial patterns and knowledge from the Web. Web content
mining analyzes the data of websites, including web pages and multimedia data such as images
embedded in them. Web mining is done to understand the content of web pages, the unique users and
hypertext links of a website, web page relevance and ranking, web page content summaries, the time
users spend on a particular website, and user search patterns. Web mining can also compare search
engines and the search algorithms they use, which helps improve search efficiency and identify the
best search engine for users.
18. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
10. Mining Text Data:
• Text mining lies at the intersection of data mining, machine learning, natural language processing
(NLP), and statistics. Much of the information in our daily lives is stored as text, such as news
articles, technical papers, books, email messages, and blogs. Text mining helps us retrieve
high-quality information from text through tasks such as sentiment analysis, document summarization,
text categorization, and text clustering. We apply machine learning models and NLP techniques to
derive useful information from text by finding hidden patterns and trends through means such as
statistical pattern learning and statistical language modeling. To perform text mining, we first
preprocess the text, applying techniques such as stemming and lemmatization, in order to convert the
textual data into data vectors.
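As a rough illustration of this preprocessing step, the sketch below applies a crude suffix-stripping "stemmer" (a stand-in for a real stemmer such as Porter's) and then converts documents into term-count vectors. The suffix list and sample documents are illustrative only:

```python
# Crude suffix-stripping stemmer: NOT a real stemming algorithm,
# just enough to illustrate the text -> vector conversion.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def to_vector(text, vocabulary):
    """Count occurrences of each vocabulary stem in the text."""
    stems = [stem(w) for w in text.lower().split()]
    return [stems.count(term) for term in vocabulary]

docs = ["mining texts", "text mining reveals patterns"]
vocab = sorted({stem(w) for d in docs for w in d.lower().split()})
vectors = [to_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

The resulting vectors can then be fed to standard data mining algorithms such as classifiers or clustering methods.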
19. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
11. Mining Spatiotemporal Data:
• Data related to both space and time is spatiotemporal data. Spatiotemporal data mining retrieves
interesting patterns and knowledge from such data. It helps us, for example, to estimate land values,
determine the age of rocks and precious stones, and predict weather patterns. Spatiotemporal data
mining has many practical applications, including GPS in mobile phones, timers, Internet-based map
services, weather services, satellites, RFID tags, and sensors.
20. Multidimensional Analysis and Descriptive
Mining of Complex Data Objects
12. Mining Data Streams:
• Stream data changes dynamically and is noisy and inconsistent, containing multidimensional
features of different data types, so it is typically stored in NoSQL database systems. The volume of
stream data is very high, which is the main challenge for effective stream mining. While mining data
streams we need to perform tasks such as clustering, outlier analysis, and the online detection of
rare events in data streams.
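The online rare-event detection mentioned above can be sketched as a streaming z-score test, using Welford's online algorithm to maintain a running mean and variance so each value is examined exactly once. The threshold and sample stream are illustrative:

```python
class StreamOutlierDetector:
    """Flag values far from the running mean (Welford's online variance)."""

    def __init__(self, threshold=3.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations
        self.threshold = threshold

    def update(self, x):
        """Return True if x is an outlier w.r.t. the stream seen so far."""
        outlier = False
        if self.n >= 2:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                outlier = True
        # Welford update: incorporate x into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return outlier

det = StreamOutlierDetector(threshold=3.0)
flags = [det.update(v) for v in [10, 11, 9, 10, 12, 11, 10, 50]]
print(flags)
```

Only the final value, 50, is flagged; the detector never stores the stream itself, which is what makes this approach viable for high-volume data.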
21. Spatial Data Mining
A spatial database saves a huge amount of space-related data, including maps, preprocessed remote
sensing or medical imaging records, and VLSI chip design data. Spatial databases have several features
that distinguish them from relational databases.
They carry topological and/or distance information, usually organized by sophisticated,
multidimensional spatial indexing structures that are accessed by spatial data access methods and often
require spatial reasoning, geometric computation, and spatial knowledge representation techniques.
Spatial data mining refers to the extraction of knowledge, spatial relationships, or other interesting
patterns not explicitly stored in spatial databases. Such mining demands the unification of data mining
with spatial database technologies. It can be used for understanding spatial data, discovering
spatial relationships and relationships between spatial and nonspatial data, constructing spatial
knowledge bases, reorganizing spatial databases, and optimizing spatial queries.
22. Spatial Data Mining
It is expected to have broad applications in geographic information systems, marketing, remote sensing, image
database exploration, medical imaging, navigation, traffic control, environmental studies, and many
other areas where spatial data are used.
A central challenge in spatial data mining is the development of efficient spatial data mining
techniques, owing to the huge amount of spatial data and the complexity of spatial data types and
spatial access methods. Statistical spatial data analysis has been a popular approach to analyzing spatial data and
exploring geographic information.
The term geostatistics is often associated with continuous geographic space, whereas the term spatial
statistics is often associated with discrete space. A statistical model for nonspatial data generally
assumes statistical independence among different portions of the data.
23. Spatial Data Mining
No such independence exists among spatially distributed data, because spatial objects are in fact
interrelated, or more exactly spatially co-located: the closer two objects are located, the more
likely they are to share similar properties. For example, natural resources, climate, temperature, and
economic situations are likely to be similar in geographically closely located regions.
Such a property of close interdependency across nearby space leads to the notion of spatial
autocorrelation. Based on this notion, spatial statistical modeling methods have been developed with
success. Spatial data mining will further develop spatial statistical analysis methods and extend
them to large amounts of spatial data, with more emphasis on effectiveness, scalability, cooperation with database
and data warehouse systems, enhanced user interaction, and the discovery of new kinds of knowledge.
24. Multimedia Data Mining
Multimedia mining is a subfield of data mining that is used to find interesting information and
implicit knowledge in multimedia databases. Mining in multimedia is also referred to as automatic
annotation or annotation mining. Mining multimedia data involves two or more data types, such as
text and video, or text, video, and audio.
Multimedia data mining is an interdisciplinary field that integrates image processing and understanding,
computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting
patterns from multimedia databases that store and manage large collections of multimedia objects,
including image data, video data, audio data, sequence data and hypertext data containing text, text
markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity
search, generalization and multidimensional analysis. Multimedia data cubes contain additional
dimensions and measures for multimedia information.
25. Multimedia Data Mining
The framework that manages different types of multimedia data stored, delivered, and utilized in
different ways is known as a multimedia database management system. There are three classes of
multimedia databases: static, dynamic, and dimensional media. The content of the Multimedia Database
management system is as follows:
• Media data: The actual data representing an object.
• Media format data: Information about the format of the media data (such as sampling rate,
resolution, and encoding scheme) after it goes through the acquisition, processing, and encoding phases.
• Media keyword data: Keyword descriptions relating to the generation of the data, also known as
content-descriptive data. Example: the date, time, and place of recording.
• Media feature data: Content-dependent data, such as the distribution of colours, kinds of texture,
and shapes present in the data.
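Media feature data such as colour distributions can be computed directly from pixel values. The sketch below builds a normalised colour histogram, a common image feature for content-based retrieval; the pixel data is a made-up toy image:

```python
def color_histogram(pixels, bins=4):
    """Quantize 8-bit RGB pixels into a bins**3 colour histogram,
    normalised so the feature is independent of image size."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each colour channel to a coarse bin, then flatten to one index.
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]

# Hypothetical 2x2 image: three red-ish pixels, one blue-ish pixel.
pixels = [(250, 10, 10), (240, 5, 5), (255, 0, 0), (10, 10, 250)]
hist = color_histogram(pixels, bins=4)
print(max(hist))   # the dominant colour bin holds 3/4 of the pixels
```

Because the histogram is normalised, two images of different sizes but similar colour content produce similar feature vectors, which is what similarity search relies on.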
26. Multimedia Data Mining
Types of multimedia applications based on data management characteristics are:
1. Repository applications: A large amount of multimedia data and metadata (media format data,
media keyword data, media feature data) stored for retrieval purposes, e.g., repositories of
satellite images, engineering drawings, or scanned radiology images.
2. Presentation applications: They involve delivering multimedia data subject to temporal
constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate,
offering quality of service above a certain threshold. Here data is processed as it is delivered.
Examples: annotation of video and audio data, real-time editing and analysis.
3. Collaborative work applications: They involve executing a complex task using multimedia
information, by merging drawings and exchanging notifications. Example: an intelligent health-care network.
27. Multimedia Data Mining
Challenges with Multimedia Database:
1. Modelling: Work in this area draws on both database and information retrieval techniques;
documents constitute a specialized area and deserve special consideration.
2. Design: The conceptual, logical, and physical design of multimedia databases has not yet been
fully addressed, as performance and tuning issues at each level are far more complex; multimedia
data comes in a variety of formats (JPEG, GIF, PNG, MPEG) that are not easy to convert from one
form to another.
3. Storage: Storing a multimedia database on standard disks presents problems of representation,
compression, mapping to device hierarchies, archiving, and buffering during input/output operations.
In a DBMS, a BLOB (Binary Large Object) facility allows untyped bitmaps to be stored and retrieved.
4. Performance: Physical limitations dominate an application involving video playback or audio-video
synchronization. The use of parallel processing may alleviate some problems, but such techniques are
not yet fully developed. Apart from this, a multimedia database consumes a lot of processing time and
bandwidth.
5. Queries and retrieval: For multimedia data such as images, video, and audio, accessing data
through queries raises many issues, such as efficient query formulation, query execution, and
optimization, which still need to be worked out.
28. Multimedia Data Mining
Where is Multimedia Database Applied?
Below are the following areas where a multimedia database is applied, such as:
• Documents and record management: Industries and businesses keep detailed records and various
documents. For example, insurance claim records.
• Knowledge dissemination: Multimedia database is a very effective tool for knowledge dissemination in
terms of providing several resources. For example, electronic books.
• Education and training: Computer-aided learning materials can be designed using multimedia sources
which are nowadays very popular sources of learning. Example: Digital libraries.
• Travelling: Marketing, advertising, retailing, entertainment and travel. For example, a virtual tour of
cities.
• Real-time control and monitoring: With active database technology, multimedia presentation of
information can effectively monitor and control complex tasks. For example, manufacturing operation
control.
30. Multimedia Data Mining
Categories of Multimedia Data Mining
Multimedia mining refers to analyzing a large amount of multimedia information to extract patterns based
on their statistical relationships. Multimedia data mining is classified into two broad categories: static and
dynamic media. Static media contains text (digital library, creating SMS and MMS) and images (photos
and medical images). Dynamic media contains audio (music and MP3 sounds) and video (movies). The
image below shows the categories of multimedia data mining.
34. Text Mining in Data Mining
Text mining is a component of data mining that deals specifically with unstructured text data. It involves
the use of natural language processing (NLP) techniques to extract useful information and insights from
large amounts of unstructured text data. Text mining can be used as a preprocessing step for data mining or
as a standalone process for specific tasks.
By using text mining, the unstructured text data can be transformed into structured data that can be used
for data mining tasks such as classification, clustering, and association rule mining. This allows
organizations to gain insights from a wide range of data sources, such as customer feedback, social media
posts, and news articles.
What is the common usage of Text Mining?
Text mining is widely used in various fields, such as natural language processing, information retrieval,
and social media analysis. It has become an essential tool for organizations to extract insights from
unstructured text data and make data-driven decisions.
35. Text Mining in Data Mining
Conventional Process of Text Mining
• Gathering unstructured information from various sources accessible in various document organizations,
for example, plain text, web pages, PDF records, etc.
• Pre-processing and data cleansing tasks are performed to detect and eliminate inconsistencies in
the data. The data cleansing process makes sure the genuine text is captured; it includes removing
stop words and stemming (identifying the root of a word) before indexing the data.
• Processing and controlling tasks are applied to review and further clean the data set.
• Pattern analysis is implemented in a Management Information System (MIS).
• Information processed in the above steps is utilized to extract important and applicable data for a
powerful and convenient decision-making process and trend analysis.
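The cleansing step above can be sketched as follows; the stop-word list is a small illustrative one, not a standard list:

```python
import re

# Tiny illustrative stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}

def cleanse(raw_text):
    """Lowercase, strip punctuation, and drop stop words,
    mirroring the cleansing step described above."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = cleanse("The mining of text, in short, is the analysis of documents!")
print(tokens)
```

After cleansing, the remaining tokens carry most of the document's content and are ready for stemming and pattern analysis.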
37. Text Mining in Data Mining
Procedures for Analyzing Text Mining
Text Summarization: To automatically extract part of a text's content that reflects its whole content.
Text Categorization: To assign a text to one of a set of categories predefined by users.
Text Clustering: To segment texts into several clusters depending on their substantive relevance.
38. Text Mining in Data Mining
Text Mining Techniques
Information Retrieval: In information retrieval, we process the available documents and text data
into a structured form so that we can apply different pattern recognition and analytical processes.
It is a process of extracting relevant and associated patterns according to a given set of words or
text documents.
Information Extraction: It is a process of extracting meaningful words from documents.
• Feature Extraction – In this process, we try to develop some new features from existing ones. This
objective can be achieved by parsing an existing feature or combining two or more features based on
some mathematical operation.
• Feature Selection – In this process, we try to reduce the dimensionality of the dataset (a
common issue when dealing with text data) by selecting a subset of features from the whole dataset.
Natural Language Processing: Natural Language Processing includes tasks that are accomplished by
using Machine Learning and Deep Learning methodologies. It concerns the automatic processing and
analysis of unstructured text information.
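A common way to carry out the feature extraction described above for text is TF-IDF weighting, which scores a term highly when it is frequent in one document but rare across the collection. A minimal sketch using the unsmoothed textbook formula (production libraries often add smoothing):

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights per document for a list of tokenised docs."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        w = {}
        for term in doc:
            tf = doc.count(term) / len(doc)      # term frequency in this doc
            idf = math.log(n / df[term])         # rarity across the collection
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = [["web", "mining", "web"], ["text", "mining"]]
w = tf_idf(docs)
print(w[0]["web"])     # "web" occurs only in doc 0, so it gets weight > 0
print(w[0]["mining"])  # "mining" occurs in every doc, so its idf is 0
```

Terms that appear in every document get zero weight, which is exactly the feature-selection effect described above: uninformative terms are suppressed automatically.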
39. Text Mining in Data Mining
Application Area of Text Mining
• Digital Library
• Academic and Research Field
• Life Science
• Social-Media
• Business Intelligence
40. Text Mining in Data Mining
Issues in Text Mining
Numerous issues arise during the text mining process:
1. Ensuring the efficiency and effectiveness of decision-making.
2. Ambiguity can arise at intermediate stages of text mining. In the pre-processing stage, different
rules and guidelines are defined to normalize the text, which makes the text-mining process
efficient. Before applying pattern analysis to a document, the unstructured data must be converted
into an intermediate structured form.
3. The original message or meaning can sometimes be changed by this alteration.
4. Another issue in text mining is multi-language text, which many algorithms and techniques do not
handle well. This may create ambiguity in text meaning and can lead to false-positive results.
5. The use of synonyms, polysemy, and antonyms in document text creates problems for text mining
tools that treat them in the same way; it is difficult to categorize such text correctly.
41. Text Mining in Data Mining
Advantages of Text Mining
1. Large Amounts of Data: Text mining allows organizations to extract insights from large amounts of
unstructured text data. This can include customer feedback, social media posts, and news articles.
2. Variety of Applications: Text mining has a wide range of applications, including sentiment analysis,
named entity recognition, and topic modeling. This makes it a versatile tool for organizations to gain
insights from unstructured text data.
3. Improved Decision Making: Text mining can be used to extract insights from unstructured text data,
which can be used to make data-driven decisions.
4. Cost-effective: Text mining can be a cost-effective way to extract insights from unstructured text
data, as it eliminates the need for manual data entry.
5. Broader benefits: Cost reductions, productivity increases, the creation of novel services, and
new business models are among the wider economic advantages of text mining.
42. Text Mining in Data Mining
Disadvantages of Text Mining
Complexity: Text mining can be a complex process that requires advanced skills in natural language
processing and machine learning.
Quality of Data: The quality of text data can vary, which can affect the accuracy of the insights extracted
from text mining.
High Computational Cost: Text mining requires high computational resources, and it may be difficult for
smaller organizations to afford the technology.
Limited to Text Data: Text mining is limited to extracting insights from unstructured text data and cannot
be used with other data types.
Noise in text mining results: Text mining of documents may produce mistakes: false links may be
found and real ones missed. In most situations, if the noise (error rate) is sufficiently low, the
benefits of automation outweigh the risk of a larger error rate than a human reader would produce.
Lack of transparency: Text mining is frequently viewed as a mysterious process where large corpora of
text documents are input and new information is produced. Text mining is in fact opaque when researchers
lack the technical know-how or expertise to comprehend how it operates, or when they lack access to
corpora or text mining tools.
43. Data Mining- World Wide Web
Over the last few years, the World Wide Web has become a significant source of information and
simultaneously a popular platform for business.
Web mining can be defined as the application of data mining techniques and algorithms to extract
useful information directly from the web, such as web documents and services, hyperlinks, web
content, and server logs.
The World Wide Web contains a large amount of data that provides a rich source to data mining. The
objective of Web mining is to look for patterns in Web data by collecting and examining data in order to
gain insights.
44. Data Mining- World Wide Web
What is Web Mining?
Web mining can broadly be seen as the application of adapted data mining techniques to the web,
whereas data mining is defined as the application of algorithms to discover patterns in mostly
structured data as part of a knowledge discovery process.
A distinctive property of web mining is the variety of data types it involves. The web has multiple
aspects that yield different approaches to mining: web pages consist of text, web pages are linked
via hyperlinks, and user activity can be monitored via web server logs.
46. Data Mining- World Wide Web
1. Web Content Mining:
Web content mining is used to extract useful data, information, and knowledge from web page
content. In web content mining, each web page is considered as an individual document. One can take
advantage of the semi-structured nature of web pages, as HTML provides information about not only
the layout but also the logical structure. The primary task of content mining is data extraction,
where structured data is extracted from unstructured websites. The objective is to facilitate data
aggregation over various websites by using the extracted structured data. Web content mining can be
utilized to distinguish topics on the web. For example, when a user searches for a specific topic
on a search engine, web content mining helps produce the list of suggested results.
47. Data Mining- World Wide Web
2. Web Structure Mining:
Web structure mining can be used to discover the link structure of hyperlinks, that is, how data
items and web pages are linked together into a network. In web structure mining, one considers the
web as a directed graph, with the web pages as vertices connected by hyperlinks. The most important
application in this regard is the Google search engine, which estimates the ranking of its results
primarily with the PageRank algorithm: a page is considered highly relevant when it is frequently
linked to by other highly relevant pages. Structure and content mining methodologies are usually
combined. For example, web structure mining can help organizations analyze the link network between
two commercial sites.
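The PageRank idea described above can be sketched with simple power iteration over a tiny hypothetical link graph:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict page -> list of outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a (1 - damping) share of baseline rank.
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:                       # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:                              # share rank among linked pages
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical three-page web: A and C both link to B; B links back to both.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
rank = pagerank(links)
print(max(rank, key=rank.get))   # B is most linked-to, so it ranks highest
```

Page B ends up with the highest rank because both other pages link to it, which is precisely the "frequently linked by relevant pages" intuition above. Google's production algorithm handles billions of pages and many refinements beyond this sketch.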
48. Data Mining- World Wide Web
3. Web Usage Mining:
Web usage mining is used to extract useful data, information, and knowledge from web log records
and assists in recognizing user access patterns for web pages. When mining the usage of web
resources, one considers the records of requests made by visitors to a website, which are often
collected as web server logs. While the content and structure of the collection of web pages
reflect the intentions of the pages' authors, the individual requests show how consumers actually
use these pages. Web usage mining may reveal relationships that were not intended by the creators
of the pages.
49. Data Mining- World Wide Web
Some of the methods to identify and analyze the web usage patterns are given below:
I. Session and visitor analysis:
The analysis of preprocessed data can be accomplished through session analysis, which incorporates
visitor records, days, times, sessions, etc. This data can be used to analyze visitors' behavior.
A report is created after this analysis containing the details of repeatedly visited web pages and
common entry and exit points.
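The session and visitor analysis described above can be sketched as follows; the request records are hypothetical stand-ins for parsed web server log entries:

```python
from collections import Counter

# Hypothetical (visitor, timestamp, page) records, assumed time-ordered,
# standing in for parsed web server log lines.
records = [
    ("u1", "09:00", "/home"), ("u1", "09:01", "/products"),
    ("u2", "09:05", "/home"), ("u2", "09:06", "/cart"),
    ("u3", "09:10", "/products"), ("u3", "09:12", "/home"),
]

# Repeatedly visited pages: count requests per page.
visits = Counter(page for _, _, page in records)

# Common entry pages: the first page each visitor requested.
first_seen = {}
for user, ts, page in records:
    first_seen.setdefault(user, page)
entries = Counter(first_seen.values())

print(visits.most_common(1))    # most visited page overall
print(entries.most_common(1))   # most common entry page
```

A real pipeline would first reconstruct sessions from raw log lines (splitting a visitor's requests on idle gaps) before computing such statistics.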
II. OLAP (Online Analytical Processing):
OLAP performs multidimensional analysis of aggregated data.
OLAP can be performed on various parts of log-related data over a given period.
OLAP tools can be used to derive important business intelligence metrics.
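An OLAP-style roll-up over log-derived data can be sketched as aggregation along chosen dimensions; here requests are counted by (page, hour) and then rolled up to hour alone (the data is hypothetical):

```python
from collections import Counter

# Hypothetical (page, hour) pairs derived from web server logs.
requests = [("/home", 9), ("/home", 9), ("/cart", 9), ("/home", 10)]

# Finest granularity: counts per (page, hour) cell of the "cube".
by_page_hour = Counter(requests)

# Roll-up: aggregate away the page dimension, keeping only hour.
by_hour = Counter(hour for _, hour in requests)

print(by_page_hour[("/home", 9)])
print(by_hour[9])
```

Drill-down is the inverse operation: moving from the hour totals back to the per-page breakdown. Real OLAP engines precompute such aggregates across many dimensions at once.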
50. Data Mining- World Wide Web
Challenges in Web Mining:
The web presents great challenges for resource and knowledge discovery, based on the following
observations:
The complexity of web pages:
Web pages lack a unifying structure and are far more complex than traditional text documents. The
web's digital libraries contain enormous numbers of documents, and these libraries are not arranged
in any particular order.
The web is a dynamic data source:
The data on the internet is quickly updated. For example, news, climate, shopping, financial news,
sports, and so on.
Diversity of client networks:
The client community on the web is expanding rapidly. These clients have different interests,
backgrounds, and usage purposes. There are over a hundred million workstations connected to the
internet, and the number is still increasing tremendously.
51. Data Mining- World Wide Web
Relevancy of data:
A given person is generally interested in only a small portion of the web, while the rest of the web
contains data that is unfamiliar to the user and may lead to unwanted search results.
The web is too broad:
The size of the web is tremendous and rapidly increasing. It appears that the web is too huge for data
warehousing and data mining.
Application of Web Mining:
Web mining has extensive applications because of the many uses of the web. Some applications of
web mining are listed below.
• Marketing and conversion tools
• Data analysis of website and application performance
• Audience behavior analysis
• Advertising and campaign performance analysis
• Testing and analysis of a site