KUMARAGURU COLLEGE OF TECHNOLOGY
           COIMBATORE




DATA WAREHOUSING AND DATA MINING

             Presented by


             K.Santhosh (07bcs43)
             E-Mail ID:ksanthoshselvam@gmail.com
             Contact No: 9788153199
             V.Siddharth (07bcs50)
             E-Mail ID:siddharthindian@yahoo.com
             Contact No: 9843286841


ABSTRACT:


       Fast, accurate and scalable data analysis techniques are needed to extract useful
information from huge piles of data. A data warehouse is a single, integrated source of
decision support information formed by collecting data from multiple sources, internal to
the organization as well as external, and transforming and summarizing this information
to enable improved decision making. A data warehouse is designed to give users easy access
to large amounts of information, and data access is typically supported by specialized
analytical tools and applications. Typical applications include decision support systems
and executive information systems.
       Data mining is the exploration and analysis of large quantities of data in order to
discover valid, novel, potentially useful, and ultimately understandable patterns in data. It
has been described as “an information extraction activity whose goal is to discover hidden
facts contained in databases” and as the process of extracting valid, previously unknown,
comprehensible and actionable information from large databases and using it to make
crucial business decisions.
       Data mining finds patterns and subtle relationships in data and infers rules that allow
the prediction of future results. A data mining model is a description of a specific aspect of a
dataset. It produces output values for an assigned set of input values. Typical applications
include market segmentation, customer profiling, fraud detection, evaluation of retail
promotions, and credit risk analysis.



Introduction:
Organizations increasingly analyze current and historical data to identify useful patterns
and support business strategies.
A large amount of the right information is the key to survival in today’s competitive
environment, and this kind of information can be made available only if there is a fully
integrated enterprise data warehouse.


What is data warehousing?


A data warehouse is a subject-oriented, integrated, non-volatile and time-variant
collection of data in support of management’s decisions.

NEED FOR A DATA WAREHOUSE:
• IT or business staff spend a lot of time developing special reports for decision-makers.
• Many PC-based or small server systems hold extracts of data and cannot present a
holistic view of the entire gamut of information.
• The same data is present on different systems in different departments, and users may be
unaware of this fact.
• It is difficult to get meaningful information in a timely manner.
• Multiple systems give different answers to the same business questions.
• Decision makers and policy planners do less analysis because sophisticated tools and
easily decipherable, timely and comprehensive information are not available.
PURPOSE OF A DATA WAREHOUSE:
• Better business intelligence for end users.
• Reduction in time to locate, access and analyze information.
• Consolidation of disparate information sources.
• Replacement of older, less-responsive decision support systems.
• Faster time to market for products and services.
• Strategic advantage over competitors.
Data Warehouse Characteristics:
   1.Subject-orientedWH is organized around the major subjects of             the enterprise
   rather than the major application areas. This is reflected in the need to store decision-
   support data rather than application-oriented data.

   2.Integratedbecause the source data come together from different enterprise-wide
   applications systems. The source data is often inconsistent using..The integrated data
   source must be made consistent to present a unified view of the data to the users

   3.Time-variantthe source data in the WH is only accurate and valid at some point in
   time or over some time interval. The time-variance of the data warehouse is also
   shown in the extended time that the data is held, the implicit or explicit association of
   time with all data, and the fact that the data represents a series of snapshots

   4.Non-volatiledata is not update in real time but is refresh from OS on a regular
   basis. New data is always added as a supplement to DB, rather than replacement.
   The DB continually absorbs this new data, incrementally integrating it with previous
   data
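
The time-variant and non-volatile properties can be illustrated with a small sketch. It
is only an illustration under stated assumptions: the Warehouse class, its refresh and
as_of methods, and the sample account data are hypothetical names invented here, not part
of any warehouse product.

```python
from datetime import date


class Warehouse:
    """Toy append-only store: every refresh adds a dated snapshot (hypothetical)."""

    def __init__(self):
        self.rows = []  # list of (snapshot_date, key, value) tuples

    def refresh(self, snapshot_date, operational_rows):
        # Non-volatile: new data supplements existing rows, it never replaces them.
        for key, value in operational_rows.items():
            self.rows.append((snapshot_date, key, value))

    def as_of(self, key, when):
        # Time-variant: return the value from the latest snapshot on or before `when`.
        candidates = [(d, v) for d, k, v in self.rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]


wh = Warehouse()
wh.refresh(date(2024, 1, 31), {"acct-1": 100})
wh.refresh(date(2024, 2, 29), {"acct-1": 250})
print(len(wh.rows))                           # both snapshots are kept
print(wh.as_of("acct-1", date(2024, 2, 15)))  # value valid in mid-February
```

Because the second refresh appends rather than overwrites, the January value remains
available for historical queries.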



DATA WAREHOUSE LIFE CYCLE:
Data warehousing is a concept. It is not a product that can be purchased off the shelf. It is
a set of hardware and software components, integrated together, that can be used to
analyze massive amounts of stored data efficiently. It is also a process through
which one can build a successful data warehouse. The following are the five steps towards
building a successful data warehouse.

   1. JUSTIFICATION

   2. REQUIREMENT ANALYSIS

   3. DESIGN

   4. DEVELOPMENT AND IMPLEMENTATION

   5. DEPLOYMENT



Main Components:
   1Operational data sourcesfor the DW is supplied from mainframe operational data
   held in first generation hierarchical and network databases, departmental data held in
   proprietary file systems, private data held on workstaions and private serves and
   external systems such as the Internet, commercially available DB, or DB assoicated
   with and organization’s suppliers or customers
   2Operational datastore(ODS)is a repository of current and integrated operational
   data used for analysis. It is often structured and supplied with data in the same way as
   the data warehouse, but may in fact simply act as a staging area for data to be moved
   into the warehouse
   3load manageralso called the frontend component, it performance all the operations
   associated with the extraction and loading of data into the warehouse. These
   operations include simple transformations of the data to prepare the data for entry into
   the warehouse
   4warehouse managerperforms all the operations associated with the management of
   the data in the warehouse. The operations performed by this component include
   analysis of data to ensure consistency, transformation and merging of source data,
   creation of indexes and views, generation of denormalizations and aggregations, and
   archiving and backing-up data
5query manageralso called backend component, it performs all the operations
  associated with the management of user queries. The operations performed by this
  component include directing queries to the appropriate tables and scheduling the
  execution of queries
  6detailed, lightly and lightly summarized data,archive/backup data
  7meta-data
  8end-user access toolscan be categorized into five main groups: data reporting and
  query tools, application development tools, executive information system (EIS) tools,
  online analytical processing (OLAP) tools, and data mining tools


Data Flows
  1. Inflow: The processes associated with the extraction, cleansing, and loading of the
  data from the source systems into the data warehouse.
  2. Upflow: The processes associated with adding value to the data in the warehouse
  through summarizing, packaging, and distribution of the data.
  3. Downflow: The processes associated with archiving and backing-up of data in the
  warehouse.
  4. Outflow: The processes associated with making the data available to the end-users.
  5. Meta-flow: The processes associated with the management of the meta-data.
Tools and Technologies:
  1. The critical steps in the construction of a data warehouse:
     a. Extraction
     b. Cleansing
     c. Transformation
  2. After the critical steps, loading the results into the target system can be carried out
  either by separate products or by a single integrated product. Such tools fall into the
  following categories:
     a. Code generators
     b. Database data replication tools
     c. Dynamic transformation engines
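
The extraction, cleansing, transformation and loading steps can be sketched as a chain of
small functions. This is a minimal illustration, not how any particular tool works: the
function names, the record fields and the exchange rates are all made-up assumptions.

```python
def extract(source_rows):
    # Extraction: copy the raw records out of a (hypothetical) source system.
    return [dict(row) for row in source_rows]


def cleanse(rows):
    # Cleansing: trim stray whitespace and drop records that have no customer name.
    cleaned = []
    for row in rows:
        name = (row.get("customer") or "").strip()
        if name:
            cleaned.append({**row, "customer": name})
    return cleaned


def transform(rows, rates_to_usd):
    # Transformation: express every amount in one currency so the data is consistent.
    return [
        {"customer": row["customer"],
         "amount_usd": round(row["amount"] * rates_to_usd[row["currency"]], 2)}
        for row in rows
    ]


def load(rows, target):
    # Loading: append the prepared rows to the target table (here, a plain list).
    target.extend(rows)
    return target


source = [
    {"customer": " Asha ", "amount": 100.0, "currency": "USD"},
    {"customer": "", "amount": 5.0, "currency": "USD"},
    {"customer": "Ravi", "amount": 200.0, "currency": "EUR"},
]
rates = {"USD": 1.0, "EUR": 1.5}  # made-up rates, for illustration only
warehouse_table = load(transform(cleanse(extract(source)), rates), [])
print(warehouse_table)
```

Keeping each step as a separate function mirrors the choice between separate products and
a single integrated tool: the steps can be run one by one or composed into one pipeline.
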
The importance of managing meta-data (integration):
   1. Meta-data, that is, “data about data”, must be integrated across the warehouse.
   2. Meta-data is used for a variety of purposes, and its management is a critical
   issue in achieving a fully integrated data warehouse.
   3. The major purpose of meta-data is to show the pathway back to where the data
   began, so that the warehouse administrators know the history of any item in the
   warehouse.
   4. The meta-data associated with data transformation and loading must describe the
   source data and any changes that were made to the data.
   5. The meta-data associated with data management describes the data as it is stored in
   the warehouse.
   6. Meta-data is required by the query manager to generate appropriate queries, and it
   is also associated with the users of queries.


Data Warehousing Issues
    1. Semantic Integration: When getting data from multiple sources, mismatches must be
    eliminated, e.g., different currencies and database schemas.
    2. Heterogeneous Sources: Data must be accessed from a variety of source formats and
    repositories. Replication capabilities can be exploited here.
    3. Load, Refresh, Purge: Data must be loaded, periodically refreshed, and purged
    when it becomes too old.
    4. Metadata Management: The source, loading time, and other information must be
    tracked for all data in the warehouse.
Star Schema:
        A logical structure that has a fact table containing factual data in the center,
surrounded by dimension tables containing reference data (which can be denormalized).
Snowflake Schema:
        A variant of the star schema where dimension tables do not contain denormalized
data.
Starflake Schema:
        A hybrid structure that contains a mixture of star and snowflake schemas.
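
A star schema can be made concrete with a tiny in-memory database. The sketch below uses
Python's standard sqlite3 module; the table names, columns and sample rows are assumptions
chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold reference data; category is kept denormalized in dim_product.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The central fact table holds the measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Furniture');
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 40.0);
""")

# A typical star join: aggregate the facts by an attribute of one dimension.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

In a snowflake variant of this sketch, category would move out of dim_product into its own
table, which removes the denormalized column at the cost of one more join per query.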




The benefits of data warehousing:
   1. Potentially high returns on investment
   2. Substantial competitive advantage
   3. Increased productivity of corporate decision-makers
   4. More cost-effective decision making
   5. Better enterprise intelligence
   6. Enhanced customer service
   7. Better asset/liability management
   8. Business process reengineering
   9. Empowerment of all employees
Applications:
On Line Transaction Processing:
   OLTP systems are a major kind of enterprise application.
   Examples:
                   Order entry systems, inventory control systems, reservation
                   systems, point-of-sale systems, tracking systems, etc.


Executive Information Systems (EIS):
These present information at the highest level of summarization using corporate business
measures. They are designed for extreme ease of use and, in many cases, only a mouse is
required. Graphics are usually generously incorporated to provide at-a-glance indications
of performance.
Decision Support Systems (DSS):
These ideally present information in graphical and tabular form, providing the user with
the ability to drill down on selected information. Compared with an EIS, a DSS presents
more detail and more data manipulation options.




                                   DATA MINING
What is data mining?
    Data mining refers to the process of analyzing data from different perspectives
and summarizing it into useful information. Data mining software is one of a number
of tools used for analyzing data. It allows users to analyze data from many different
dimensions or angles, categorize it, and summarize the relationships identified.
Data mining is about techniques for finding and describing structural patterns in
data.
Definition:
  Data mining is the process of finding correlations or patterns among fields in large
relational databases.
It has also been defined as the process of extracting valid, previously unknown,
comprehensible, and actionable information from large databases and using it to make
crucial business decisions (Simoudis, 1996).


Different Types of Data Mining:


       1. Business Data Mining
       2. Scientific Data Mining
       3. Internet Data Mining


Five major elements of Data Mining:
   1. Extract, transform, and load transaction data onto the data warehouse system.
   2. Store and manage the data in a multidimensional database system.
   3. Provide data access to business analysts and information technology professionals.
   4. Analyze the data with application software.
   5. Present the data in a useful format, such as a graph or table.




Requirements of Data Mining:
   1. Handling of different types of data
   2. Efficiency and scalability of algorithms
   3. Usefulness, certainty and expressiveness of results
   4. Expression of various kinds of mining results
   5. Interactive mining of knowledge at multiple levels
   6. Mining information from different sources of data
   7. Protection of privacy and data security


Various kinds of data on which Data Mining is applied :
   1. Relational databases
   2. Data warehouses
   3. Transactional databases
   4. Multimedia databases
   5. Spatial and temporal data
   6. Object-relational databases


Data mining applications:
  A major application of data mining is Web mining.
What is Web Mining?
        “Web mining can be broadly defined as the automated discovery
and analysis of useful information from the Web documents and services using data
mining techniques.”
Web mining is the application of data mining or other information processing
techniques to the WWW to find useful patterns. People can take advantage of these
patterns to access the WWW more efficiently.


NEED FOR WEB MINING:
        Nowadays, the World Wide Web is a popular and interactive medium, ideal for
publishing information. It is huge, diverse and dynamic, and thus raises issues of
scalability, multimedia data and temporal data respectively. As a result, users
are currently “drowning” in an information overload that expands at a rate that far
outpaces the human ability to process and exploit it.
Domains of Web Mining:
               There are three domains that pertain to Web mining:

              1. Web Content Mining

              2. Web Structure Mining

              3. Web Usage Mining

1. Web Content Mining
       Web content mining is an automatic process that extracts patterns from on-line
information, such as HTML files, images, or e-mails, and it already goes beyond mere
keyword extraction or simple statistics of words and phrases in documents. Web
content mining is the "process of information or resource discovery from millions of
sources across the World Wide Web". There are two approaches in Web content mining:

           1. Agent-based approaches

           2. Database approaches



 Agent-Based approaches:

       The agent-based approach involves artificial intelligence systems that can "act
autonomously or semi-autonomously on behalf of a particular user, to discover and
organize Web-based information". Some intelligent Web agents can use a user profile to
search for relevant information, then organize and interpret the discovered information
(e.g., Harvest).
  Database approaches:
        The database approach focuses on "integrating and organizing the heterogeneous
and semi-structured data on the Web into more structured and high-level collections of
resources." These metadata "are organized into structured collections (e.g., relational or
object-oriented databases) and can be analyzed".

2. Web Structure Mining
        Web structure mining works on the data that describes the organization of content.
Intra-page structure information includes the arrangement of various HTML or XML tags
within a given page. This can be represented as a tree structure, where the <html> tag
becomes the root of the tree. The principal kind of inter-page structure information is
the hyperlinks connecting one page to another.
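
Both kinds of structure information can be read off a page with Python's standard
html.parser module. The sketch below is illustrative only: the StructureParser class and
the sample page are assumptions, and the depth tracking assumes a well-formed page in
which every tag is explicitly closed.

```python
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects the tag tree (intra-page) and the hyperlinks (inter-page)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # (depth, tag) pairs; depth 0 is the root <html> tag
        self.links = []    # href targets pointing to other pages

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        self.depth += 1  # assumes a matching end tag will follow

    def handle_endtag(self, tag):
        self.depth -= 1


page = "<html><body><p>See <a href='/courses.html'>courses</a></p></body></html>"
parser = StructureParser()
parser.feed(page)
print(parser.outline)
print(parser.links)
```

The outline list is a flattened view of the tag tree, and the links list gives the
outgoing edges of this page in the site's hyperlink graph.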

3. Web Usage Mining
        Web servers record and accumulate data about user interactions whenever
requests for resources are received. Analyzing the Web access logs of different Web sites
can help to understand user behavior and the Web structure, and thereby to improve the
design of this colossal collection of resources.
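
As a rough sketch of the first step in usage mining, the following Python counts successful
page requests in an access log. It assumes the log lines follow the Common Log Format; the
sample lines, the LOG_LINE pattern and the page_hits function are made up for illustration.

```python
import re
from collections import Counter

# Pattern for one Common Log Format line (assumed log layout).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

sample_log = [
    '10.0.0.1 - - [01/Mar/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Mar/2024:10:00:05 +0000] "GET /catalog.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Mar/2024:10:00:09 +0000] "GET /catalog.html HTTP/1.1" 200 2048',
    'malformed line',
]


def page_hits(lines):
    # Count successful (status 200) requests per path, skipping unparseable lines.
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


hits = page_hits(sample_log)
print(hits.most_common(1))
```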

Web Mining Techniques
        The common techniques for Web mining are:

    1. Clustering/classification

    2. Association rules

    3. Path analysis

    4. Sequential patterns

  1. Clustering/classification

        This technique is used to develop profiles of items with similar characteristics.
This ability enhances the discovery of relationships that are otherwise not obvious. E.g.,
classification of Web access logs allows a company to discover the average age of
customers who order a certain product.

 2. Association rules

       These are rules that govern "databases of transactions where each transaction
consists of a set of items." This technique is used to predict the correlation of items
"where the presence of one set of items in a transaction implies (with a certain degree of
confidence) the presence of other items."
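
The "degree of confidence" of a rule is commonly measured together with its support. The
sketch below computes both for a rule over a handful of transactions; the function names
and the basket contents are illustrative assumptions, not data from this paper.

```python
def support(transactions, items):
    # Fraction of transactions that contain every item in `items`.
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    # Of the transactions containing the antecedent, the fraction that also
    # contain the consequent. Assumes the antecedent occurs at least once.
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here the rule "bread implies milk" holds in two of the three transactions that contain
bread, so a miner would report it only if its thresholds allow a confidence of two thirds.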

 3. Path analysis

       A technique that involves the generation of some form of graph that "represents
relation[s] defined on Web pages." This can be the physical layout of a Web site in which
the Web pages are nodes and the hypertext links between these pages are directed edges.
E.g., which paths do users travel before they reach a particular URL?
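
The question of which paths lead to a particular URL can be sketched by treating each
user session as a walk over the site graph. The sessions, page names and function names
below are hypothetical examples, not measurements.

```python
from collections import Counter

# Each session is the ordered list of pages one user requested (made-up data).
sessions = [
    ["/", "/products", "/products/pen"],
    ["/", "/about", "/", "/products", "/products/pen"],
    ["/products", "/products/pen"],
]


def edge_counts(sessions):
    # Directed edges of the traversal graph, weighted by how often they were followed.
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


def paths_leading_to(sessions, target, length=2):
    # Count the `length` pages visited immediately before each visit to `target`.
    counts = Counter()
    for pages in sessions:
        for i, page in enumerate(pages):
            if page == target and i >= length:
                counts[tuple(pages[i - length:i])] += 1
    return counts


edges = edge_counts(sessions)
paths = paths_leading_to(sessions, "/products/pen")
print(edges.most_common(1))
print(paths)
```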

 4. Sequential patterns

       This technique is applied to Web server access logs. The purpose is to discover
sequential patterns that indicate user visit patterns over a certain period.
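
A very small version of sequential pattern discovery counts ordered page pairs that occur
in enough sessions. This is a simplified sketch under assumptions: it only looks at
patterns of length two, and the session data, function name and support threshold are
made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(sessions, min_support):
    # Count ordered pairs (a, b) where a is visited before b, once per session.
    counts = Counter()
    for pages in sessions:
        seen = set()
        for a, b in combinations(pages, 2):  # combinations keeps visit order
            if a != b:
                seen.add((a, b))
        counts.update(seen)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


visits = [
    ["home", "catalog", "cart"],
    ["home", "cart"],
    ["catalog", "home", "cart"],
]

patterns = frequent_sequences(visits, min_support=2)
print(patterns)
```

Unlike an association rule, the pair ("home", "cart") is order-sensitive: it says that
visits to home tend to come before visits to cart, not merely that both occur together.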

Web mining as a tool:
        Web mining can be a promising tool to address ineffective search
engines, which suffer from incomplete indexing and unverified reliability of retrieved
information. Web mining not only discovers information from mounds of data on the WWW,
but also monitors and predicts user visit habits. This gives designers more reliable
information for structuring and designing a Web site. Web mining technology can help
librarians design Web sites with paths that can be traveled easily by end users, saving
time and effort. E.g., Web mining technology in academic librarianship.
Conclusion:
    Data warehousing provides the means to change raw data into information for
making effective business decisions; the emphasis is on information, not data. The data
warehouse is the hub for decision support data.
    Data mining is a useful tool with multiple algorithms that can be tuned for specific
tasks. It can benefit business, medicine, and science. It needs more efficient algorithms to
speed up the data mining process. Web mining is a huge, interdisciplinary and very
dynamic scientific area, converging from several research communities such as databases,
information retrieval and artificial intelligence, especially machine learning and
natural language processing. This area is so broad today partly due to the interests of
various research communities.



Contenu connexe

Tendances

Data mining slides
Data mining slidesData mining slides
Data mining slides
smj
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
Phi Jack
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)
Muhammad Fahad
 

Tendances (20)

Data mining slides
Data mining slidesData mining slides
Data mining slides
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Introduction To Data Mining
Introduction To Data Mining   Introduction To Data Mining
Introduction To Data Mining
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Data Mining : Concepts and Techniques
Data Mining : Concepts and TechniquesData Mining : Concepts and Techniques
Data Mining : Concepts and Techniques
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Data Warehousing and Data Mining
Data Warehousing and Data MiningData Warehousing and Data Mining
Data Warehousing and Data Mining
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an Introduction
 
Data mining
Data mining Data mining
Data mining
 
Data Mining: Applying data mining
Data Mining: Applying data miningData Mining: Applying data mining
Data Mining: Applying data mining
 
Application areas of data mining
Application areas of data miningApplication areas of data mining
Application areas of data mining
 
Data mining
Data miningData mining
Data mining
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Data Mining: What is Data Mining?
Data Mining: What is Data Mining?Data Mining: What is Data Mining?
Data Mining: What is Data Mining?
 

En vedette

tybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notestybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notes
WE-IT TUTORIALS
 
Introduction to computer graphics
Introduction to computer graphicsIntroduction to computer graphics
Introduction to computer graphics
Amandeep Kaur
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
Saif Ullah
 

En vedette (9)

Data mining
Data miningData mining
Data mining
 
3D Graphics & Rendering in Computer Graphics
3D Graphics & Rendering in Computer Graphics3D Graphics & Rendering in Computer Graphics
3D Graphics & Rendering in Computer Graphics
 
3D Geometric Transformations
3D Geometric Transformations3D Geometric Transformations
3D Geometric Transformations
 
Computer Graphics Notes (B.Tech, KUK, MDU)
Computer Graphics Notes (B.Tech, KUK, MDU)Computer Graphics Notes (B.Tech, KUK, MDU)
Computer Graphics Notes (B.Tech, KUK, MDU)
 
tybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notestybsc it asp.net full unit 1,2,3,4,5,6 notes
tybsc it asp.net full unit 1,2,3,4,5,6 notes
 
Notes 2D-Transformation Unit 2 Computer graphics
Notes 2D-Transformation Unit 2 Computer graphicsNotes 2D-Transformation Unit 2 Computer graphics
Notes 2D-Transformation Unit 2 Computer graphics
 
Introduction to computer graphics
Introduction to computer graphicsIntroduction to computer graphics
Introduction to computer graphics
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 

Similaire à Data Mining

ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
ParnalSatle
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
sumit621
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
work
 

Similaire à Data Mining (20)

ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Unit 5
Unit 5 Unit 5
Unit 5
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 
Warehouse Planning and Implementation
Warehouse Planning and ImplementationWarehouse Planning and Implementation
Warehouse Planning and Implementation
 
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdfACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
 
Unit 1
Unit 1Unit 1
Unit 1
 
BVRM 402 IMS UNIT V
BVRM 402 IMS UNIT VBVRM 402 IMS UNIT V
BVRM 402 IMS UNIT V
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptx
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Decoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdfDecoding the Role of a Data Engineer.pdf
Decoding the Role of a Data Engineer.pdf
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Abstract
AbstractAbstract
Abstract
 
MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)
 
9. Data Warehousing & Mining.pptx
9. Data Warehousing & Mining.pptx9. Data Warehousing & Mining.pptx
9. Data Warehousing & Mining.pptx
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing Technology
 
DMDW 1st module.pdf
DMDW 1st module.pdfDMDW 1st module.pdf
DMDW 1st module.pdf
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 

Dernier

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 

Dernier (20)

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Data Mining

  • 1. KUMARAGURU COLLEGE OF TECHNOLOGY COIMBATORE DATA WAREHOUSING AND DATA MINING Presented by K.Santhosh (07bcs43) E-Mail ID:ksanthoshselvam@gmail.com Contact No: 9788153199 V.Siddharth (07bcs50) E-Mail ID:siddharthindian@yahoo.com Contact No: 9843286841
  • 2. DATA WAREHOUSING AND DATA MINING ABSTRACT: Fast, accurate and scalable data analysis techniques are needed to extract useful information from huge pile of data. Data warehouse is a single, integrated source of decision support information formed by collecting data from multiple sources, internal to the organization as well as external, and transforming and summarizing this information to enable improved decision making. Data warehouse is designed for easy access by users to large amounts of information, and data access is typically supported by specialized analytical tools and applications. Typical applications include decision support systems and execution information system. Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data. It is “An information extraction activity whose goal is to discover hidden facts contained in databases”. The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions. Data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. A data mining model is a description of a specific aspect of a dataset. It produces output values for an assigned set of input values. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.”
  • 3. DATA WAREHOUSING AND DATA MINING
    Introduction:
    Organizations are increasingly analyzing current and historical data to identify useful patterns and support business strategies. A large amount of the right information is the key to survival in today’s competitive environment, and this kind of information can be made available only if there is a fully integrated enterprise data warehouse.
    What is data warehousing?
    A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management’s decisions.
    NEED FOR A DATA WAREHOUSE:
    • IT or business staff spend a lot of time developing special reports for decision-makers.
    • Many PC-based or small server systems obtain extracts of data but cannot present a holistic view of the entire gamut of information.
    • The same data is present on different systems in different departments, and users may be unaware of this fact.
    • It is difficult to get meaningful information in a timely manner.
    • Multiple systems give different answers to the same business questions.
    • Decision makers and policy planners do less analysis because sophisticated tools and easily decipherable, timely and comprehensive information are not available.
  • 4. PURPOSE OF A DATA WAREHOUSE:
    • Better business intelligence for end users
    • Reduction in time to locate, access and analyze information
    • Consolidation of disparate information sources
    • Replacement of older, less-responsive decision support systems
    • Faster time to market for products and services
    • Strategic advantage over competitors
    Data Warehouse Characteristics:
    1. Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
    2. Integrated: the source data comes together from different enterprise-wide application systems. The source data is often inconsistent. The integrated data source must be made consistent to present a unified view of the data to the users.
    3. Time-variant: the source data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
    4. Non-volatile: data is not updated in real time but is refreshed from the operational systems on a regular basis. New data is always added as a supplement to the database, rather than as a replacement. The database continually absorbs this new data, incrementally integrating it with previous data.
    DATA WAREHOUSE LIFE CYCLE:
    Data warehousing is a concept. It is not a product that can be purchased off the shelf. It is a set of hardware and software components integrated together which can be used to
  • 5. analyze the massive amount of stored data efficiently. It is a process through which one can build a successful data warehouse. The five steps towards building a successful data warehouse are:
    1. JUSTIFICATION
    2. REQUIREMENT ANALYSIS
    3. DESIGN
    4. DEVELOPMENT AND IMPLEMENTATION
    5. DEPLOYMENT
    Main Components:
    1. Operational data sources: data for the warehouse is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers, and external systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.
    2. Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
    3. Load manager: also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
    4. Warehouse manager: performs all the operations associated with the management of the data in the warehouse. These operations include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
  • 6. 5. Query manager: also called the backend component, it performs all the operations associated with the management of user queries. These operations include directing queries to the appropriate tables and scheduling the execution of queries.
    6. Detailed, lightly and highly summarized data; archive/backup data
    7. Meta-data
    8. End-user access tools: these can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
    Data Flows:
    1. Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
    2. Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
    3. Downflow: the processes associated with archiving and backing up the data in the warehouse.
    4. Outflow: the processes associated with making the data available to the end users.
    5. Meta-flow: the processes associated with the management of the meta-data.
    Tools and Technologies:
    The critical steps in the construction of a data warehouse are:
    a. Extraction
    b. Cleansing
    c. Transformation
    After the critical steps, loading the results into the target system can be carried out either by separate products or by a single product. Tool categories:
    1. Code generators
    2. Database data replication tools
    3. Dynamic transformation engines
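The critical steps named above (extraction, cleansing, transformation) followed by loading can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the field names, the fixed currency rates in `RATES_TO_USD`, and the in-memory list standing in for the warehouse are all hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical raw rows, as they might arrive from two source systems.
RAW_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "currency": "USD", "day": "2024-01-05"},
    {"customer": "BOB", "amount": "80", "currency": "EUR", "day": "2024-01-05"},
    {"customer": "", "amount": "n/a", "currency": "USD", "day": "2024-01-06"},
]

# Assumed fixed conversion rates to one reporting currency (USD).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}


def extract(rows):
    """Extraction: copy rows out of the source so the source stays untouched."""
    return [dict(row) for row in rows]


def cleanse(rows):
    """Cleansing: normalize names and drop rows with missing or invalid fields."""
    clean = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # non-numeric amount, skip the row
        if not name or row["currency"] not in RATES_TO_USD:
            continue
        clean.append({**row, "customer": name, "amount": amount})
    return clean


def transform(rows):
    """Transformation: convert to a single currency and parse the dates."""
    return [
        {
            "customer": row["customer"],
            "amount_usd": round(row["amount"] * RATES_TO_USD[row["currency"]], 2),
            "day": date.fromisoformat(row["day"]),
        }
        for row in rows
    ]


def load(rows, warehouse):
    """Loading: append only, so existing warehouse rows are never overwritten."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(cleanse(extract(RAW_ROWS))), [])
for row in warehouse:
    print(row)
```

Appending in `load` rather than updating in place mirrors the non-volatile characteristic described earlier, and the currency conversion is one small instance of the semantic mismatches a real warehouse has to resolve.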
  • 7. The importance of managing meta-data (integration):
    1. The integration of meta-data, that is, “data about data”.
    2. Meta-data is used for a variety of purposes, and managing it is a critical issue in achieving a fully integrated data warehouse.
    3. The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
    4. The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
    5. The meta-data associated with data management describes the data as it is stored in the warehouse.
    6. Meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of queries.
    Data Warehousing Issues:
    1. Semantic Integration: when getting data from multiple sources, mismatches must be eliminated, e.g., different currencies or database schemas.
    2. Heterogeneous Sources: data must be accessed from a variety of source formats and repositories. Replication capabilities can be exploited here.
    3. Load, Refresh, Purge: data must be loaded, periodically refreshed, and purged when it is too old.
    4. Metadata Management: the source, loading time, and other information must be tracked for all data in the warehouse.
    Star Schema:
    A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).
    Snowflake Schema:
  • 8. A variant of the star schema where dimension tables do not contain denormalized data.
    Starflake Schema:
    A hybrid structure that contains a mixture of star and snowflake schemas.
    The benefits of data warehousing:
    1. Potentially high returns on investment
    2. Substantial competitive advantage
    3. Increased productivity of corporate decision-makers
    4. More cost-effective decision making
    5. Better enterprise intelligence
    6. Enhanced customer service
    7. Better asset/liability management
    8. Business process reengineering
    9. Empowerment of all employees
    Applications:
    On Line Transaction Processing (OLTP):
    OLTP systems are a major kind of enterprise application. Examples: order entry systems, inventory control systems, reservation systems, point-of-sale systems, tracking systems, etc.
    Executive Information Systems (EIS):
    They present information at the highest level of summarization using corporate business measures. They are designed for extreme ease of use and, in many cases, only a mouse is required. Graphics are usually generously incorporated to provide at-a-glance indications of performance.
    Decision Support Systems (DSS):
  • 9. They ideally present information in graphical and tabular form, providing the user with the ability to drill down on selected information. Note the increased detail and data manipulation options presented.
    DATA MINING
    What is data mining?
    Data mining refers to the process of analyzing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of tools used for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
    Data mining is about techniques for finding and describing structural patterns in data.
    Definition:
    Data mining is the process of finding correlations or patterns among fields in large relational databases.
    The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. (Simoudis, 1996)
    Different Types of Data Mining:
    1. Business Data Mining
    2. Scientific Data Mining
    3. Internet Data Mining
    Five major elements of Data Mining:
  • 10. 1. Extract, transform, and load transaction data onto the data warehouse system.
    2. Store and manage the data in a multidimensional database system.
    3. Provide data access to business analysts and information technology professionals.
    4. Analyze the data with application software.
    5. Present the data in a useful format such as a graph or table.
    Requirements of Data Mining:
    1. Handling of different types of data
    2. Efficiency and scalability of algorithms
    3. Usefulness, certainty and expressiveness of results
    4. Expression of various kinds of mining results
    5. Interactive mining of knowledge at multiple levels
    6. Mining information from different sources of data
    7. Protection of privacy and data security
    Various kinds of data on which Data Mining is applied:
    1. Relational database
    2. Data warehouse
    3. Transactional database
    4. Multimedia database
    5. Spatial and temporal data
    6. Object-relational database
    Data mining applications:
    One major application of data mining is WEB MINING.
    What is Web Mining?
    “Web mining can be broadly defined as the automated discovery and analysis of useful information from the Web documents and services using data mining techniques.”
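As a rough illustration of the "store in a multidimensional system, analyze, and present as a table" elements listed on this slide, the Python sketch below sums a measure over chosen dimensions (a roll-up) and prints the result as a plain-text table. The fact rows, the dimension names, and the helper names `roll_up` and `as_table` are assumptions made for this example, not part of the presentation.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
FACTS = [
    ("North", "Soap", "Q1", 100),
    ("North", "Soap", "Q2", 150),
    ("North", "Tea", "Q1", 70),
    ("South", "Soap", "Q1", 90),
    ("South", "Tea", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *dims, sales in facts:
        key = tuple(dims[p] for p in positions)
        totals[key] += sales
    return dict(totals)


def as_table(totals, keep):
    """Present a summary as tab-separated text rows with a header."""
    lines = ["\t".join(keep + ("sales",))]
    for key in sorted(totals):
        lines.append("\t".join(key + (str(totals[key]),)))
    return "\n".join(lines)


by_region = roll_up(FACTS, ("region",))
by_region_product = roll_up(FACTS, ("region", "product"))
print(as_table(by_region, ("region",)))
print(as_table(by_region_product, ("region", "product")))
```

Moving from `by_region` to `by_region_product` corresponds to drilling down one level; a dedicated multidimensional database would precompute and index such summaries rather than scanning the facts on every request.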
  • 11. Web mining is the application of data mining or other information processing techniques to the WWW to find useful patterns. People can take advantage of these patterns to access the WWW more efficiently.
    NEED FOR WEB MINING:
    Nowadays, the World Wide Web is a popular and interactive medium, ideal for publishing information. It is huge, diverse and dynamic, and thus raises issues of scalability, multimedia data and temporal data respectively. As a result, users are currently “drowning” in an information overload that expands at a rate that far outpaces the human ability to process and exploit it.
    Domains of Web Mining:
    There are three domains that pertain to Web mining:
    1. Web Content Mining
    2. Web Structure Mining
    3. Web Usage Mining
    1. Web Content Mining
    Web content mining is an automatic process that extracts patterns from on-line information, such as HTML files, images, or e-mails, and it already goes beyond keyword extraction or simple statistics of words and phrases in documents. Web content mining is the "process of information or resource discovery from millions of sources across the World Wide Web". There are two approaches in Web content mining:
    1. Agent-based approaches
    2. Database approaches
    Agent-based approaches:
    The agent-based approach involves artificial intelligence systems that can "act autonomously or semi-autonomously on behalf of a particular user, to discover and organize Web-based information". Some intelligent Web agents can use a user profile to
  • 12. search for relevant information, then organize and interpret the discovered information (e.g., Harvest).
    Database approaches:
    The database approach focuses on "integrating and organizing the heterogeneous and semi-structured data on the Web into more structured and high-level collections of resources." These "metadata, are organized into structured collections (e.g., relational or object-oriented databases) and can be analyzed".
    2. Web Structure Mining
    Web structure mining works on the data that describes the organization of content. Intra-page structure information includes the arrangement of various HTML or XML tags within a given page. This can be represented as a tree structure, where the <html> tag becomes the root of the tree. The principal kind of inter-page structure information is the hyperlinks connecting one page to another.
    3. Web Usage Mining
    Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the Web access logs of different Web sites can help to understand user behavior and the Web structure, and thereby improve the design of this colossal collection of resources.
    Web Mining Techniques
    The common techniques for Web mining are:
    1. Clustering/classification
    2. Association rules
    3. Path analysis
    4. Sequential patterns
    1. Clustering/classification
    This technique is used to develop profiles of items with similar characteristics. This ability enhances the discovery of relationships that are otherwise not obvious. E.g.:
  • 13. Classification of Web access logs allows a company to discover the average age of customers who order a certain product.
    2. Association rules
    Rules that govern "databases of transactions where each transaction consists of a set of items." This technique is used to predict the correlation of items "where the presence of one set of items in a transaction implies (with a certain degree of confidence) the presence of other items."
    3. Path analysis
    A technique that involves the generation of some form of graph that "represents relation[s] defined on Web pages." This can be the physical layout of a Web site in which the Web pages are nodes and the hypertext links between these pages are directed edges. E.g.: which paths do users travel before they go to a particular URL?
    4. Sequential patterns
    Applied to Web access server transaction logs. The purpose is to discover sequential patterns that indicate user visit patterns over a certain period.
    Web mining as a tool:
    Web mining can be a promising tool to address ineffective search engines, which produce incomplete indexing and retrieve information of unverified reliability. Web mining not only discovers information from mounds of data on the WWW, but also monitors and predicts user visit habits. This gives designers more reliable information for structuring and designing a Web site. Web mining technology can help librarians design Web sites with paths that can be traveled easily by end users, saving time and effort.
    E.g.: Web mining technology and academic librarianship
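The association-rule idea described on this slide, where the presence of one set of items implies the presence of others with a certain degree of confidence, can be made concrete with the usual support and confidence measures. The sketch below is a brute-force illustration over single-item rules, not an optimized miner such as Apriori; the page names, the sample sessions, and both thresholds are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical sessions: each transaction is the set of pages one visitor requested.
TRANSACTIONS = [
    {"home", "catalog", "cart"},
    {"home", "catalog"},
    {"home", "help"},
    {"catalog", "cart"},
    {"home", "catalog", "cart"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """List single-item rules lhs -> rhs that clear both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            if support({lhs, rhs}, transactions) < min_support:
                continue  # the pair is too rare to be interesting
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, round(conf, 2)))
    return rules


rules = pair_rules(TRANSACTIONS)
for lhs, rhs, conf in rules:
    print(f"{lhs} -> {rhs} (confidence {conf})")
```

Support filters out pairs that occur too rarely to matter, while confidence measures how reliably the left-hand page predicts the right-hand one; raising either threshold yields fewer but stronger rules.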
  • 14. Conclusion:
    Data warehousing provides the means to change raw data into information for making effective business decisions; the emphasis is on information, not data. The data warehouse is the hub for decision support data. Data mining is a useful tool with multiple algorithms that can be tuned for specific tasks. It can benefit business, medicine, and science, but it needs more efficient algorithms to speed up the data mining process. Web mining is a huge, interdisciplinary and very dynamic scientific area, converging from several research communities such as databases, information retrieval and artificial intelligence, especially machine learning and natural language processing. The area is so broad today partly because of the interests of these various research communities.
    References:
    1. www.datawarehousingonline.com
    2. Data Base Systems, Elmasri, Navathe
    3. Data Mining Technologies, Arun K. Pujari
    4. Data Mining and Data Warehousing and OLAP, A. Berson, S. J. Smith
    5. Database Management System, Sylbardcards