Data Mining Lecture 3: Data Models and Data Warehouse Design
Defining the Multidimensional Data Model
From Tables and Spreadsheets to Data Cubes
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
- Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
- A fact table contains measures (such as Euros_sold) and keys to each of the related dimension tables
Definitions
- An n-dimensional (n-D) base cube is called a base cuboid.
- The topmost 0-D cuboid, which holds the highest level of summarisation, is called the apex cuboid.
- The lattice of cuboids forms a data cube.
Cube: A Lattice of Cuboids
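Each subset of the dimensions defines one cuboid, so an n-D cube without concept hierarchies has 2^n cuboids, from the 0-D apex to the n-D base. A minimal Python sketch, assuming the four dimensions of the sales cube used in the DMQL examples (time, item, branch, location); the function names are hypothetical:

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Group the 2**n cuboids of an n-D cube by dimensionality.

    levels[0] holds the apex cuboid () and levels[n] the base cuboid.
    """
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

def children(cuboid):
    """Cuboids one step up in summarisation: drop one dimension at a time.

    Each child can be computed from `cuboid` by aggregating away that dimension.
    """
    return [tuple(d for d in cuboid if d != dropped) for dropped in cuboid]

levels = cuboid_lattice(("time", "item", "branch", "location"))
for k, cuboids in levels.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")   # 1, 4, 6, 4, 1 -> 16 in total
print(children(("time", "item")))              # [('item',), ('time',)]
```

The edges given by `children` are the lines in a lattice diagram: every cuboid can be derived from any cuboid directly below it.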
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema
- A fact table in the middle connected to a set of dimension tables
Snowflake schema
- A refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations
- Multiple fact tables share dimension tables; viewed as a collection of stars, this is therefore called a galaxy schema or fact constellation
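To make the star shape concrete, here is a minimal sketch using Python's built-in sqlite3 module: one fact table whose foreign keys point at two dimension tables. The table names (time_dim, item_dim, sales_fact) and all rows are made up for illustration; only the column names follow the sales example in these notes.

```python
import sqlite3

# Hypothetical star schema: a central fact table plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    euros_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO item_dim VALUES (10, 'TV', 'Acme'), (11, 'Radio', 'Acme'), (12, 'Phone', 'Zed');
INSERT INTO sales_fact VALUES (1, 10, 500), (1, 12, 300), (2, 11, 50), (2, 12, 200);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT i.brand, t.month, SUM(f.euros_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON f.item_key = i.item_key
    JOIN time_dim AS t ON f.time_key = t.time_key
    GROUP BY i.brand, t.month
    ORDER BY i.brand, t.month
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what keeps star queries simple.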
DMQL: Language Primitives
Cube Definition (Fact Table)
    define cube <cube_name> [<dimension_list>]: <measure_list>
Dimension Definition (Dimension Table)
    define dimension <dimension_name> as (<attribute_or_subdimension_list>)
Special Case (Shared Dimension Tables)
- A shared dimension is defined in full the first time it appears, in a cube definition; other cubes then reuse it with:
    define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>
Defining a Star Schema in DMQL
    define cube sales_star [time, item, branch, location]:
        Euros_sold = sum(sales_in_Euros),
        avg_sales = avg(sales_in_Euros),
        units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, county, province, country)
Defining a Snowflake Schema in DMQL
    define cube sales_snowflake [time, item, branch, location]:
        Euros_sold = sum(sales_in_Euros),
        avg_sales = avg(sales_in_Euros),
        units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type))
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city(city_key, county, province, country))
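In the snowflake version, supplier attributes live in their own table and item keeps only a supplier_key. The sketch below, again with sqlite3, assumes hypothetical table names (supplier_dim, item_dim, sales_fact) and made-up rows; it shows the cost of normalisation: grouping by supplier_type needs one more join than in the star schema.

```python
import sqlite3

# Hypothetical snowflaked item dimension: supplier normalised into its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key), euros_sold REAL);
INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'direct');
INSERT INTO item_dim VALUES (10, 'TV', 'Acme', 'video', 1), (11, 'Radio', 'Acme', 'audio', 2);
INSERT INTO sales_fact VALUES (10, 500), (10, 250), (11, 50);
""")

# fact -> item_dim -> supplier_dim: two joins to reach supplier_type.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.euros_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON f.item_key = i.item_key
    JOIN supplier_dim AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(rows)
```

The trade-off: less redundancy in the dimension tables, at the price of extra joins at query time.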
Defining a Fact Constellation in DMQL
    define cube sales [time, item, branch, location]:
        Euros_sold = sum(sales_in_Euros),
        avg_sales = avg(sales_in_Euros),
        units_sold = count(*)
    define dimension time as (time_key, day, day_of_week, month, quarter, year)
    define dimension item as (item_key, item_name, brand, type, supplier_type)
    define dimension branch as (branch_key, branch_name, branch_type)
    define dimension location as (location_key, street, city, province_or_state, country)
    define cube shipping [time, item, shipper, from_location, to_location]:
        Euro_cost = sum(cost_in_Euros),
        unit_shipped = count(*)
    define dimension time as time in cube sales
    define dimension item as item in cube sales
    define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
    define dimension from_location as location in cube sales
    define dimension to_location as location in cube sales
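Because the shipping cube reuses the time and item dimensions of sales, the two fact tables can be combined over a shared dimension (the drill-across operation). A minimal sqlite3 sketch, assuming simplified tables keyed by item only and made-up rows; the names sales_fact, shipping_fact and item_dim are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key), euros_sold REAL);
CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item_dim(item_key), euro_cost REAL);
INSERT INTO item_dim VALUES (10, 'TV'), (11, 'Radio');
INSERT INTO sales_fact VALUES (10, 500), (10, 250), (11, 50);
INSERT INTO shipping_fact VALUES (10, 40), (11, 5), (11, 5);
""")

# Aggregate each fact table to the shared grain (item) first, then join the
# two summaries through the shared dimension. Joining the raw fact rows
# directly would multiply matching rows and double-count the measures.
rows = conn.execute("""
    WITH s AS (SELECT item_key, SUM(euros_sold) AS euros_sold
               FROM sales_fact GROUP BY item_key),
         c AS (SELECT item_key, SUM(euro_cost) AS euro_cost
               FROM shipping_fact GROUP BY item_key)
    SELECT i.item_name, s.euros_sold, c.euro_cost
    FROM item_dim AS i
    JOIN s ON s.item_key = i.item_key
    JOIN c ON c.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)
```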
Measures: Three Categories
Distributive
- An aggregate function is distributive if the result derived by applying the function to the n aggregate values of n data partitions is the same as that derived by applying the function to all the data without partitioning.
  - E.g., count(), sum(), min(), max() (partial counts are combined with sum())
Algebraic
- An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function.
  - E.g., avg(), min_N(), standard_deviation()
Holistic
- An aggregate function is holistic if there is no constant bound on the storage size needed to describe a sub-aggregate.
  - E.g., median(), mode(), rank()
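The three categories can be checked on a toy example: split some measure values into partitions, aggregate each partition, then combine. A minimal sketch with made-up numbers using only the standard library; avg() is rebuilt from M = 2 distributive sub-aggregates (sum, count), while median() cannot be rebuilt from per-partition medians.

```python
import statistics

# Made-up measure values split into three partitions (e.g. chunks of a fact table).
data = [4, 8, 15, 16, 23, 42, 7, 9]
partitions = [data[:3], data[3:6], data[6:]]

# Distributive: combining the per-partition results gives the exact answer.
dist_sum = sum(sum(p) for p in partitions)
dist_max = max(max(p) for p in partitions)
dist_count = sum(len(p) for p in partitions)   # partial counts are combined with sum

# Algebraic: avg() is computed from M = 2 distributive sub-aggregates per partition.
partials = [(sum(p), len(p)) for p in partitions]
alg_avg = sum(s for s, _ in partials) / sum(c for _, c in partials)
naive_avg = statistics.mean(statistics.mean(p) for p in partitions)  # wrong in general

# Holistic: no fixed-size per-partition summary suffices for median().
true_median = statistics.median(data)
median_of_medians = statistics.median(statistics.median(p) for p in partitions)

print(dist_sum, dist_max, dist_count)   # 124 42 8
print(alg_avg, naive_avg)               # 15.5 vs about 14.67
print(true_median, median_of_medians)   # 12.0 vs 8.0
```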
A Concept Hierarchy: Dimension (location)
A concept hierarchy allows data to be handled at varying levels of abstraction.
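A concept hierarchy can be represented as child-to-parent mappings, and moving up one level is then a re-aggregation. A minimal sketch, assuming an illustrative city < province < country hierarchy for location (street and county omitted) and made-up sales figures; the function name `climb` is hypothetical:

```python
from collections import defaultdict

# Illustrative location hierarchy: city < province < country.
city_to_province = {"Dublin": "Leinster", "Kilkenny": "Leinster", "Cork": "Munster"}
province_to_country = {"Leinster": "Ireland", "Munster": "Ireland"}

# Made-up sales at the city level.
sales_by_city = {"Dublin": 120, "Kilkenny": 30, "Cork": 80}

def climb(measures, parent_of):
    """Re-aggregate a measure one level up the hierarchy (sum is distributive)."""
    rolled = defaultdict(int)
    for member, value in measures.items():
        rolled[parent_of[member]] += value
    return dict(rolled)

by_province = climb(sales_by_city, city_to_province)
by_country = climb(by_province, province_to_country)
print(by_province)  # {'Leinster': 150, 'Munster': 80}
print(by_country)   # {'Ireland': 230}
```

Because sum() is distributive, each level can be computed from the level below rather than from the raw data.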
Multidimensional Data
Sales volume as a function of product, month, and country
A Sample Data Cube
Cuboids Corresponding to the Cube
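For the three dimensions product, month and country, the cube consists of 2^3 = 8 cuboids, one group-by per subset of dimensions. A minimal sketch that materialises all of them from made-up base facts; `compute_cube` is a hypothetical helper, not a library function:

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "month", "country")

# Made-up base facts: coordinates -> sales volume.
facts = [
    ({"product": "TV", "month": "Jan", "country": "Ireland"}, 10),
    ({"product": "TV", "month": "Feb", "country": "France"}, 5),
    ({"product": "PC", "month": "Jan", "country": "Ireland"}, 7),
]

def compute_cube(facts, dims):
    """Materialise every cuboid: one group-by per subset of dims (2**n in total)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for row, volume in facts:
                agg[tuple(row[d] for d in group)] += volume
            cube[group] = dict(agg)
    return cube

cube = compute_cube(facts, DIMS)
print(len(cube))           # 8 cuboids
print(cube[()])            # apex cuboid: {(): 22}
print(cube[("product",)])  # {('TV',): 15, ('PC',): 7}
```

This brute-force version rescans the base data for every cuboid; real cube computation reuses smaller cuboids computed from larger ones.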
Browsing a Data Cube
Typical OLAP Operations
Roll-up (drill-up): summarise data
- by climbing up a hierarchy or by dimension reduction
Drill-down (roll-down): the reverse of roll-up
- from a higher-level summary to a lower-level summary or detailed data, or by introducing new dimensions
Slice and dice
- project and select: a slice fixes one dimension to a single value; a dice selects a sub-cube on two or more dimensions
Pivot (rotate)
- reorient the cube for visualisation, e.g., turn a 3-D cube into a series of 2-D planes
Other operations
- Drill-across: a query involving (across) more than one fact table
- Drill-through: through the bottom level of the cube to its back-end relational tables (using SQL)
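The core operations can be sketched on a tiny in-memory cube stored as a dict from coordinates to a measure. This is an illustrative sketch only: the month-to-quarter hierarchy, the data and all function names are hypothetical. Drill-down is simply the return from the rolled-up cube to `base`; drill-across and drill-through need the underlying fact tables and are not shown.

```python
from collections import defaultdict

# Hypothetical month -> quarter hierarchy and a made-up base cuboid
# keyed by (month, item, city) with units sold as the measure.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}
base = {
    ("Jan", "TV", "Dublin"): 10,
    ("Feb", "TV", "Cork"): 4,
    ("Apr", "PC", "Dublin"): 6,
    ("Apr", "TV", "Dublin"): 3,
}

def roll_up_time(cube):
    """Roll-up: climb the time hierarchy from month to quarter, summing units."""
    out = defaultdict(int)
    for (month, item, city), units in cube.items():
        out[(MONTH_TO_QUARTER[month], item, city)] += units
    return dict(out)

def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cube.items() if key[axis] == value}

def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinates lie in the allowed set on each listed axis."""
    return {key: units for key, units in cube.items()
            if all(key[axis] in values for axis, values in allowed.items())}

def pivot_2d(cube):
    """Pivot: swap the two axes of a 2-D cube."""
    return {(b, a): units for (a, b), units in cube.items()}

quarterly = roll_up_time(base)                        # (quarter, item, city)
dublin = slice_cube(quarterly, 2, "Dublin")           # (quarter, item) for Dublin
tv_q1 = dice_cube(quarterly, {0: {"Q1"}, 1: {"TV"}})  # quarter Q1 and item TV only
print(quarterly)
print(pivot_2d(dublin))                               # (item, quarter)
```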
A Star-Net Query Model
Design of a Data Warehouse: A Business Analysis Framework
Four views regarding the design of a data warehouse
- Top-down view
  - allows selection of the relevant information necessary for the data warehouse
- Data source view
  - exposes the information being captured, stored, and managed by operational systems
- Data warehouse view
  - consists of fact tables and dimension tables
- Business query view
  - sees the perspectives of data in the warehouse from the view of the end user
Data Warehouse Design Process
Top-down, bottom-up approaches, or a combination of both
- Top-down: starts with overall design and planning (mature)
- Bottom-up: starts with experiments and prototypes (rapid)
From a software engineering point of view
- Steps: planning, data collection, DW design, testing and evaluation, DW deployment
- Waterfall: structured and systematic analysis at each step before proceeding to the next
- Spiral: rapid generation of increasingly functional systems, with short turnaround time
Typical data warehouse design process
- Choose a business process to model, e.g., orders, invoices, etc.
- Choose the grain (atomic level of data) of the business process
- Choose the dimensions that will apply to each fact table record
- Choose the measures that will populate each fact table record
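One way to keep the four decisions explicit is to record them as data and derive the fact table from them. A minimal sketch, assuming a hypothetical "orders" process with a made-up grain, dimensions (including a hypothetical customer dimension) and measures; `FactTableDesign` and `to_ddl` are illustrative names, and the DDL is checked against an in-memory SQLite database:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class FactTableDesign:
    """The four design decisions captured as data (illustrative only)."""
    business_process: str   # step 1: the process being modelled
    grain: str              # step 2: what one fact row represents
    dimensions: list[str]   # step 3: dimension keys on every fact row
    measures: list[str]     # step 4: numeric facts on every fact row

    def to_ddl(self):
        """Emit a CREATE TABLE statement for the fact table (SQLite types)."""
        cols = [f"{d}_key INTEGER NOT NULL" for d in self.dimensions]
        cols += [f"{m} REAL" for m in self.measures]
        body = ",\n    ".join(cols)
        return f"-- grain: {self.grain}\nCREATE TABLE {self.business_process}_fact (\n    {body}\n);"

orders = FactTableDesign(
    business_process="orders",
    grain="one row per order line",
    dimensions=["time", "item", "customer"],
    measures=["quantity", "euros_amount"],
)
ddl = orders.to_ddl()
print(ddl)

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)   # fails loudly if the generated DDL is invalid
```

Writing the grain down first matters: every dimension and measure chosen afterwards must be true at that grain.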
Multi-Tiered Architecture
Three Data Warehouse Models
Enterprise warehouse
- collects all of the information about subjects spanning the entire organisation
Data mart
- a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected groups, such as a marketing data mart
- Independent vs. dependent (sourced directly from the warehouse) data marts
Virtual warehouse
- A set of views over operational databases
- Only some of the possible summary views may be materialised
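The difference between a view and a materialised summary can be seen in a few lines of sqlite3. A minimal sketch, assuming a made-up operational table `orders`: the view `sales_by_city` is recomputed on every query, while `sales_by_item` is stored as a table and goes stale until it is refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Made-up operational (OLTP) table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT, city TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'TV', 'Dublin', 500), (2, 'TV', 'Cork', 450), (3, 'PC', 'Dublin', 900);

-- Virtual summary: a view over the operational table.
CREATE VIEW sales_by_city AS
    SELECT city, SUM(amount) AS total FROM orders GROUP BY city;

-- Materialised summary: stored as a real table at creation time.
CREATE TABLE sales_by_item AS
    SELECT item, SUM(amount) AS total FROM orders GROUP BY item;
""")

# A new operational row is visible through the view but not in the stored table.
conn.execute("INSERT INTO orders VALUES (4, 'PC', 'Cork', 800)")
by_city = conn.execute("SELECT * FROM sales_by_city ORDER BY city").fetchall()
by_item = conn.execute("SELECT * FROM sales_by_item ORDER BY item").fetchall()
print(by_city)  # includes order 4
print(by_item)  # stale: order 4 is missing until the table is rebuilt
```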
Data Warehouse Development: A Recommended Approach
OLAP Server Architectures
Relational OLAP (ROLAP)
- Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support the missing pieces
- Includes optimisation of the DBMS back end, implementation of aggregation navigation logic, and additional tools and services
- Greater scalability
Multidimensional OLAP (MOLAP)
- Array-based multidimensional storage engine (sparse matrix techniques)
- Fast indexing to pre-computed summarised data
Hybrid OLAP (HOLAP)
- User flexibility, e.g., low level: relational, high level: array
Specialised SQL servers
- Specialised support for SQL queries over star/snowflake schemas
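Why array-based MOLAP storage indexes quickly: once each dimension member has a fixed position, a cell's location in a dense array is computed arithmetically rather than looked up. A minimal sketch, assuming hypothetical dimension domains; it also shows the sparse alternative, which stores only non-empty cells keyed by the same offset.

```python
from math import prod

# Hypothetical dimension domains for a small 3-D MOLAP array.
DOMAINS = {
    "item": ["TV", "PC", "Radio"],
    "city": ["Dublin", "Cork"],
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
}
AXES = list(DOMAINS)
POSITION = {dim: {member: i for i, member in enumerate(members)}
            for dim, members in DOMAINS.items()}

def offset(cell):
    """Row-major position of a cell: computed directly, no index search needed."""
    pos = 0
    for dim in AXES:
        pos = pos * len(DOMAINS[dim]) + POSITION[dim][cell[dim]]
    return pos

size = prod(len(members) for members in DOMAINS.values())  # 3 * 2 * 4 = 24
dense = [0.0] * size   # dense array: one slot per possible cell, even empty ones
sparse = {}            # sparse alternative: only non-empty cells, keyed by offset

cell = {"item": "PC", "city": "Cork", "quarter": "Q3"}
dense[offset(cell)] = 42.0
sparse[offset(cell)] = 42.0
print(offset(cell), len(dense), len(sparse))  # 14 24 1
```

Real cubes are usually mostly empty, which is why MOLAP engines fall back on sparse-matrix techniques like the dict above.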
Homework 1a
Suppose that a data warehouse for Big University consists of the following four dimensions: student, module, semester, and lecturer, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, module, semester, and lecturer combination), the avg_grade measure stores the actual module grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination.
1. Draw a snowflake schema diagram for the data warehouse.
2. Starting with the base cuboid [student, module, semester, lecturer], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS modules for each Big University student?
3. If each dimension has 5 levels (including all), such as "student < major < status < university < all", how many cuboids will this cube contain (including the base and apex cuboids)?
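With concept hierarchies the 2^n count generalises: if dimension i has L_i levels not counting the virtual top level "all", it contributes L_i + 1 choices, so the total number of cuboids is T = (L_1 + 1)(L_2 + 1)...(L_n + 1). A minimal sketch of that counting rule; `total_cuboids` is a hypothetical helper, and the example hierarchies are taken from the time and location attributes in the DMQL examples above.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """T = prod(L_i + 1), where L_i counts the levels of dimension i
    excluding the virtual top level 'all' (the + 1 adds 'all' back)."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Without hierarchies every dimension has one level, giving 2**n cuboids.
print(total_cuboids([1, 1, 1, 1]))   # 16
# time (day < month < quarter < year) and
# location (street < city < county < province < country).
print(total_cuboids([4, 5]))         # 5 * 6 = 30
```

Note the convention: if a hierarchy is described with "all" already included in its level count, that count is the dimension's factor as it stands.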
