Department of Software and
Computing Systems

A Comprehensive Method for Data
Warehouse Design
Sergio Luján-Mora, Juan Trujillo
(sergio.lujan@ua.es / @sergiolujanmora)
Published in:
5th International Workshop on Design and Management of
Data Warehouses (DMDW'03), p. 1.1-1.14, Berlin
(Germany), September 8 2003.
Download:
http://gplsi.dlsi.ua.es/almacenes/ver.php?pdf=48

Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work

Motivation
• Data warehouses are complex
information systems
• Support:
– OLAP
– Data mining
– Decision Support Systems
–…

• Building a DW: time-consuming,
expensive and prone to failure

Motivation
• Partial approaches:
– ETL processes
– Logical and conceptual design of the DW
based on the multidimensional paradigm
– Derive DW schema from ER schemas of
the data sources
–…

• DW methods exist, but no general model
covers the different design phases

Motivation
• Goal: A Comprehensive Method for Data
Warehouse Design
• Principles that drive our approach:
– Standard modeling notation → UML
– Comprehensive → Include main phases of DW
design
– Powerful but easy to understand → Different
levels of detail for different users (technical and
final users)
– Method → Starting point, not a rigid template

Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work

UML extension mechanisms
• UML is a general purpose visual
modeling language for systems
• Extension mechanisms allow the user
to tailor it to specific domains
• Mechanisms:
– Stereotypes → New building elements
– Tagged values → New properties
– Constraints → New semantics
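
The three mechanisms can be mimicked in a few lines of Python. This is only an illustrative sketch, not part of UML or of the authors' profile: the classes `Stereotype` and `ModelElement`, the `description` tag, and the `has_name` constraint are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stereotype:
    """A new kind of building element, e.g. <<Fact>> applied to a class."""
    name: str
    base_element: str  # the kind of UML element being extended, e.g. "Class"
    # Tagged values: new properties, with defaults (hypothetical tags).
    tag_defaults: Dict[str, object] = field(default_factory=dict)
    # Constraints: new semantics, modeled here as named predicates.
    constraints: Dict[str, Callable[["ModelElement"], bool]] = field(
        default_factory=dict
    )


@dataclass
class ModelElement:
    """A model element carrying one stereotype."""
    name: str
    kind: str
    stereotype: Stereotype
    tags: Dict[str, object] = field(default_factory=dict)

    def tagged_value(self, key: str) -> object:
        # An explicit tag overrides the stereotype's default.
        return self.tags.get(key, self.stereotype.tag_defaults.get(key))

    def violations(self) -> List[str]:
        # Names of the rules this element breaks.
        broken = []
        if self.kind != self.stereotype.base_element:
            broken.append("base_element")
        broken += [
            rule
            for rule, check in self.stereotype.constraints.items()
            if not check(self)
        ]
        return broken


# Hypothetical <<Fact>> stereotype: one tag, one constraint.
fact = Stereotype(
    name="Fact",
    base_element="Class",
    tag_defaults={"description": ""},
    constraints={"has_name": lambda element: bool(element.name.strip())},
)
sales = ModelElement(name="Sales", kind="Class", stereotype=fact)
```

Calling `sales.violations()` returns an empty list when the element satisfies every rule attached to its stereotype; a modeling tool would run comparable checks when validating a diagram.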

UML extension mechanisms
[Figure: stereotype display options: Icon, Decoration, Label, None]

Contents
• Motivation
• UML extension mechanisms

• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work

[Figure: the schemas and processes of the method: Operational Data Schema (ODS), ETL Process, Data Warehouse Conceptual Schema (DWCS), Exportation Process, Data Warehouse Storage Schema (DWSS), and Business Model (BM). Together they form the model; diagrams are windows or views into the model.]

General diagram (level 0)
<<ODS>>, <<DWCS>>, <<DWSS>>, <<BM>>, <<ETL>>, <<Exportation>>
[Figure: packages in the example, by stereotype]
• <<ODS>>: Sales data, Production data, Syndicated data
• <<ETL>>: Transformations
• <<DWCS>>: Data warehouse
• <<Exportation>>: Mappings
• <<DWSS>>: Informix Metacube, Cognos PowerPlay
• <<BM>>: Manager, Accounting


ODS
• Operational Data Schema
• Represents:
– Transaction processing systems (OLTP)
– External sources (census data, economic
data, competitors’ data, etc.)

• No single UML extension exists for
modeling the different types of data sources

ODS
• RDBMS → Rational’s UML Profile for
Database Design: <<Database>>,
<<Schema>>, <<Table>>, …
• ORDBMS → Marcos et al. UML Profile for
Object-Relational Database Design:
<<array>>, <<row>>, <<ref>>, …
• XML → Rational’s XML-DTD UML Profile:
<<DTDElement>>, <<DTDElementEmpty>>,
<<DTDEntity>>,
• …

[Figure: class diagram for the <<ODS>> packages Sales data, Production data, and Syndicated data. Classes shown: Salesmen, Cities, Counties, Groups, Discount policies, Families, Products, Packages, Invoices, Storage conditions, Lines, States, Customers, Agents, Categories. They are linked by associations with multiplicities 1, 0..n, and 1..n.]


DWCS
• Data Warehouse Conceptual Schema
• UML Profile for Multidimensional Modeling
• Basic components:
– Facts: the transactions or values being analyzed
– Dimensions: descriptive information about the
facts

• Properties:
– Shared dimensions
– Heterogeneous dimensions
– Degenerate facts and dimensions
– Multiple and alternative path classification
hierarchies
–…

DWCS
• Level 1: Model definition
• Level 2: Star schema definition
• Level 3: Dimension/fact definition

DWCS
• Package stereotypes:
– StarPackage (Level 1)
– FactPackage (Level 2)
– DimensionPackage (Level 2)
• Class stereotypes:
– Fact (Level 3)
– Dimension (Level 3)
– Base (Level 3)

Model definition (level 1)
<<StarPackage>>
[Figure: three <<StarPackage>> packages: Production schema, Sales schema, Salesmen schema]

Star schema definition (level 2)
<<FactPackage>>, <<DimensionPackage>>
[Figure: the level 1 packages (Production schema, Sales schema, Salesmen schema) shown next to the level 2 packages: Sales fact, Stores dimension, Times dimension, Products dimension, Customers dimension]

Dimension/fact definition (level 3)
<<Fact>>, <<Dimension>>, <<Base>>
[Figure: drill-down from the level 1 packages (Production schema, Sales schema, Salesmen schema) and the level 2 packages (Sales fact, Stores dimension, Times dimension, Products dimension, Customers dimension) into the Customers dimension. The Customers dim class is related to the classes Customers, ZIPs, Cities, and States, which are linked by +child/+parent associations with multiplicities 0..n and 1.]
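
A classification hierarchy like the one in the Customers dimension can be sketched as a chain of child-to-parent mappings, where each child has exactly one parent (0..n to 1). The sketch below assumes a single path Customers → ZIPs → Cities → States; the slide may also show alternative paths. All member names and sales amounts are invented for illustration.

```python
from collections import defaultdict

# Level names, ordered from the finest to the coarsest (assumed single path).
LEVELS = ["Customers", "ZIPs", "Cities", "States"]

# Hypothetical members: each mapping links a child to its single parent.
CUSTOMER_TO_ZIP = {"c1": "zip-1", "c2": "zip-1", "c3": "zip-2"}
ZIP_TO_CITY = {"zip-1": "City A", "zip-2": "City B"}
CITY_TO_STATE = {"City A": "State X", "City B": "State X"}
HIERARCHY = [CUSTOMER_TO_ZIP, ZIP_TO_CITY, CITY_TO_STATE]


def roll_up(member, from_level, to_level):
    """Follow +parent links from one level up to a coarser level."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if end < start:
        raise ValueError("can only roll up to a coarser level")
    for child_to_parent in HIERARCHY[start:end]:
        member = child_to_parent[member]
    return member


def aggregate(sales_by_customer, to_level):
    """Sum a measure recorded per customer at a coarser hierarchy level."""
    totals = defaultdict(float)
    for customer, amount in sales_by_customer.items():
        totals[roll_up(customer, "Customers", to_level)] += amount
    return dict(totals)
```

With sales of 10.0, 5.0, and 2.5 for `c1`, `c2`, and `c3`, `aggregate(..., "Cities")` groups the first two under `City A`, and `aggregate(..., "States")` puts everything under `State X`.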

DWSS
• Data Warehouse Storage Schema
• Depends on the implementation
(RDBMS, ORDBMS, MD, …) → Similar
to the ODS
• Two possibilities: manual or automatic

BM
• Business Model
• Adapt the DW to final users:
– Easier to understand
– Security concerns
–…

• UML importing mechanism → Different
submodels of DWCS

[Figure: importing. The <<DWCS>> package Data warehouse contains Production schema, Sales schema, and Salesmen schema. The <<BM>> package Accounting contains Sales schema (from Data warehouse).]


ETL Process
• Extraction-Transformation-Loading
• Mapping between ODS and DWCS
• UML Profile for Modeling ETL Processes
• Common mechanisms:
– Integration of different data sources
– Transformati
– Generation of surrogate keys
– …

ETL Process
[Figure: icons of the ETL stereotypes: Aggregation, Conversion, Filter, Incorrect, Join, Loader, Log, Merge, Surrogate, Wrapper]

[Figure: ETL example for the Products dimension]
• Products (from Sales data): IdProduct, Name, Price, Family, Storage
• Storage conditions (from Sales data): IdStorage, Name, Description
• NewClass2: IdProduct, Name, Price, Family, StName, StDescription
  LeftJoin(Storage = IdStorage)
  Name = Products.Name
  StName = [Storage conditions].Name
  StDescription = [Storage conditions].Description
• ProdEuro: Price = DollarToEuro(Price)
• ProdLoader
• Products dim and ProdDescription (from Products dimension), associated 1 to 0..n
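
The flow in this example (left join, price conversion, load) can be sketched in plain Python. This is a rough illustration under stated assumptions, not the authors' implementation: the function names, the sample rows, and the exchange rate `USD_TO_EUR` are all made up, and rows are modeled as dictionaries keyed by the attribute names shown on the slide.

```python
USD_TO_EUR = 0.9  # placeholder exchange rate, an assumption for the example


def dollar_to_euro(price):
    """Stand-in for the DollarToEuro conversion named on the slide."""
    return round(price * USD_TO_EUR, 2)


def left_join(products, storage_conditions):
    """LeftJoin(Storage = IdStorage): keep every product, matched or not."""
    by_id = {s["IdStorage"]: s for s in storage_conditions}
    joined = []
    for p in products:
        s = by_id.get(p["Storage"], {})  # no match leaves St* as None
        joined.append({
            "IdProduct": p["IdProduct"],
            "Name": p["Name"],
            "Price": p["Price"],
            "Family": p["Family"],
            "StName": s.get("Name"),
            "StDescription": s.get("Description"),
        })
    return joined


def prod_euro(rows):
    """Conversion step: Price = DollarToEuro(Price)."""
    return [{**row, "Price": dollar_to_euro(row["Price"])} for row in rows]


def prod_loader(rows, target):
    """Loader step: append the transformed rows to the target store."""
    target.extend(rows)
    return target


def run_pipeline(products, storage_conditions):
    """Chain the three steps and return the loaded rows."""
    return prod_loader(prod_euro(left_join(products, storage_conditions)), [])
```

Because the join is a left join, a product whose `Storage` value has no matching `IdStorage` is still loaded, with `StName` and `StDescription` left empty.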

Exportation Process
• Mapping between DWCS and DWSS
• Two possibilities: manual or automatic
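
As a rough idea of what an automatic exportation to a relational storage schema could look like, the sketch below derives `CREATE TABLE` statements from a small conceptual description of one star. The slides do not specify the mapping rules, so everything here is an assumption: the dictionary layout, the attribute and measure names, the `dim_`/`fact_` table prefixes, and the choice of SQLite as the target.

```python
import sqlite3

# Hypothetical conceptual description of one star (names are illustrative).
SALES_STAR = {
    "fact": {"name": "Sales", "measures": ["quantity", "amount"]},
    "dimensions": {
        "Products": ["name", "family"],
        "Customers": ["name", "city"],
    },
}


def export_star(star):
    """Return DDL statements: one table per dimension, then the fact table."""
    statements = []
    for dim, attributes in star["dimensions"].items():
        table = f"dim_{dim.lower()}"
        columns = [f"{dim.lower()}_key INTEGER PRIMARY KEY"]
        columns += [f"{attribute} TEXT" for attribute in attributes]
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)})")

    fact = star["fact"]
    # The fact table references every dimension through its key column.
    columns = [
        f"{dim.lower()}_key INTEGER REFERENCES dim_{dim.lower()}({dim.lower()}_key)"
        for dim in star["dimensions"]
    ]
    columns += [f"{measure} REAL" for measure in fact["measures"]]
    statements.append(
        f"CREATE TABLE fact_{fact['name'].lower()} ({', '.join(columns)})"
    )
    return statements


def created_tables(statements):
    """Run the DDL in an in-memory SQLite database and list the tables."""
    connection = sqlite3.connect(":memory:")
    try:
        for statement in statements:
            connection.execute(statement)
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

A manual exportation would amount to writing the same DDL by hand; generating it from the conceptual description keeps the storage schema in step with the DWCS.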

Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas

• Applying modeling schemas
• Conclusions
• Future Work

Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work

Conclusions
• Global DW design method
• Main advantages:
– Same standard notation (UML)
– Integration of different design phases in a
single and coherent framework
– Scales up to handle huge and complex DWs
• CASE tool support with Rational Rose
→ Add-in

Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work

Future work
• Data mapping at attribute level
• Diagramming and style guidelines for
creating better diagrams
• More stages of the DW life cycle (e.g.,
refresh processes)

Contenu connexe

En vedette

Metodologia de una tesis1
Metodologia de una tesis1Metodologia de una tesis1
Metodologia de una tesis1emelec2014
 
Responsabilidad de los Directores de Sistemas
Responsabilidad de los Directores de SistemasResponsabilidad de los Directores de Sistemas
Responsabilidad de los Directores de SistemasXavier Ribas
 
9197757 los-sniffers
9197757 los-sniffers9197757 los-sniffers
9197757 los-sniffers1 2d
 
Cookies
CookiesCookies
Cookies1 2d
 
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...Nicolas Valenzuela
 
Procedimiento de notificacion de infracciones a ISP
Procedimiento de notificacion de infracciones a ISPProcedimiento de notificacion de infracciones a ISP
Procedimiento de notificacion de infracciones a ISPXavier Ribas
 
Las redes sociales jose luis de la mata
Las redes sociales jose luis de la mataLas redes sociales jose luis de la mata
Las redes sociales jose luis de la mataConfesorAD
 
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?madridgeekgirls
 
Cookies y otras tecnologías de monitorización en internet
Cookies y otras tecnologías de monitorización en internetCookies y otras tecnologías de monitorización en internet
Cookies y otras tecnologías de monitorización en internetAlejandro Ramos
 

En vedette (14)

PHP: Sesiones
PHP: SesionesPHP: Sesiones
PHP: Sesiones
 
Web Analytics | Clase 1/4
Web Analytics | Clase 1/4Web Analytics | Clase 1/4
Web Analytics | Clase 1/4
 
Metodologia de una tesis1
Metodologia de una tesis1Metodologia de una tesis1
Metodologia de una tesis1
 
Responsabilidad de los Directores de Sistemas
Responsabilidad de los Directores de SistemasResponsabilidad de los Directores de Sistemas
Responsabilidad de los Directores de Sistemas
 
9197757 los-sniffers
9197757 los-sniffers9197757 los-sniffers
9197757 los-sniffers
 
Cookies: Uso en JavaScript
Cookies: Uso en JavaScriptCookies: Uso en JavaScript
Cookies: Uso en JavaScript
 
Cookies
CookiesCookies
Cookies
 
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...
Forrester’s study: Discover How Marketing Analytics Increases Business Perfor...
 
Cookies: ¿Cómo funcionan?
Cookies: ¿Cómo funcionan?Cookies: ¿Cómo funcionan?
Cookies: ¿Cómo funcionan?
 
Procedimiento de notificacion de infracciones a ISP
Procedimiento de notificacion de infracciones a ISPProcedimiento de notificacion de infracciones a ISP
Procedimiento de notificacion de infracciones a ISP
 
Php
PhpPhp
Php
 
Las redes sociales jose luis de la mata
Las redes sociales jose luis de la mataLas redes sociales jose luis de la mata
Las redes sociales jose luis de la mata
 
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?
#Mgghub Cookies y privacidad ¿Cumple tu web con la ley?
 
Cookies y otras tecnologías de monitorización en internet
Cookies y otras tecnologías de monitorización en internetCookies y otras tecnologías de monitorización en internet
Cookies y otras tecnologías de monitorización en internet
 

Plus de Sergio Luján Mora - Universidad de Alicante

Plus de Sergio Luján Mora - Universidad de Alicante (20)

Delivering location-based services using GIS, WAP, and the Web: two applications
Delivering location-based services using GIS, WAP, and the Web: two applicationsDelivering location-based services using GIS, WAP, and the Web: two applications
Delivering location-based services using GIS, WAP, and the Web: two applications
 
Clustering of Similar Values, in Spanish, for the Improvement of Search Systems
Clustering of Similar Values, in Spanish, for the Improvement of Search SystemsClustering of Similar Values, in Spanish, for the Improvement of Search Systems
Clustering of Similar Values, in Spanish, for the Improvement of Search Systems
 
XML: Ejemplos de uso
XML: Ejemplos de usoXML: Ejemplos de uso
XML: Ejemplos de uso
 
XML: Introducción
XML: IntroducciónXML: Introducción
XML: Introducción
 
XML: HTML y XHTML
XML: HTML y XHTMLXML: HTML y XHTML
XML: HTML y XHTML
 
Cookies: ¿Qué son y para qué sirven?
Cookies: ¿Qué son y para qué sirven?Cookies: ¿Qué son y para qué sirven?
Cookies: ¿Qué son y para qué sirven?
 
Curso Introduccion accesibilidad web
Curso Introduccion accesibilidad webCurso Introduccion accesibilidad web
Curso Introduccion accesibilidad web
 
¿Qué es un CAPTCHA? Origen y uso
¿Qué es un CAPTCHA? Origen y uso¿Qué es un CAPTCHA? Origen y uso
¿Qué es un CAPTCHA? Origen y uso
 
¿Qué es un CAPTCHA? Futuro
¿Qué es un CAPTCHA? Futuro¿Qué es un CAPTCHA? Futuro
¿Qué es un CAPTCHA? Futuro
 
Errores web: Tame
Errores web: TameErrores web: Tame
Errores web: Tame
 
Errores web: Renfe y las fechas
Errores web: Renfe y las fechasErrores web: Renfe y las fechas
Errores web: Renfe y las fechas
 
Errores web: Renfe y los nombres de las ciudades
Errores web: Renfe y los nombres de las ciudadesErrores web: Renfe y los nombres de las ciudades
Errores web: Renfe y los nombres de las ciudades
 
Errores web: El País
Errores web: El PaísErrores web: El País
Errores web: El País
 
Errores web: Amadeus y su calendario
Errores web: Amadeus y su calendarioErrores web: Amadeus y su calendario
Errores web: Amadeus y su calendario
 
Errores web: Rumbo y su calendario
Errores web: Rumbo y su calendarioErrores web: Rumbo y su calendario
Errores web: Rumbo y su calendario
 
Herramientas de trabajo colaborativo
Herramientas de trabajo colaborativoHerramientas de trabajo colaborativo
Herramientas de trabajo colaborativo
 
Herramientas educativas
Herramientas educativasHerramientas educativas
Herramientas educativas
 
Recursos 2.0 de la Universidad de Alicante
Recursos 2.0 de la Universidad de AlicanteRecursos 2.0 de la Universidad de Alicante
Recursos 2.0 de la Universidad de Alicante
 
La Web 2.0 y la educación
La Web 2.0 y la educaciónLa Web 2.0 y la educación
La Web 2.0 y la educación
 
Presentación de Sergio Luján Mora
Presentación de Sergio Luján MoraPresentación de Sergio Luján Mora
Presentación de Sergio Luján Mora
 

Dernier

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Dernier (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

A Comprehensive Method for Data Warehouse Design

  • 1. Department of Software and Computing Systems A Comprehensive Method for Data Warehouse Design Sergio Luján-Mora, Juan Trujillo (sergio.lujan@ua.es / @sergiolujanmora) Published in: 5th International Workshop on Design and Management of Data Warehouses (DMDW'03), p. 1.1-1.14, Berlin (Germany), September 8 2003. Download: http://gplsi.dlsi.ua.es/almacenes/ver.php?pdf=48
  • 2. Department of Software and Computing Systems A Comprehensive Method for Data Warehouse Design Sergio Luján-Mora Juan Trujillo DMDW 2003
  • 3. A Comprehensive Method for Data Warehouse Design Contents • Motivation • • • • • UML extension mechanisms DW modeling schemas Applying modeling schemas Conclusions Future Work
  • 4. A Comprehensive Method for Data Warehouse Design Motivation • Data warehouses are complex information systems • Support: – OLAP – Data mining – Decision Support Systems –… • Building a DW: time consuming, expensive and prone to fail
  • 5. A Comprehensive Method for Data Warehouse Design Motivation • Partial approaches: – ETL processes – Logical and conceptual design of the DW based on the multidimensional paradigm – Derive DW schema from ER schemas of the data sources –… • DW methods, but not a general model for the different phases
  • 6. A Comprehensive Method for Data Warehouse Design Motivation • Goal: A Comprehensive Method for Data Warehouse Design • Principles that drive our approach: – Standard modeling notation  UML – Comprehensive  Include main phases of DW design – Powerful but easy to understand  Different levels of detail for different users (technical and final users) – Method  Starting point, not a rigid template
  • 7. A Comprehensive Method for Data Warehouse Design Contents • Motivation • UML extension mechanisms • • • • DW modeling schemas Applying modeling schemas Conclusions Future Work
  • 8. A Comprehensive Method for Data Warehouse Design UML extension mechanisms • UML is a general purpose visual modeling language for systems • Extension mechanisms allow the user to tailor it to specific domains • Mechanisms: – Stereotypes  New building elements – Tagged values  New properties – Constraints  New semantics
  • 9. A Comprehensive Method for Data Warehouse Design UML extension mechanisms Icon Decoration Label None
  • 10. A Comprehensive Method for Data Warehouse Design Contents • Motivation • UML extension mechanisms • DW modeling schemas • Applying modeling schemas • Conclusions • Future Work
  • 11. A Comprehensive Method for Data Warehouse Design Data Warehouse Conceptual Schema (DWCS) *** *** *** *** l edo M ETL Process *** *** *** *** *** *** *** *** Analyze *** *** *** *** Exportation Process Business Model Operational Data Schema Data Warehouse Storage Schema (ODS) (DWSS) Diagrams (windows or views into the model) (BM)
  • 12. A Comprehensive Method for Data Warehouse Design General diagram (level 0) <<ODS>>, <<DWCS>>, <<DWSS>>, <<BM>>, <<ETL>>, <<Exportation>> <<BM>> Manager <<BM>> Accounting <<DWCS>> Data warehouse <<ODS>> Sales data <<DWSS>> Informix Metacube <<ODS>> Production data <<ODS>> Syndicated data <<ETL>> Transformations <<Exportation>> Mappings <<DWSS>> Cognos PowerPlay
  • 13. A Comprehensive Method for Data Warehouse Design Data Warehouse Conceptual Schema (DWCS) *** *** *** *** ETL Process *** *** *** *** *** *** *** *** Analyze *** *** *** *** Exportation Process Business Model Operational Data Schema Data Warehouse Storage Schema (ODS) (DWSS) (BM)
  • 14. A Comprehensive Method for Data Warehouse Design ODS • Operational Data Schema • Represents: – Transaction processing systems (OLTP) – External sources (census data, economic data, competitors’ data, etc.) • Not exists a UML extension for modeling different types of data sources
  • 15. A Comprehensive Method for Data Warehouse Design ODS • RDBMS  Rational’s UML Profile for Database Design: <<Database>>, <<Schema>>, <<Table>>, … • ORDBMS  Marcos et al. UML Profile for Object-Relational Database Design: <<array>>, <<row>>, <<ref>>, … • XML  Rational’s XML-DTD UML Profile: <<DTDElement>>, <<DTDElementEmpty>>, <<DTDEntity>>, • …
  • 16. A Comprehensive Method for Data Warehouse Design <<ODS>> Sales data 0..n 0..n 1 1..n 1 <<ODS>> Production data Salesmen 1 0..n <<ODS>> Syndicated data Cities 1 1 1 1..n Counties Groups 0..n 0..n Discount policies 0..n 0..n 1 Families 0..n 1 Products 0..n 0..n 1 1 Packages 0..n Invoices 1 Storage conditions 0..n Lines States 0..n 0..n 1 1 1 Customers 0..n Agents 0..n 1 Categories 1
  • 17. A Comprehensive Method for Data Warehouse Design Data Warehouse Conceptual Schema (DWCS) *** *** *** *** ETL Process *** *** *** *** *** *** *** *** Analyze *** *** *** *** Exportation Process Business Model Operational Data Schema Data Warehouse Storage Schema (ODS) (DWSS) (BM)
  • 18. A Comprehensive Method for Data Warehouse Design DWCS • Data Warehouse Conceptual Schema • UML Profile for Multidimensional Modeling • Basic components: – Facts: the transactions or values being analyzed – Dimensions: descriptive information about the facts • Properties: – Shared dimensions – Heterogeneous dimensions – Degenerate facts and dimensions – Multiple and alternative path classification hierarchies – …
  • 19. A Comprehensive Method for Data Warehouse Design DWCS • Level 1: Model definition • Level 2: Star schema definition • Level 3: Dimension/fact definition
  • 20. A Comprehensive Method for Data Warehouse Design DWCS • Package stereotypes: StarPackage (Level 1), FactPackage (Level 2), DimensionPackage (Level 2) • Class stereotypes: Fact (Level 3), Dimension (Level 3), Base (Level 3)
  • 21. A Comprehensive Method for Data Warehouse Design Model definition (level 1) • <<StarPackage>>: Production schema, Sales schema, Salesmen schema
  • 22. A Comprehensive Method for Data Warehouse Design Star schema definition (level 2) • Stereotypes: <<FactPackage>>, <<DimensionPackage>> • [Diagram: the level 1 packages (Production schema, Sales schema, Salesmen schema) and, at level 2, the packages Sales fact, Stores dimension, Times dimension, Products dimension and Customers dimension]
  • 23. A Comprehensive Method for Data Warehouse Design Dimension/fact definition (level 3) • Stereotypes: <<Fact>>, <<Dimension>>, <<Base>> • [Diagram: the level 1 and level 2 packages of the previous slides and, at level 3, the Customers dimension with the classes Customers dim, Customers, ZIPs, Cities and States, linked by +parent/+child associations; the multiplicities (1, 0..n) are not recoverable from the transcript]
  • 24. A Comprehensive Method for Data Warehouse Design [Overview diagram: ODS → (ETL Process) → DWCS → (Exportation Process) → DWSS, plus the Business Model (BM) and an "Analyze" label]
  • 25. A Comprehensive Method for Data Warehouse Design DWSS • Data Warehouse Storage Schema • Depends on the implementation (RDBMS, ORDBMS, MD, …) → Similar to the ODS • Two possibilities: manual or automatic
  • 26. A Comprehensive Method for Data Warehouse Design [Overview diagram: ODS → (ETL Process) → DWCS → (Exportation Process) → DWSS, plus the Business Model (BM) and an "Analyze" label]
  • 27. A Comprehensive Method for Data Warehouse Design BM • Business Model • Adapt the DW to final users: – Easier to understand – Security concerns – … • UML importing mechanism → Different submodels of DWCS
  • 28. A Comprehensive Method for Data Warehouse Design [Diagram: package <<DWCS>> Data warehouse (Production schema, Sales schema, Salesmen schema) and package <<BM>> Accounting, which contains Sales schema (from Data warehouse) through an "Importing" relationship]
  • 29. A Comprehensive Method for Data Warehouse Design [Overview diagram: ODS → (ETL Process) → DWCS → (Exportation Process) → DWSS, plus the Business Model (BM) and an "Analyze" label]
  • 30. A Comprehensive Method for Data Warehouse Design ETL Process • Extraction-Transformation-Loading • Mapping between ODS and DWCS • UML Profile for Modeling ETL Processes • Common mechanisms: – Integration of different data sources – Transformati… – Generation of surrogate keys – …
  • 31. A Comprehensive Method for Data Warehouse Design ETL Process • Mechanisms (shown as stereotype icons): Aggregation, Conversion, Filter, Incorrect, Join, Loader, Log, Merge, Surrogate, Wrapper
  • 32. A Comprehensive Method for Data Warehouse Design [Diagram: ETL example. Sources: Products (from Sales data; attributes IdProduct, Name, Price, Family, Storage) and Storage conditions (from Sales data; attributes IdStorage, Name, Description). Join: LeftJoin(Storage = IdStorage), with Name = Products.Name, StName = [Storage conditions].Name, StDescription = [Storage conditions].Description; its result is NewClass2 (IdProduct, Name, Price, Family, StName, StDescription). Conversion ProdEuro: Price = DollarToEuro(Price). Loader ProdLoader, with targets Products dim and ProdDescription (from Products dimension); association multiplicities 1 and 0..n]
  • 33. A Comprehensive Method for Data Warehouse Design [Overview diagram: ODS → (ETL Process) → DWCS → (Exportation Process) → DWSS, plus the Business Model (BM) and an "Analyze" label]
  • 34. A Comprehensive Method for Data Warehouse Design Exportation Process • Mapping between DWCS and DWSS • Two possibilities: manual or automatic
  • 35. A Comprehensive Method for Data Warehouse Design Contents • Motivation • UML extension mechanisms • DW modeling schemas • Applying modeling schemas • Conclusions • Future Work
  • 36. A Comprehensive Method for Data Warehouse Design
  • 37. A Comprehensive Method for Data Warehouse Design
  • 38. A Comprehensive Method for Data Warehouse Design
  • 39. A Comprehensive Method for Data Warehouse Design Contents • Motivation • UML extension mechanisms • DW modeling schemas • Applying modeling schemas • Conclusions • Future Work
  • 40. A Comprehensive Method for Data Warehouse Design Conclusions • Global DW design method • Main advantages: – Same standard notation (UML) – Integration of different design phases in a single and coherent framework – Scales up to handle huge and complex DWs • CASE tool support with Rational Rose → Add-in
  • 41. A Comprehensive Method for Data Warehouse Design Contents • Motivation • UML extension mechanisms • DW modeling schemas • Applying modeling schemas • Conclusions • Future Work
  • 42. A Comprehensive Method for Data Warehouse Design Future work • Data mapping at attribute level • Diagramming and style guidelines for creating better diagrams • More stages of the DW life cycle (e.g., refresh processes)
  • 43. A Comprehensive Method for Data Warehouse Design Department of Software and Computing Systems A Comprehensive Method for Data Warehouse Design Sergio Luján-Mora Juan Trujillo
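
The ETL example on slide 32 (a left join of Products with Storage conditions on Storage = IdStorage, followed by a DollarToEuro price conversion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the sample rows, the fixed exchange rate `USD_TO_EUR`, the helper names and the `ProductKey` column are all hypothetical. The surrogate key step is not part of the slide 32 diagram; it is added because slide 30 lists surrogate key generation among the common ETL mechanisms.

```python
from itertools import count

# Hypothetical sample rows; the attribute names follow slide 32.
products = [
    {"IdProduct": 10, "Name": "Milk", "Price": 2.0, "Family": "Dairy", "Storage": 1},
    {"IdProduct": 11, "Name": "Nails", "Price": 5.0, "Family": "Hardware", "Storage": None},
]
storage_conditions = [
    {"IdStorage": 1, "Name": "Cold", "Description": "Keep refrigerated"},
]

USD_TO_EUR = 0.9  # assumed fixed rate, for illustration only


def dollar_to_euro(price):
    """Stand-in for the DollarToEuro conversion named on the slide."""
    return round(price * USD_TO_EUR, 2)


def left_join(product_rows, storage_rows):
    """LeftJoin(Storage = IdStorage): keep every product, matched or not."""
    by_id = {s["IdStorage"]: s for s in storage_rows}
    joined = []
    for p in product_rows:
        s = by_id.get(p["Storage"])
        joined.append({
            "IdProduct": p["IdProduct"],
            "Name": p["Name"],
            "Price": p["Price"],
            "Family": p["Family"],
            "StName": s["Name"] if s else None,
            "StDescription": s["Description"] if s else None,
        })
    return joined


def convert_prices(rows):
    """Conversion step: Price = DollarToEuro(Price)."""
    return [{**r, "Price": dollar_to_euro(r["Price"])} for r in rows]


def add_surrogate_keys(rows, start=1):
    """Assign a warehouse-local integer key to each row (hypothetical column)."""
    keys = count(start)
    return [{"ProductKey": next(keys), **r} for r in rows]


def run_etl(product_rows, storage_rows):
    """Chain the three steps: join, convert, assign surrogate keys."""
    return add_surrogate_keys(convert_prices(left_join(product_rows, storage_rows)))


loaded = run_etl(products, storage_conditions)
```

Each function corresponds to one mechanism, so a step can be replaced or reordered without touching the others. The product without a storage condition survives the join with empty StName and StDescription, which is what distinguishes a left join from an inner join.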

Editor's notes

  1. Good morning, everybody. My name is Sergio Luján-Mora. The work I am going to present, which I have developed with my colleague Juan Trujillo, is entitled “A Comprehensive Method for Data Warehouse Design”. This work has been carried out in the Department of Software and Computing Systems at the University of Alicante in Spain.
  2. Good morning, everybody. My name is Sergio Luján-Mora. The work I am going to present, which I have developed with my colleague Juan Trujillo, is entitled “A Comprehensive Method for Data Warehouse Design”. This work has been carried out in the Department of Software and Computing Systems at the University of Alicante in Spain.
  3. I have divided my presentation into six main points. Firstly, I will start with the motivation of our work. Then, in the second section, I will provide a short background about the UML extension mechanisms. Next, I will show the different schemas that we have defined in our data warehouse design approach. And then I will propose a set of steps that help the user to apply our method. Finally, I will end my presentation with the main conclusions and future work. Let us start with the first part of the presentation.
  4. Data warehouses are complex information systems. Nowadays, data warehouses are a key component of information systems because they provide support to OLAP applications, data mining, decision support systems, and so on. It’s well known that building a data warehouse is time consuming, expensive and prone to fail. There are a lot of studies about building data warehouses and the problems that can be involved. Therefore, modeling a data warehouse can be crucial in the building of a data warehouse.
  5. During the last few years, different approaches for modeling data warehouses have appeared. However, they are partial approaches because each one addresses only certain parts of a data warehouse. For example, … On the other hand, some data warehouse methods have been proposed, but they don’t include a general model for the different design steps of a data warehouse.
  6. Therefore, we have been working on the development of “A Comprehensive Method for Data Warehouse Design”. Different principles have driven the design of our approach. First, instead of defining our own graphical notation, we use the UML, a standard visual modeling language. We say that our approach is comprehensive because we include the main phases of data warehouse design. Moreover, the design of a data warehouse is a joint effort between DW developers and final users. Therefore, a powerful (but also easy to understand) method is needed. Finally, we provide a method as a starting point, not as a rigid template. Therefore, it’s not a software development process that defines the who, what, when and how of developing software.
  7. Before continuing, I am going to provide a short background about the UML extension mechanisms.
  8. The UML is a general purpose visual modeling language for systems. The designers of UML realized that it was simply not possible to design a completely universal modeling language that would satisfy everyone’s needs, present and future, so UML incorporates three simple extensibility mechanisms. Stereotypes…, Tagged values…, Constraints…
  9. The main UML extension mechanism is the stereotype. In a UML diagram, there are four possible representations of a stereotyped element: icon (the stereotype icon is displayed instead of the normal representation of the element), decoration (the stereotype decoration is displayed inside the element), label (the stereotype name is displayed and appears inside guillemets), and none (the stereotype is not indicated).
  10. Now, I will introduce the different schemas that are part of our proposal.
  11. We consider that the development of a data warehouse can be structured into an integrated model with four different schemas (ODS, DWCS, DWSS, BM) and two schema mappings (ETL Process and Exportation Process). Let’s discuss in greater detail each one of the schemas.
  12. I am going to use a motivating example throughout the presentation. This is the general diagram, the level 0 of the example. Each one of the schemas and mappings is represented as a stereotyped UML package. We have defined 6 stereotypes for this level: ODS, DWCS, DWSS, BM, ETL, and Exportation.
  13. The ODS reflects the structure of the operational data sources and external sources. Nowadays, there is no accepted UML extension for modeling different types of data sources. Therefore, we have to use different UML extensions to model the ODS according to the source.
  14. For example, if the data source is a relational database… However, if the data source is an object-relational database… And if the data source is an XML document, we use…
  15. We use UML packages to divide the design process into three levels. In this way, we avoid flat diagrams.
  16. Our UML profile includes the definition of different stereotypes for package, class and attribute. The most important stereotypes are…
  17. The DWSS defines the storage of the data warehouse depending on the target platform.
  18. We have defined a reduced and yet highly powerful set of ETL mechanisms. We have decided to reduce the number of mechanisms in order to reduce the complexity of our proposal.
  19. Providing a graphical notation is not enough to propose a method; a method must also specify how to properly use the corresponding graphical notation. Therefore, we propose a set of steps to guide the design of a data warehouse following our approach.
  20. Moreover, thanks to the use of the UML packages, we avoid flat diagrams and our method can scale up to handle huge and complex DWs.
  21. We also plan to incorporate in our method more stages of the DW life cycle, such as the design of the refresh processes.
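
Note 17 says the DWSS depends on the target platform, and the Exportation Process slide mentions that the mapping from DWCS to DWSS can be manual or automatic. As a rough sketch of what an automatic export to a relational target might look like, the Python below turns a tiny conceptual description of one fact and its dimensions into `CREATE TABLE` statements and checks them against an in-memory SQLite database. The classes `Dimension` and `Fact`, the function `export_star_schema`, and every column name and type are assumptions made for this illustration; the slides do not describe the authors' actual export algorithm.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    attributes: dict  # column name -> SQL type (hypothetical)


@dataclass
class Fact:
    name: str
    measures: dict    # column name -> SQL type (hypothetical)
    dimensions: list  # Dimension objects referenced by the fact


def export_star_schema(fact):
    """Return CREATE TABLE statements: one per dimension, then the fact."""
    statements = []
    for dim in fact.dimensions:
        cols = [f"{dim.name}_key INTEGER PRIMARY KEY"]
        cols += [f"{col} {sql_type}" for col, sql_type in dim.attributes.items()]
        statements.append(f"CREATE TABLE {dim.name} ({', '.join(cols)});")
    # The fact table holds one foreign key per dimension plus its measures.
    cols = [
        f"{dim.name}_key INTEGER REFERENCES {dim.name}({dim.name}_key)"
        for dim in fact.dimensions
    ]
    cols += [f"{col} {sql_type}" for col, sql_type in fact.measures.items()]
    statements.append(f"CREATE TABLE {fact.name} ({', '.join(cols)});")
    return statements


# A cut-down version of the Sales star from the running example.
customers = Dimension("customers", {"name": "TEXT", "city": "TEXT"})
products = Dimension("products", {"name": "TEXT", "family": "TEXT"})
sales = Fact("sales", {"quantity": "INTEGER", "amount": "REAL"},
             [customers, products])

ddl = export_star_schema(sales)

# Run the generated statements to confirm that they are valid SQL.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

Keeping the conceptual description separate from the generator is the point of the sketch: a different target platform would need a different `export_star_schema`, while the `Fact` and `Dimension` objects stay the same.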