Published in:
5th International Workshop on Design and Management of Data Warehouses (DMDW'03), p. 1.1-1.14, Berlin (Germany), September 8 2003.
Download:
http://gplsi.dlsi.ua.es/almacenes/ver.php?pdf=48
A Comprehensive Method for Data Warehouse Design
1. Department of Software and
Computing Systems
A Comprehensive Method for Data
Warehouse Design
Sergio Luján-Mora, Juan Trujillo
(sergio.lujan@ua.es / @sergiolujanmora)
2. Department of Software and
Computing Systems
A Comprehensive Method for
Data Warehouse Design
Sergio Luján-Mora
Juan Trujillo
DMDW 2003
3. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
4. A Comprehensive Method for Data Warehouse Design
Motivation
• Data warehouses are complex
information systems
• Support:
– OLAP
– Data mining
– Decision Support Systems
–…
• Building a DW: time consuming,
expensive and prone to fail
5. A Comprehensive Method for Data Warehouse Design
Motivation
• Partial approaches:
– ETL processes
– Logical and conceptual design of the DW
based on the multidimensional paradigm
– Derive DW schema from ER schemas of
the data sources
–…
• DW methods, but not a general model
for the different phases
6. A Comprehensive Method for Data Warehouse Design
Motivation
• Goal: A Comprehensive Method for Data
Warehouse Design
• Principles that drive our approach:
– Standard modeling notation: UML
– Comprehensive: include main phases of DW design
– Powerful but easy to understand: different levels of detail for different users (technical and final users)
– Method: starting point, not a rigid template
7. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
8. A Comprehensive Method for Data Warehouse Design
UML extension mechanisms
• UML is a general purpose visual
modeling language for systems
• Extension mechanisms allow the user
to tailor it to specific domains
• Mechanisms:
– Stereotypes: new building elements
– Tagged values: new properties
– Constraints: new semantics
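As a rough illustration of the three mechanisms, the sketch below models a stereotyped element in Python. The names `ModelElement`, `tagged_values`, `constraints` and `has_measure` are assumptions made for this example; they are not part of UML or of the authors' profile.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelElement:
    """A UML-like element extended with the three mechanisms (hypothetical)."""
    name: str
    stereotype: str = ""  # stereotype: a new kind of building element
    tagged_values: Dict[str, Any] = field(default_factory=dict)  # new properties
    constraints: List[Callable[["ModelElement"], bool]] = field(default_factory=list)  # new semantics

    def label(self) -> str:
        """Label representation: the stereotype name shown inside << >>."""
        if self.stereotype:
            return f"<<{self.stereotype}>> {self.name}"
        return self.name

    def is_valid(self) -> bool:
        """The element is valid when every attached constraint holds."""
        return all(check(self) for check in self.constraints)


def has_measure(element: ModelElement) -> bool:
    """Hypothetical constraint: a fact must declare at least one measure."""
    return len(element.tagged_values.get("measures", [])) > 0


sales = ModelElement("Sales", "Fact", {"measures": ["quantity"]}, [has_measure])
print(sales.label(), sales.is_valid())
```

Keeping constraints as plain callables is only one possible design; a real profile would state them in OCL or natural language.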
9. A Comprehensive Method for Data Warehouse Design
UML extension mechanisms
Icon
Decoration
Label
None
10. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
11. A Comprehensive Method for Data Warehouse Design
[Figure: overview of the method. Four schemas: Operational Data Schema (ODS), Data Warehouse Conceptual Schema (DWCS), Data Warehouse Storage Schema (DWSS) and Business Model (BM). Two mappings: ETL Process and Exportation Process. Other labels: Model; Analyze; Diagrams (windows or views into the model).]
12. A Comprehensive Method for Data Warehouse Design
General diagram (level 0)
<<ODS>>, <<DWCS>>, <<DWSS>>, <<BM>>, <<ETL>>, <<Exportation>>
[Figure: level 0 package diagram of the example]
<<ODS>> Sales data
<<ODS>> Production data
<<ODS>> Syndicated data
<<ETL>> Transformations
<<DWCS>> Data warehouse
<<Exportation>> Mappings
<<DWSS>> Informix Metacube
<<DWSS>> Cognos PowerPlay
<<BM>> Manager
<<BM>> Accounting
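The level 0 packages can also be written down as data. The sketch below is a minimal illustration, assuming (from the later slides) that the ETL mapping connects ODS to DWCS and the Exportation mapping connects DWCS to DWSS; the function names are hypothetical.

```python
from collections import defaultdict

# (stereotype, package name) pairs as listed on the level 0 slide
PACKAGES = [
    ("ODS", "Sales data"), ("ODS", "Production data"), ("ODS", "Syndicated data"),
    ("ETL", "Transformations"),
    ("DWCS", "Data warehouse"),
    ("Exportation", "Mappings"),
    ("DWSS", "Informix Metacube"), ("DWSS", "Cognos PowerPlay"),
    ("BM", "Manager"), ("BM", "Accounting"),
]

# Schema kinds connected by each mapping stereotype (source kind, target kind)
MAPPING_ENDPOINTS = {"ETL": ("ODS", "DWCS"), "Exportation": ("DWCS", "DWSS")}


def group_by_stereotype(packages):
    """Collect package names under their stereotype, keeping slide order."""
    groups = defaultdict(list)
    for stereotype, name in packages:
        groups[stereotype].append(name)
    return dict(groups)


def mapping_routes(packages):
    """For each mapping package, list the source and target packages it links."""
    groups = group_by_stereotype(packages)
    routes = {}
    for stereotype, (source_kind, target_kind) in MAPPING_ENDPOINTS.items():
        for name in groups.get(stereotype, []):
            routes[name] = (groups.get(source_kind, []), groups.get(target_kind, []))
    return routes


for mapping, (sources, targets) in mapping_routes(PACKAGES).items():
    print(mapping, ":", sources, "->", targets)
```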
13. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
14. A Comprehensive Method for Data Warehouse Design
ODS
• Operational Data Schema
• Represents:
– Transaction processing systems (OLTP)
– External sources (census data, economic
data, competitors’ data, etc.)
• No UML extension exists for
modeling different types of data sources
15. A Comprehensive Method for Data Warehouse Design
ODS
• RDBMS: Rational’s UML Profile for Database Design: <<Database>>, <<Schema>>, <<Table>>, …
• ORDBMS: Marcos et al. UML Profile for Object-Relational Database Design: <<array>>, <<row>>, <<ref>>, …
• XML: Rational’s XML-DTD UML Profile: <<DTDElement>>, <<DTDElementEmpty>>, <<DTDEntity>>, …
• …
16. A Comprehensive Method for Data Warehouse Design
[Figure: ODS packages <<ODS>> Sales data, <<ODS>> Production data and <<ODS>> Syndicated data, with a class diagram containing the classes Salesmen, Cities, Counties, Groups, Discount policies, Families, Products, Packages, Invoices, Storage conditions, Lines, States, Customers, Agents and Categories, linked by associations with multiplicities 1, 0..n and 1..n.]
17. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
18. A Comprehensive Method for Data Warehouse Design
DWCS
• Data Warehouse Conceptual Schema
• UML Profile for Multidimensional Modeling
• Basic components:
– Facts: the transactions or values being analyzed
– Dimensions: descriptive information about the
facts
• Properties:
– Shared dimensions
– Heterogeneous dimensions
– Degenerate facts and dimensions
– Multiple and alternative path classification hierarchies
– …
19. A Comprehensive Method for Data Warehouse Design
DWCS
Level 1: Model definition
Level 2: Star schema definition
Level 3: Dimension/fact definition
20. A Comprehensive Method for Data Warehouse Design
DWCS
Package stereotypes:
– StarPackage (Level 1)
– FactPackage (Level 2)
– DimensionPackage (Level 2)
Class stereotypes:
– Fact (Level 3)
– Dimension (Level 3)
– Base (Level 3)
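The three levels can be pictured as a nested tree of packages and classes. The sketch below is an illustration only: the `Node` class, the `misplaced` check and the rule that each element sits exactly one level below its container are assumptions for this example, and the choice of `Base` for Customers, ZIPs, Cities and States is one reading of the level 3 slide.

```python
from dataclasses import dataclass, field
from typing import List

# Level assigned to each stereotype on the slide
LEVEL_OF = {
    "StarPackage": 1,
    "FactPackage": 2, "DimensionPackage": 2,
    "Fact": 3, "Dimension": 3, "Base": 3,
}


@dataclass
class Node:
    """A stereotyped package or class with its nested elements (hypothetical)."""
    stereotype: str
    name: str
    children: List["Node"] = field(default_factory=list)


def misplaced(node: Node, parent_level: int = 0) -> List[str]:
    """Return names of elements that are not exactly one level below their container."""
    level = LEVEL_OF[node.stereotype]
    errors = [] if level == parent_level + 1 else [node.name]
    for child in node.children:
        errors.extend(misplaced(child, level))
    return errors


sales_schema = Node("StarPackage", "Sales schema", [
    Node("FactPackage", "Sales fact"),
    Node("DimensionPackage", "Customers dimension", [
        Node("Dimension", "Customers dim"),
        Node("Base", "Customers"),
        Node("Base", "ZIPs"),
        Node("Base", "Cities"),
        Node("Base", "States"),
    ]),
])
print(misplaced(sales_schema))
```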
21. A Comprehensive Method for Data Warehouse Design
Model definition (level 1)
<<StarPackage>>
Production schema
Sales schema
Salesmen schema
22. A Comprehensive Method for Data Warehouse Design
Star schema definition (level 2)
<<FactPackage>>, <<DimensionPackage>>
Production schema
Sales schema
Salesmen schema
Stores dimension
Times dimension
Sales fact
Products dimension
Customers dimension
23. A Comprehensive Method for Data Warehouse Design
Dimension/fact definition (level 3)
<<Fact>>, <<Dimension>>, <<Base>>
[Figure: level 3 definition of the Customers dimension. Classes shown: Customers dim, Customers, ZIPs, Cities and States, linked by parent/child associations with multiplicities 1 and 0..n.]
24. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
25. A Comprehensive Method for Data Warehouse Design
DWSS
• Data Warehouse Storage Schema
• Depends on the implementation
(RDBMS, ORDBMS, MD, …): similar
to the ODS
• Two possibilities: manual or automatic
26. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
27. A Comprehensive Method for Data Warehouse Design
BM
• Business Model
• Adapt the DW to final users:
– Easier to understand
– Security concerns
–…
• UML importing mechanism: different
submodels of DWCS
28. A Comprehensive Method for Data Warehouse Design
[Figure: importing example. The <<DWCS>> package Data warehouse contains Production schema, Sales schema and Salesmen schema; the <<BM>> package Accounting imports Sales schema (from Data warehouse).]
29. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
30. A Comprehensive Method for Data Warehouse Design
ETL Process
• Extraction-Transformation-Loading
• Mapping between ODS and DWCS
• UML Profile for Modeling ETL Processes
• Common mechanisms:
– Integration of different data sources
– Transformation…
– Generation of surrogate keys
– …
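One of the listed mechanisms, the generation of surrogate keys, can be sketched in a few lines. This is a generic illustration, not the authors' profile: the `SurrogateKeyGenerator` class and its method are hypothetical, and the rule that a key is identified by the pair (source, natural key) is an assumption.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns a stable integer key to each (source, natural key) pair."""

    def __init__(self, start=1):
        self._next = count(start)
        self._keys = {}

    def key_for(self, source, natural_key):
        """Return the existing surrogate key, or allocate the next one."""
        ident = (source, natural_key)
        if ident not in self._keys:
            self._keys[ident] = next(self._next)
        return self._keys[ident]


gen = SurrogateKeyGenerator()
# The same natural key coming from two sources gets two different surrogate keys.
print(gen.key_for("Sales data", "P1"), gen.key_for("Production data", "P1"))
```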
31. A Comprehensive Method for Data Warehouse Design
ETL Process
Aggregation
Loader
Conversion
Log
Filter
Merge
Incorrect
Surrogate
Join
Wrapper
32. A Comprehensive Method for Data Warehouse Design
[Figure: ETL example for the Products dimension]
Sources:
– Products (from Sales data): IdProduct, Name, Price, Family, Storage
– Storage conditions (from Sales data): IdStorage, Name, Description
Join: LeftJoin(Storage = IdStorage)
– Name = Products.Name
– StName = [Storage conditions].Name
– StDescription = [Storage conditions].Description
Intermediate class NewClass2: IdProduct, Name, Price, Family, StName, StDescription
Conversion ProdEuro: Price = DollarToEuro(Price)
Loader: ProdLoader
Targets (from Products dimension): Products dim (1), ProdDescription (0..n)
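The flow on this slide can be read as a left join, followed by a currency conversion and a load. The sketch below follows that reading in plain Python. The order of the steps, the fixed exchange rate and the list used as load target are assumptions for illustration; only the attribute names and the expressions `LeftJoin(Storage = IdStorage)` and `Price = DollarToEuro(Price)` come from the slide.

```python
DOLLAR_TO_EURO_RATE = 0.9  # hypothetical fixed rate, for illustration only


def dollar_to_euro(price):
    """Stand-in for the DollarToEuro function named on the slide."""
    return round(price * DOLLAR_TO_EURO_RATE, 2)


def left_join(products, storage_conditions):
    """LeftJoin(Storage = IdStorage): keep every product, matched or not."""
    by_id = {s["IdStorage"]: s for s in storage_conditions}
    joined = []
    for p in products:
        s = by_id.get(p["Storage"])
        joined.append({
            "IdProduct": p["IdProduct"],
            "Name": p["Name"],
            "Price": p["Price"],
            "Family": p["Family"],
            "StName": s["Name"] if s else None,
            "StDescription": s["Description"] if s else None,
        })
    return joined


def prod_euro(rows):
    """Conversion step: Price = DollarToEuro(Price)."""
    return [{**row, "Price": dollar_to_euro(row["Price"])} for row in rows]


def prod_loader(rows, target):
    """Loader step: append the transformed rows to the target collection."""
    target.extend(rows)
    return target


products = [
    {"IdProduct": 1, "Name": "Milk", "Price": 2.0, "Family": "Dairy", "Storage": 10},
    {"IdProduct": 2, "Name": "Nails", "Price": 5.0, "Family": "Hardware", "Storage": 99},
]
storage_conditions = [
    {"IdStorage": 10, "Name": "Cold", "Description": "Keep refrigerated"},
]

products_dim = prod_loader(prod_euro(left_join(products, storage_conditions)), [])
for row in products_dim:
    print(row)
```

The second product has no matching storage condition, so the left join keeps it with empty `StName` and `StDescription` values instead of dropping it.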
33. A Comprehensive Method for Data Warehouse Design
[Figure: schema overview diagram (ODS, ETL Process, DWCS, Exportation Process, DWSS, BM)]
34. A Comprehensive Method for Data Warehouse Design
Exportation Process
• Mapping between DWCS and DWSS
• Two possibilities: manual or automatic
35. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
39. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
40. A Comprehensive Method for Data Warehouse Design
Conclusions
• Global DW design method
• Main advantages:
– Same standard notation (UML)
– Integration of different design phases in a
single and coherent framework
– Scale up to handle huge and complex DWs
• CASE tool support with Rational Rose
Add-in
41. A Comprehensive Method for Data Warehouse Design
Contents
• Motivation
• UML extension mechanisms
• DW modeling schemas
• Applying modeling schemas
• Conclusions
• Future Work
42. A Comprehensive Method for Data Warehouse Design
Future work
• Data mapping at attribute level
• Diagramming and style guidelines for
creating better diagrams
• More stages of the DW life cycle (e.g.,
refresh processes)
43. A Comprehensive Method for Data Warehouse Design
Department of Software and
Computing Systems
A Comprehensive Method for
Data Warehouse Design
Sergio Luján-Mora
Juan Trujillo
Editor's notes
Good morning to everybody, my name is Sergio Luján-Mora.
The work I am going to present, which I have developed with my colleague Juan Trujillo, is entitled “A Comprehensive Method for Data Warehouse Design”. This work has been carried out in the “Department of Software and Computing Systems” at the “University of Alicante” in Spain.
I have divided my presentation into six main points.
Firstly, I will start with the motivation of our work.
Then, in the second section I will provide a short background about the UML extension mechanisms.
Next, I will show the different schemas that we have defined in our data warehouse design approach.
And then I will propose a set of steps that help the user to apply our method.
Finally, I will end my presentation with the main conclusions and future work.
Let us start with the first part of the presentation.
Data warehouses are complex information systems.
Nowadays, data warehouses are a key component of information systems because they provide support to OLAP applications, data mining, decision support systems, and so on.
It’s well-known that building a data warehouse is time consuming, expensive and prone to fail. There are a lot of studies about building data warehouses and the problems that can be involved.
Therefore, modeling a data warehouse can be crucial in the building of a data warehouse.
During the last few years, different approaches for modeling data warehouses have appeared. However, they are partial approaches because they only address different parts of data warehouses. For example, …
On the other hand, some data warehouse methods have been proposed, but they don’t include a general model for the different design steps of a data warehouse.
Therefore, we have been working on the development of “A Comprehensive Method for Data Warehouse Design”.
Different principles have driven the design of our approach. First, instead of defining our own graphical notation, we use the UML, a standard visual modelling language. We say that our approach is comprehensive because we include the main phases of data warehouse design. Moreover, the design of a data warehouse is a joint effort between DW developers and final users. Therefore, a powerful (but also easy to understand) method is needed. Finally, we provide a method as a starting point, not as a rigid template. Therefore, it’s not a software development process that defines the who, what, when and how of developing software.
Before continuing, I am going to provide a short background about the UML extension mechanisms.
The UML is a general purpose visual modeling language for systems.
The designers of UML realized that it was simply not possible to design a completely universal modeling language that would satisfy everyone’s needs, present and future, so UML incorporates three simple extensibility mechanisms.
Stereotypes…, Tagged values…, Constraints…
The main UML extension mechanism is the stereotype.
In a UML diagram, there are four possible representations of a stereotyped element: icon (the stereotype icon is displayed instead of the normal representation of the element), decoration (the stereotype decoration is displayed inside the element), label (the stereotype name is displayed and appears inside guillemets), and none (the stereotype is not indicated).
Now, I will introduce the different schemas that are part of our proposal.
We consider that the development of a data warehouse can be structured into an integrated model with four different schemas (ODS, DWCS, DWSS, BM) and two schema mappings (ETL Process and Exportation Process).
Let’s discuss in greater detail each one of the schemas.
I am going to use a motivating example throughout the presentation. This is the general diagram, level 0, of the example.
Each one of the schemas and mappings is represented as a stereotyped UML package. We have defined six stereotypes for this level: ODS, DWCS, DWSS, BM, ETL, and Exportation.
The ODS reflects the structure of the operational data sources and external sources.
Nowadays, there is no accepted UML extension for modeling different types of data sources. Therefore, we have to use different UML extensions to model the ODS according to the source.
For example, if the data source is a relational database…
However, if the data source is an object-relational database…
And if the data source is an XML document, we use…
We use UML packages to divide the design process into three levels. In this way, we avoid flat diagrams.
Our UML profile includes the definition of different stereotypes for package, class and attribute. The most important stereotypes are…
The DWSS defines the storage of the data warehouse depending on the target platform.
We have defined a reduced and yet highly powerful set of ETL mechanisms. We have decided to reduce the number of mechanisms in order to reduce the complexity of our proposal.
Providing a graphical notation is not enough to propose a method; a method must also specify how to properly use the corresponding graphical notation. Therefore, we propose a set of steps to guide the design of a data warehouse following our approach.
Moreover, thanks to the use of the UML packages, we avoid flat diagrams and our method can scale up to handle huge and complex DWs.
We also plan to incorporate in our method more stages of the DW life cycle, such as the design of the refresh processes.