4. Rule 1 – Do not implement a Data Vault DWH
without DWH Automation
¡ Why ?
¡ You have 3 to 4 times more objects than 3NF, meaning
much more manual development work.
¡ Data Vault objects have generic logic per type (HUB,
SATELLITE & LINK) and there are lots of them.
¡ Therefore code generation can be used to deliver
higher development speeds.
5. Rule 2 – Do not create a DV model that holds
the single version of the Truth in your first layer
of your DWH
¡ Why ?
¡ Single version of the Truth :
¡ Is defined by business and changes faster over
time than the source systems
¡ The single version of the truth is a myth.
¡ As business definitions changes over time you will
also get multiple versions of the truth over time
6. Rule 3 – Do not limit what you record in the DV
based on user requirements
¡ Why ?
¡ End users can not predict everything they need and
they want to be able change their mind on what is
needed.
7. Historical – Milestone projects
Health Sector1
2009
Data Vault
Health Sector2
2010
Data Vault
DWH Automation 1.0
8. Healthcare Sector Project 2
¡ Create a foundation layer the holds :
¡ the Single version of the Facts = Stores data in Source system format
¡ Atomic level data
¡ All data from source except for Interface or other technical tables
¡ All History of change
¡ Integrates data across sources
¡ Use a data Vault modeling which is flexible and resilient to change.
¡ Use etl-generation = OWB OMB-code generation
¡ Important : Reuse of investment of existing ETL-tool is important so
the automation tool should generate Mappings, not replace the
existing tool.
¡ Create a presentation layer
¡ Generate the incremental logic from foundation to presentation layer.
¡ Manual development.
¡ That structures the data in a way end users understands it.
10. Rule 4 – Do not automate incremental logic
towards PL
¡ Why ?
¡ Generic increment logic can not take in account that
there are driving tables, which means all tables are
driving tables in the load logic and this has a huge
impact on the performance.
¡ Exception :
¡ Use Engineered systems to run this logic
¡ Use Engineered systems & in Memory technology
to virtualize the Presentation Layer.
11. Rule 5 – Do not implement all business logic
From Foundation layer to Presentation layer
¡ Atomic level objects that do not exist in the source
should not be placed in the presentation layer
¡ Why ?
¡ They are typically used in business logic to build
multiple Presentation Layer object
¡ Best to persist them before the PL, otherwise you
have to implement the logic to load them,
multiple times
12. Historical – Milestone projects
Health Sector1
2009
Data Vault
Health Sector2
2010
Data Vault
DWH Automation 1.0
13. Bank project
¡ Introduced new DV features in DWH Automation tool
¡ Transactional Links
¡ Same as Logic
¡ Splitting Satellites over Multiple Satellites
¡ More customization so more customer standards
could be supported
14. Historical – Milestone projects
Health Sector1
2009
Data Vault
Health Sector2
2010
Data Vault
DWH Automation 1.0
15. The Agile Information Factory
i
Architecture & Approach
q Innovation
ü Supports all new
concepts in
Information
management
q Delivers Value
ü Agility
ü Cost Reduction
q Best Practices
ü Reuse of approach &
solutions
q Oracle Platform
ü Uses Integrated
Software/Hardware
stack of Oracle
16. DWH-Automation Solution 3.0
• Tripwire
DWH Foundation Accelerator
Analysis
Source Analysis Automation
• Tripwire
DWH Foundation Accelerator
Development
Etl-Code
Generation
• Tripwire
DWH Foundation Accelerator
Testing
Automated
Data Validation
• Redbridge
Lifecycle Management
Automated Release
Management
• Oracle Enterprise Metada
Management
Impact Analysis
Enterprise Metadata
Management
18. ¡ In the foundation layer there are actually 2 persistent layers
(typically stored in 1 schema)
¡ RDV : Raw integration – none to simple business key integration
– the data does not represent common business rules
¡ BDV : Business Data Vault
¡ Business rules are applied
¡ Business key integration takes place
¡ New Business Concept introduced
¡ Data Virtualization of exiting business concepts in the
Raw Vault – Do not persist objects that already exist in
the Raw Data Vault
Foundation Area : The internal Layers
19. Multiple speed Implementation
¡ The Raw and the Business Data Vault area can be built at different speeds
because :
¡ The RAW or Source based Data Vault is :
¡ A technical implementation based on source systems and only requires a
source analysis = Single version of the fact
¡ Data Warehouse automation can be used as the target structure is a direct
representation of the source.
¡ The Business Data Vault is :
¡ A Business based implementation that requires functional and technical
analysis to understand business requirements = Single or multiple versions of
the truth
¡ New Business Concepts can be created (New Hubs) but implementation
experience show typically link tables between existing Source Business
concepts (source based Hubs) support requirements for 90%.
¡ The multiple speed approach supports better functional and technical
analysis when the raw data vault data is already available.
20. Rule 6 – Put the right business logic in the right
layer.
¡ If you do not standardize than you will have to document
everything
¡ Supports the multiple speed approach
¡ Increases the ability to change without high impact.
¡
Parameters to define where to place which Business logic :
¡ Stability : Is this logic likely to change a lot over time
¡ Scope : Enterprise wide, Departmental, User specific
¡ Type : Conditional, Calculation, Aggregation, Data Quality Check, …
¡ Result : Factual, Master Data
21.
22. For questions :
Piet De Windt
piet.de.windt@tripwiresolutions.be
+32 473 99 99 89
Everything you need to build
something exceptional