How to adeptly model a lean data warehouse for maximum adaptability to a changing business, changing source data, changing business rules, changing requirements, and changing needs for integration with NoSQL repositories.
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
1. EDW Data Model Storming for
Integration of NoSQL with RDBMS
SQL Saturday #497, April 2, 2016
Daniel Upton
DW-BI Architect, Data Modeler
DecisionLab.Net
Serving Orange County and San Diego County since 2007
dupton@decisionlab.net
blog: www.decisionlab.net
linkedin.com/in/DanielUpton
2. Open Questions
o With DW-BI now a mainstream I.T. career specialization with an established set of best practices, why do many real-world implementations still fall short of satisfying business-stakeholder expectations?
o What influence have Lean and Agile thinking had on DW-BI?
o What parts of DW implementation have been most resistant to Agile?
o Are established DW data modeling methods an asset or a liability?
o What factors are driving change in data modeling for business intelligence?
o What is Data Model Storming?
o What challenges does NoSQL introduce to data modeling intended for integration with RDBMS
data?
o What do we mean by Integration?
o What does End-to-End Model Storming mean?
Objectives:
o Describe a data modeling method and demonstrate how it differs from both dimensional modeling and 3rd Normal Form according to…
o Agile: Quickly and iteratively deliver minimally viable products (MVPs) to users.
o Lean: Design in loose coupling to minimize or eliminate functional dependencies.
o PMBOK: Break down work (including design) into small-yet-cohesive chunks.
o Review BEAM Dimensional Model Storming (Corr and Stagnitto)
o Demonstrate some best-practice NoSQL data models as major variations from 3rd Normal Form.
o Introduce and perform EDW Model Storming with a simple use case involving unpredictable, last-minute changes to business rules.
o Extend the Model Storm with a last-minute requirement for NoSQL integration.
6. BEAM Model Storming (Corr and Stagnitto)
o Accelerates agile dimensional design with a great shorthand notation on eye-friendly visual information displays, enabling real-time dimensional design during requirements meetings with business stakeholders.
o Begins with a user information story
o Ends with artifacts that capture the business requirement while also specifying the logic for a
star schema.
o One such artifact is an event matrix (minimal example).
o Includes source data column profiling at column/record level; ignores source data structure
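As a rough illustration of the event-matrix idea (the event and dimension names below are invented for this sketch, not taken from Corr and Stagnitto's book), a minimal matrix pairs business events with the dimensions that describe them, which in turn specifies the star-schema logic:

```python
# Hypothetical minimal event matrix: rows are business events, columns are
# candidate dimensions; True marks a dimension participating in that event.
event_matrix = {
    "Customer orders Product":  {"Customer": True, "Product": True, "Date": True, "Store": True},
    "Customer returns Product": {"Customer": True, "Product": True, "Date": True, "Store": False},
}

def dimensions_for(event):
    """List the dimensions participating in an event -> the event's star schema."""
    return [dim for dim, used in event_matrix[event].items() if used]
```

Reading a row across gives the fact table's dimensionality; reading a column down shows which events share (conform) a dimension.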
8. More on Lean Data Warehouse (Hyper-Normal / Data Vault): Objectives
o Fully enforced, simple (single-field equi-joins only) referential integrity
o Identify a business key and store its values as unique records in a Hub table; a surrogate PK removes all functional dependencies (tight couplings) to this identifier FROM other tables’ FKs.
o Store history of value changes to all attributes in a child table using LoadDTS and LoadEndDTS.
o Store all table relationships so as to accommodate any current or future real-world cardinality (1-to-1, 1-to-M, and M-to-M) via an associative join table. Why?
o While preserving all actual relationships between records in related tables, all DW table relationships are now abstracted as Hub_PK, related to Link_FK, related to Hub_PK.
o For Satellite identifier fields that, in source, were used as foreign keys (and were thus tightly coupled), remove these functional dependencies TO other DV Ensembles.
o Benefits:
o Zero functional dependencies between DW Ensembles, so small increments may be designed, loaded, and released based only on the definition of a Minimally Viable Product (MVP), rather than forcing larger, slower, more functionally intertwined releases.
o When a directly related data subject area is added later, this is accomplished with zero re-factoring of the existing ensembles.
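The Hub / Link / Satellite shape described above can be sketched in miniature. This is an illustrative Python sketch (all table and field names, e.g. Hub_Customer and Link_CustomerOrder, are invented for the example, not taken from the slides); it shows how every relationship resolves through single-field equi-joins only, as Hub_PK to Link_FK to Hub_PK:

```python
# Hubs: one row per unique business key, identified by a surrogate key.
hub_customer = {1: {"CustomerBK": "C-1001"}, 2: {"CustomerBK": "C-1002"}}
hub_order    = {10: {"OrderBK": "ORD-9"}, 11: {"OrderBK": "ORD-10"}}

# Link: associative table; because each row is just a pair of hub surrogate
# keys, it can express 1-to-1, 1-to-M, or M-to-M without refactoring.
link_customer_order = [
    {"Link_SQN": 100, "HubCustomer_SQN": 1, "HubOrder_SQN": 10},
    {"Link_SQN": 101, "HubCustomer_SQN": 1, "HubOrder_SQN": 11},
]

def orders_for_customer(business_key):
    """Resolve Hub_PK -> Link_FK -> Hub_PK using single-field equi-joins only."""
    sqn = next(k for k, v in hub_customer.items() if v["CustomerBK"] == business_key)
    return [hub_order[row["HubOrder_SQN"]]["OrderBK"]
            for row in link_customer_order if row["HubCustomer_SQN"] == sqn]
```

Note that adding a new subject area means adding new hubs and links; nothing in the existing tables needs to change, which is the zero-refactoring benefit claimed above.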
9. Mindset for Lean DW Model Storm Design
o K.I.S.S.: Once a source table is determined to be in scope, include all of its fields and records, so you never have to add them later.
o Other than creating Hubs, Satellites, and Links, perform no other transformations in this layer.
o No calculations, aggregations, or business rules (yet).
o As such, we are NOT, or at least NOT YET, attempting to define a single version of the truth (SVOT), nor a data presentation / reporting RDBMS layer.
o Instead, we are…
o Loosely integrating data from multiple data sources
o Aligning it around business keys
o Tracking the history of attributes whose old values may be overwritten in source systems
o Supporting all actual (intended and otherwise) relationships among records in related
tables.
o Doing all of the above while enforcing simple referential integrity exclusively with single-
field equi-joins.
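The "track history, apply no business rules" mindset can be illustrated with a minimal delta-load sketch. This is a hedged Python illustration (the satellite layout and field names are assumptions, not from the slides): it end-dates the active satellite row via LoadEndDTS only when the raw source values actually change, with no calculations or rules applied:

```python
def apply_delta(satellite, hub_sqn, new_attrs, load_dts):
    """End-date the active satellite row for hub_sqn and append a new row,
    but only if the raw attribute values actually changed (no business rules)."""
    active = next((r for r in satellite
                   if r["Hub_SQN"] == hub_sqn and r["LoadEndDTS"] is None), None)
    if active is not None:
        if all(active.get(k) == v for k, v in new_attrs.items()):
            return satellite  # unchanged in source: nothing to load
        active["LoadEndDTS"] = load_dts  # close out the old version
    satellite.append({"Hub_SQN": hub_sqn, "LoadDTS": load_dts,
                      "LoadEndDTS": None, **new_attrs})
    return satellite

# A source system overwrites City in place; the satellite keeps both versions.
sat_customer = []
apply_delta(sat_customer, 1, {"City": "Carlsbad"}, "2016-01-01")
apply_delta(sat_customer, 1, {"City": "San Diego"}, "2016-02-01")
apply_delta(sat_customer, 1, {"City": "San Diego"}, "2016-03-01")  # no change
```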
13. Next, for each new table-copy…
o Remove all (source-based) foreign key relationships without removing underlying identifier
fields.
o Remove primary key constraint.
o Add the following control / metadata fields:
DWLoadBatchID_SourceSys
DW_Load_DTS
DW_Load_Expire_DTS
Placeholder_SurrogateKey (explained later)
o Create a new composite Primary Key with Placeholder_SurrogateKey + Load_DTS.
o Satellite-splitting:
If a subset of fields is updated in source much more frequently than the others, and the table will be sufficiently large that ETL processing of the more frequent updates would result in excessive loading time, split the table into two or more subsets.
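The table-copy steps above can be sketched as a small transformation over a column list. This is a rough Python illustration (the source column names are hypothetical): every source field survives, even identifier fields whose FK constraints are dropped, while the four control fields are appended and the new composite PK is formed:

```python
# Control / metadata fields named on the slide above.
CONTROL_FIELDS = ["DWLoadBatchID_SourceSys", "DW_Load_DTS",
                  "DW_Load_Expire_DTS", "Placeholder_SurrogateKey"]

def prepare_table_copy(source_columns):
    """Keep every source field (identifier fields survive even though PK/FK
    constraints are dropped), append the control fields, and define the
    composite PK of Placeholder_SurrogateKey + load timestamp."""
    return {
        "columns": list(source_columns) + CONTROL_FIELDS,
        "primary_key": ["Placeholder_SurrogateKey", "DW_Load_DTS"],
        "foreign_keys": [],  # source-based FK constraints removed
    }

customer_copy = prepare_table_copy(["CustomerID", "RegionID", "Name"])
```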
15. Then, starting with tables classified earlier as bona fide entities…
o In the new submodel, rename the Placeholder_SurrogateKey field to Hub[EntityName]_SQN (or …_HashId) for all tables split from the source entity table.
o Copy one of these tables again.
o In the newest table-copy, delete all fields except the new PK, the new control fields, AND the Business Key.
o Rename the table as “ Hub_[Enter Entity Name Here] ”.
o Remove ‘Load_DTS’ from the Primary Key.
o Add a unique constraint to the Business Key.
o In each corresponding table, rename it as “ Sat_[Enter Entity Name Here_&Something] ”.
o Create a defining relationship between the Hub (parent / 1) and each “ Sat_[Enter Entity Name Here_&Something] ” so that the child table’s FK is also part of its PK.
o Once all entity tables are converted into Hub-Satellite sets, start on the mere-association tables.
o Still in the new submodel, repeat the steps above to add the control fields.
o Add a new “ Link_[Assoc_Name]_SQN ” (or _HashID).
o As above, set the PK as …SQN + Load_DTS.
o Rename the table to “ Sat_Link_[Enter Assoc. Name Here] ”.
o Create another copy of the table, and rename it as “ Link_[Enter Assoc. Name Here] ”.
o Follow the same remaining steps as with Hubs, except that no Business Key remains in the Link.
o Create a defining relationship from the Link (child) to its directly related Hubs (parents), so that Hub_[ParentHub]_SQN is included in the Link.
o Create a Unique Key on the composite of the Hub_ParentHub_SQN fields.
o Create a defining relationship from the Link (parent) to the LinkSat (child).
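The Hub/Satellite split for a bona fide entity can be sketched end-to-end. This is a hedged Python sketch (the sample rows and the MD5-based _HashId choice are assumptions for illustration, not mandated by the slides): it derives a deterministic HashId from the business key, keeps one hub row per unique key, and moves all remaining attributes into satellite rows keyed by HashId + LoadDTS:

```python
import hashlib

def hash_id(business_key):
    """Deterministic surrogate (the _HashId option): hash of the cleaned key."""
    return hashlib.md5(str(business_key).strip().upper().encode("utf-8")).hexdigest()

def split_entity(rows, business_key, load_dts):
    """Split a source entity table into a Hub (one row per unique business
    key) and a Satellite (all other attributes, keyed by HashId + LoadDTS)."""
    hub, satellite = {}, []
    for row in rows:
        hid = hash_id(row[business_key])
        hub[hid] = {business_key: row[business_key]}  # dedupes repeated keys
        attrs = {k: v for k, v in row.items() if k != business_key}
        satellite.append({"HubCustomer_SQN": hid, "LoadDTS": load_dts, **attrs})
    return hub, satellite

rows = [{"CustomerID": "C1", "City": "Carlsbad"},
        {"CustomerID": "C1", "City": "San Diego"},
        {"CustomerID": "C2", "City": "Vista"}]
hub, sat = split_entity(rows, "CustomerID", "2016-04-02")
```

Hashing the cleaned business key (rather than issuing sequences) means hubs and links can be loaded independently and in any order, which supports the loose coupling goal stated on slide 8.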
20. DecisionLab.Net
Data Warehouse / Business Intelligence envisioning, implementation, oversight, and assessment
This slide deck is available now at slideshare.net/DanielUpton/
Daniel Upton dupton@decisionlab.net
Carlsbad, CA blog: http://www.decisionlab.net phone 760.525.3268