Warehousing dimension star-snowflake_schemas
1. Data Warehousing – Dimensions | Star and
Snowflake Schemas
Eric Matthews - DataWithUs
2. Defining Some Key Terms
Dimension
• Data Element
• Categorizes each item in a data set
• Provides Structured Labeling/Tagging
• Dimensions can consist of hierarchies. For example: Date |
Month, Quarter, Year
• Dimension tables contain the keys that fact tables
reference (as foreign keys) in joins.
Dimension – Primary Role
• Data Filtering
• Data Grouping
• Data Labeling
Fact
• A measured, counted, or aggregated event. For example:
Sales, Admissions, Blood Pressure, and Inventory can all be
construed as “facts”
• Fact Tables contain appropriate joining keys
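As a minimal sketch of the terms above, the following uses Python's built-in sqlite3 module to join a small fact table to a date dimension. The table names (date_dim, sales_fact), the date_key column, and all rows are hypothetical and chosen only for illustration. The dimension filters (year), groups, and labels (quarter) the measured amounts.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,   -- key that the fact table references
    month    TEXT,
    quarter  TEXT,
    year     INTEGER
);
CREATE TABLE sales_fact (
    date_key INTEGER REFERENCES date_dim(date_key),  -- joining key
    amount   REAL                                    -- the measure
);
INSERT INTO date_dim VALUES
    (1, 'Jan', 'Q1', 2023),
    (2, 'Feb', 'Q1', 2023),
    (3, 'Apr', 'Q2', 2023);
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 25.0), (3, 10.0);
""")

# Filter on year, group by quarter; the quarter also labels each result row.
rows = conn.execute("""
    SELECT d.quarter, SUM(f.amount)
    FROM sales_fact AS f
    JOIN date_dim AS d ON d.date_key = f.date_key
    WHERE d.year = 2023
    GROUP BY d.quarter
    ORDER BY d.quarter
""").fetchall()

print(rows)  # [('Q1', 175.0), ('Q2', 10.0)]
```

Swapping d.quarter for d.month in the query moves one level down the Date hierarchy without touching the fact table.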
3. Defining Some Key Terms (continued)
Conformed Dimension
• Common set of data structures/attributes
• Can cut across many facts, but…
• The row headers in an answer must be able to exactly
match, or…
• Can be an exact subset
These definitions will become clearer as we look at some
examples.
4. Star Schema
• Simplest form of dimensional modeling
• Consists of dimension table(s) modeled around a fact table
• Optimized for querying large data sets
6. Star Schema – Talking Points for Next Diagram
Note: Have original table schema as point of reference.
• Discuss aggregation from the source table to the fact table,
rolling up totals (how this needed to be done).
• Discuss the notion of rolling up fact tables to create other
fact tables (use account type, financial class, and service
code columns in the fact table for basis of discussion)
• Discuss some of the pitfalls of dimension tables by using
the physician dimension as an example (example:
Physicians can change jobs)
• Discuss the Date Dimension from the perspective of the
data in the table… which transitions us to a key point…
…which is similar to how one needs to resolve foreign keys in
reporting: the dimension table is a table form of the same
concept.
Additionally, if one has well-defined master data, then populating
the dimension tables can be done using a columnar subset of the
source master data table.
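One way to picture the "columnar subset" idea is the sketch below, which copies a few columns from a master data table into a dimension table. The physician_master table, its extra columns, and all rows are assumptions made for the example. Only the dimension column names (ACCT_REFERRING_MD, PHYSICIAN_NAME, AFFILIATION) come from the Referring Physician dimension in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical master data table with more columns than reporting needs.
CREATE TABLE physician_master (
    physician_id   INTEGER PRIMARY KEY,
    physician_name TEXT,
    affiliation    TEXT,
    license_number TEXT,
    pager          TEXT
);
INSERT INTO physician_master VALUES
    (10, 'Dr. A', 'General Hospital', 'L-001', '555-0100'),
    (11, 'Dr. B', 'City Clinic',      'L-002', '555-0101');

-- Dimension table: only the columns used for filtering, grouping, labeling.
CREATE TABLE physician_dim (
    ACCT_REFERRING_MD INTEGER PRIMARY KEY,
    PHYSICIAN_NAME    TEXT,
    AFFILIATION       TEXT
);

-- Populate the dimension from a column subset of the master table.
INSERT INTO physician_dim
    SELECT physician_id, physician_name, affiliation
    FROM physician_master;
""")

dim_rows = conn.execute(
    "SELECT * FROM physician_dim ORDER BY ACCT_REFERRING_MD"
).fetchall()
print(dim_rows)
# [(10, 'Dr. A', 'General Hospital'), (11, 'Dr. B', 'City Clinic')]
```

Note that a plain copy like this overwrites nothing by itself, but re-running it after a physician changes affiliation would raise exactly the pitfall mentioned in the talking points.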
7. Fact Table: Acct Fin Rollup
Fact Table: Acct Fin Rollup
• ACCT_NUM
• ACCT_PTPTR
• ACCT_GUARANTOR_ID
• ACCT_REFERRING_MD
• ACCT_START_DATE
• ACCT_END_DATE
• PLAN_SEQ1
• ACCT_TYPE
• FC
• HOSPITAL_SERVICE_CODE
• TOT_TOTAL_CHARGES
• TOT_TOTAL_PAYMENTS
• TOT_TOTAL_ADJUSTMENTS
• TOT_BALANCE
Dimension Table: Date
• WEEK
• YEAR
• QUARTER
• MONTH
Dimension Table: Patient
• ACCT_PTPTR
• PATIENT_NAME
• CITY
• STATE
• ZIP
Dimension Table: Insurance Plan/Carrier
• PLAN_SEQ1
• PLAN_NAME
• CARRIER
• CITY
• STATE
• ZIP
Dimension Table: Referring Physician
• ACCT_REFERRING_MD
• PHYSICIAN_NAME
• AFFILIATION
• AFFILIATION_CITY
• AFFILIATION_STATE
• AFFILIATION_ZIP
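The sketch below builds a cut-down version of this star schema in SQLite to illustrate two of the talking points: querying the fact table through a dimension, and rolling the fact table up into another fact table by ACCT_TYPE. Column names are taken from the slide, but the table names plan_dim and acct_type_rollup, the ACCT_COUNT column, the column types, and every row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan_dim (                 -- Insurance Plan/Carrier dimension
    PLAN_SEQ1 INTEGER PRIMARY KEY,
    PLAN_NAME TEXT,
    CARRIER   TEXT
);
CREATE TABLE acct_fin_rollup (          -- fact table (subset of columns)
    ACCT_NUM           INTEGER PRIMARY KEY,
    PLAN_SEQ1          INTEGER REFERENCES plan_dim(PLAN_SEQ1),
    ACCT_TYPE          TEXT,
    TOT_TOTAL_CHARGES  REAL,
    TOT_TOTAL_PAYMENTS REAL
);
INSERT INTO plan_dim VALUES
    (1, 'Plan A', 'Carrier X'),
    (2, 'Plan B', 'Carrier Y');
INSERT INTO acct_fin_rollup VALUES
    (1001, 1, 'IP', 500.0, 200.0),
    (1002, 1, 'OP', 100.0, 100.0),
    (1003, 2, 'IP', 300.0,   0.0);
""")

# Star join: the dimension supplies the grouping label (CARRIER).
charges_by_carrier = conn.execute("""
    SELECT p.CARRIER, SUM(f.TOT_TOTAL_CHARGES)
    FROM acct_fin_rollup AS f
    JOIN plan_dim AS p ON p.PLAN_SEQ1 = f.PLAN_SEQ1
    GROUP BY p.CARRIER
    ORDER BY p.CARRIER
""").fetchall()

# Roll the fact table up into a coarser fact table keyed by ACCT_TYPE.
conn.execute("""
    CREATE TABLE acct_type_rollup AS
    SELECT ACCT_TYPE,
           COUNT(*)               AS ACCT_COUNT,
           SUM(TOT_TOTAL_CHARGES) AS TOT_TOTAL_CHARGES
    FROM acct_fin_rollup
    GROUP BY ACCT_TYPE
""")
by_type = conn.execute(
    "SELECT * FROM acct_type_rollup ORDER BY ACCT_TYPE"
).fetchall()

print(charges_by_carrier)  # [('Carrier X', 600.0), ('Carrier Y', 300.0)]
print(by_type)             # [('IP', 2, 800.0), ('OP', 1, 100.0)]
```

The same roll-up pattern would apply to FC (financial class) or HOSPITAL_SERVICE_CODE by changing the GROUP BY column.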
8. Snowflake Schema
• Think Star Schema where the dimension tables are
normalized
• Can be used to segregate rows in dimension tables that
have a high percentage of null data (for faster lookup, since
some databases do not index nulls)
9. Snowflake Schema
Fact Table
• product_key
• Units
• Cost Per Unit
Dimension Table
• product_key
• supplier_key
• Product Info
Dimension Table
• supplier_key
• Supplier Info
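To make the normalization concrete, here is a small sketch of this snowflake layout in SQLite. The key columns (product_key, supplier_key) follow the slide. The table names, the remaining column names, and the rows are hypothetical. The point it shows is that reaching supplier attributes from the fact table takes two joins, because the supplier data has been normalized out of the product dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_dim (
    supplier_key  INTEGER PRIMARY KEY,
    supplier_info TEXT
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key),
    product_info TEXT
);
CREATE TABLE product_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    units         INTEGER,
    cost_per_unit REAL
);
INSERT INTO supplier_dim VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO product_dim VALUES
    (10, 1, 'Widget'),
    (11, 1, 'Gadget'),
    (12, 2, 'Gizmo');
INSERT INTO product_fact VALUES (10, 3, 2.0), (11, 1, 5.0), (12, 4, 1.5);
""")

# fact -> product dimension -> supplier dimension (two joins).
cost_by_supplier = conn.execute("""
    SELECT s.supplier_info, SUM(f.units * f.cost_per_unit)
    FROM product_fact AS f
    JOIN product_dim  AS p ON p.product_key  = f.product_key
    JOIN supplier_dim AS s ON s.supplier_key = p.supplier_key
    GROUP BY s.supplier_info
    ORDER BY s.supplier_info
""").fetchall()

print(cost_by_supplier)  # [('Acme', 11.0), ('Globex', 6.0)]
```

In a pure star schema the supplier attributes would sit directly in the product dimension and the second join would disappear.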
10. Conformed Dimension
A conformed dimension is a set of data attributes that have been
physically implemented in multiple tables using the same structure. A
conformed dimension can be applied to different fact tables. For
example:
Dimension Table: Patient Demographics (Gender, Age)
Fact Table: Hypertension Studies
Fact Table: Lab Results
Fact Table: Diabetes Assessment
Note: The classic example for a conformed dimension is date. I
wanted to offer a different example.
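The following sketch illustrates the conformed-dimension idea with one patient dimension shared by two fact tables. All table names, column names (patient_key, systolic, glucose), and rows are assumptions for the example. Because both facts are grouped by the same dimension attribute, their row headers match exactly and the two answers can be lined up side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_dim (              -- the shared (conformed) dimension
    patient_key INTEGER PRIMARY KEY,
    gender      TEXT,
    age         INTEGER
);
CREATE TABLE hypertension_fact (
    patient_key INTEGER REFERENCES patient_dim(patient_key),
    systolic    REAL
);
CREATE TABLE lab_fact (
    patient_key INTEGER REFERENCES patient_dim(patient_key),
    glucose     REAL
);
INSERT INTO patient_dim VALUES (1, 'F', 40), (2, 'M', 55), (3, 'F', 62);
INSERT INTO hypertension_fact VALUES (1, 120), (2, 140), (3, 130);
INSERT INTO lab_fact VALUES (1, 90), (2, 110), (3, 100);
""")


def average_by_gender(fact_table, measure):
    """Average one measure per gender via the shared patient dimension.

    fact_table and measure are hard-coded identifiers in this sketch,
    never user input, so interpolating them into the SQL is acceptable here.
    """
    sql = f"""
        SELECT d.gender, AVG(f.{measure})
        FROM {fact_table} AS f
        JOIN patient_dim AS d ON d.patient_key = f.patient_key
        GROUP BY d.gender
    """
    return dict(conn.execute(sql).fetchall())


systolic = average_by_gender("hypertension_fact", "systolic")
glucose = average_by_gender("lab_fact", "glucose")

# Row headers from both answers must match before the results are combined.
assert systolic.keys() == glucose.keys()
report = {g: (systolic[g], glucose[g]) for g in sorted(systolic)}

print(report)  # {'F': (125.0, 95.0), 'M': (140.0, 110.0)}
```

If one fact only covered a subset of genders, its row headers would be an exact subset of the other's, which is the second case the earlier definition allows.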
11. Transition to Next Point of Discussion
Star and Snowflake schemas are optimized for
querying large data sets.
They should support:
• OLAP cubes
• Business Intelligence and Analytic Applications
• Ad hoc queries