According to Gartner, 85% of Machine Learning projects fail.
Most data scientists spend around 80% of their time wrangling, cleaning, and organizing data to obtain a clean dataset: one observation per row and one variable per column. This type of data structure is straightforward to get from dimensional modeling.
In this session, Antoni will demo the creation of a Data Warehouse, build a star schema using the Kimball methodology, and then use it in a simple ML model. He will discuss the benefits and downsides of using warehousing design patterns in ML.
Where are we going?
Data Science & Data Engineering Process
Dimensional modelling
Versatile Data Kit
Create our own dimension and facts (exercise)
Create ML model (exercise)
Data Science & Data Engineering Process
https://neptune.ai/blog/best-practices-for-data-science-project-workflows-and-file-organizations
Where are we in the agenda?
Data Science Process
Dimensional modelling
Versatile Data Kit
Create our own dimension and facts (exercise)
Create ML model using our fact and dimension tables (exercise)
Pros and Cons of dimensional modelling in ML
Data modeling
[Diagram: Data Sources (3rd-party SaaS products, corporate systems/DBs) → Data Integration and Transformation (business model, data modeling) → Insights (BI, Data Science tools) → Data-driven products]
What is Kimball?
https://www.kimballgroup.com/
Architecture
Process
Design Patterns (Techniques)
Kimball Architecture
Quick mention
[Diagram: Data Sources (3rd-party SaaS products, corporate systems/DBs) → Staging (Data Lake) → Data Integration and Transformation (business model) → Insights (BI, Data Science tools) → Data-driven products. The back room (the kitchen) covers staging and integration; the front room (the dining room) covers the presentation area; metadata spans both.]
See more in https://bit.ly/kimball-architecture
Kimball Dimensional Design Process
Data modelling steps consider both business needs and data realities.
Identify the business process (e.g., checking account balance; boarding a plane)
Identify the grain (e.g., the monthly account balance snapshot; the passenger boarding event)
Identify the dimensions (e.g., Date, Customer, Bank; Date, Passenger, Flight, Airline)
Identify the facts (e.g., the bank account balance each month; the boarding pass scanned at the gate for a passenger)
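To make the grain concrete, here is a minimal sketch of a star schema for the boarding example, written as SQL DDL executed from Python via the standard sqlite3 module. The table and column names are illustrative assumptions, not taken from the session materials.

import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway warehouse, for illustration only
conn.executescript("""
-- Dimensions: the who/what/when/where context of the boarding event
CREATE TABLE dim_date      (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_passenger (passenger_key INTEGER PRIMARY KEY, name TEXT, frequent_flyer_tier TEXT);
CREATE TABLE dim_flight    (flight_key INTEGER PRIMARY KEY, flight_number TEXT, origin TEXT, destination TEXT);
CREATE TABLE dim_airline   (airline_key INTEGER PRIMARY KEY, airline_name TEXT);

-- Fact table at the declared grain: one row per boarding pass scanned at the gate
CREATE TABLE fact_boarding (
    date_key       INTEGER REFERENCES dim_date(date_key),
    passenger_key  INTEGER REFERENCES dim_passenger(passenger_key),
    flight_key     INTEGER REFERENCES dim_flight(flight_key),
    airline_key    INTEGER REFERENCES dim_airline(airline_key),
    boarding_count INTEGER DEFAULT 1  -- the event itself is the measurement
);
""")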
Kimball Data Modelling Design Patterns
Kimball Dimensional Modelling Techniques
Transaction fact tables
Periodic snapshot fact tables
Accumulating snapshot fact tables
Slowly Changing Dimensions, Types 1 to 6
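As an illustration of one of these techniques, below is a minimal sketch of a Slowly Changing Dimension Type 2 update, run as plain SQL from Python with sqlite3: when a customer attribute changes, the current row is closed and a new versioned row is inserted, so history is preserved. The table, columns, and data are assumptions for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id  TEXT,   -- natural (business) key
    city         TEXT,
    valid_from   TEXT,
    valid_to     TEXT,   -- NULL marks the current version
    is_current   INTEGER
);
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES ('C42', 'Sofia', '2020-01-01', NULL, 1);
""")

def scd2_update(conn, customer_id, new_city, change_date):
    # Close the current version instead of overwriting it (overwriting would be Type 1)
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    # Insert the new version; point-in-time joins can pick the right row by date range
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

scd2_update(conn, "C42", "Plovdiv", "2021-06-15")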
Where are we in the agenda?
Data Science Process
Dimensional modelling
Versatile Data Kit
Create our own dimension and facts (exercise)
Create ML model using our fact and dimension tables (exercise)
Pros and Cons of dimensional modelling in ML
Versatile Data Kit
Data lifecycle (Data Journey) and where VDK fits in
Ingest (Data Job) → Transform (Data Job) → Export (Data Job)
[Diagram: Data Sources (3rd-party SaaS products, corporate systems/DBs) → Raw Data (Data Lake) → Data Integration and Transformation (business model) → Insights (BI & Data Science tools) → Data-driven products]
Automate DevOps for Data
Where are we in the agenda?
Data Science Process
Dimensional modelling
Versatile Data Kit
Create our own dimension and facts (exercise)
Create ML model (exercise)
Pros and Cons of dimensional modelling in ML
The Data & ML Journey
[Diagram: Data Sources (product events, corporate systems) → Ingest → Raw Data (Data Lake) → Transform → Data model (dimensional model) → Publish → BI & Data Science tools → Export → Data-driven products; the ML modeling branch goes train & validation data → train → model object]
Meeting business needs with quality and efficiency
Challenges:
• Efficiently processing the data and making it ready for BI and Data Science
• Troubleshooting and debugging data issues
• Quickly enhancing existing analytics
• Transforming raw data into business KPIs
• Productionizing the data analytics
[Diagram: data sources (product telemetry, billing data, NPS/customer success, customer data, support data) → integrate data from diverse data sources → clean & pre-process data → reporting, advanced analytics and Data Science, supported by troubleshoot & debug and deploy & operate]
Speaker notes
In this course we will create our own Data Warehouse and build a star schema using Kimball. Then we will use it in a simple ML model and discuss the benefits and downsides of using warehousing design patterns in ML.
https://www.kimballgroup.com/2008/11/fact-tables
https://www.mighty.digital/blog/data-modeling-techniques-explained
https://www.educba.com/fact-table-vs-dimension-table/
https://www.softwaretestinghelp.com/dimensional-data-model-in-data-warehouse/
https://www.bluegranite.com/blog/dimensional-modeling-in-the-advanced-analytics-age#:~:text=Dimensional%20models%20aren't%20just,they%20also%20benefit%20data%20scientists.
https://towardsdatascience.com/dimensional-modelling-for-customer-churn-9d0148548f04
https://www.astera.com/type/blog/automate-dimensional-modeling-data-warehouse/
https://github.com/chrthomsen/pygrametl/tree/master/docs/examples
Missing here is best practice for data science. DS tools use "observation sets", which blend all variables, item-level (fact) and context (aggregate), onto the same flat tuple set in order to drive independent → dependent variable inference and other analysis.
Helping data scientists do this correctly has no tooling support that I have seen.
Also, capturing the aggregation level as metadata on the resulting columns, so that downstream aggregations of aggregations are done correctly, is completely unaddressed. Best practices are hard because data scientists are not historically code-disciplined. (We run into this a lot during platform and pipeline migrations; AWS to GCP is the moment when the flashlight shines on everything.)
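There is no standard tooling for this yet, but as a sketch of what capturing aggregation level as column metadata might look like, one could tag each column of a pandas DataFrame with its grain and refuse unsafe re-aggregation. Everything below (the attrs convention, the helper name) is a hypothetical illustration, not an existing library feature.

import pandas as pd

# Hypothetical convention: record each column's aggregation level in DataFrame.attrs
df = pd.DataFrame({"customer_id": ["C1", "C2"], "monthly_avg_balance": [120.0, 85.5]})
df.attrs["column_grain"] = {
    "customer_id": "item",               # raw, item-level value
    "monthly_avg_balance": "avg:month",  # already an average at monthly grain
}

def safe_mean(frame, column):
    # Guard against averaging an average: a mean of monthly means is not the mean
    # over the underlying transactions unless every month has the same row count.
    grain = frame.attrs.get("column_grain", {}).get(column, "item")
    if grain.startswith("avg:"):
        raise ValueError(f"{column} is already an average ({grain}); "
                         "re-aggregate from the fact table instead")
    return frame[column].mean()

try:
    safe_mean(df, "monthly_avg_balance")
except ValueError as err:
    print(err)  # the guard fires, as intended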
Okay, so before we continue with understanding the problems, let's see what a typical data science process flow looks like. Data scientists usually start by being asked an interesting question.
https://en.wikipedia.org/wiki/Dimensional_modelling
Dimensional modelling is a technique that uses dimensions and facts to store data in a Data Warehouse efficiently.
http://mis587mozhou.blogspot.com/2014/02/the-four-step-dimensional-design-process.html
Dimensional modeling always uses the concepts of facts (measures) and dimensions (context).
Fact:
Measurements, metrics or facts about a business process.
The facts are the performance metrics that business users are concerned about. These must be appropriately defined in accordance with the declared grain. Usually, facts are numerical data, such as total cost or order quantity.
Dimension:
A companion table to the fact table, containing descriptive attributes used to constrain queries.
The dimensions can typically be identified easily, as they represent the "who, what, where, when, why, and how" associated with the event.
A robust set of dimensions representing all possible descriptions should be identified. The following are some examples:
Date
Customer
Employee
Facility
Performance: The dimension tables in particular are often highly de-normalized. For example, a customer table might store the zip code of the customer, their town, and state. If you have 20 customers in Sofia, then the customer dimension table will store the fact that Sofia is in Bulgaria a total of 20 times.
By denormalizing and simplifying the schema (fewer joins), we obtain better performance and can better predict the performance of our data warehouse. This is especially important in modern data architectures with the adoption of column-oriented storage (where joins are very expensive).
Extensibility: Dimensional modelling is modular by nature; many components can and should be re-used. Data warehouses are built incrementally, avoiding a big-bang approach.
Consistency: The dimensional model is designed to integrate various business processes, regardless of the source. For example, a conformed customer dimension allows finance, engineering, and sales teams to share one common customer reference regardless of the source application.
Ease of understanding: The consistent and fairly clear structure of the database allows even a non-technical end user (an accountant or a marketing analyst) to query the model without wondering whether a relationship is 1-n or n-n, or whether there is a loop in the model, and without needing to know that those could be a problem.
And second, the way the data is queried is generally the same: you join the fact table to the dimensions you need and aggregate some of the metrics.
Most data scientists spend around 80% of their time wrangling, cleaning, and organizing data to obtain a tidy dataset (Wickham, 2014): one observation per row and one variable per column. This type of data structure is extremely easy to obtain from dimensional modeling. A simple join between the fact table and the relevant dimensions, an aggregation of the indicators, and you have a tidy tabular dataset.
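As a minimal sketch of that pattern, the snippet below stands in for a fact table and a dimension with small pandas DataFrames, joins and aggregates them into a tidy observation set, and fits a simple scikit-learn model on it. All table contents, column names, and the churn label are invented for illustration.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-ins for warehouse tables (illustrative data only)
fact_txn = pd.DataFrame({
    "customer_key": [1, 1, 2, 2, 2, 3],
    "amount":       [10.0, 25.0, 5.0, 7.5, 3.0, 90.0],
})
dim_customer = pd.DataFrame({
    "customer_key": [1, 2, 3],
    "tier":         ["gold", "basic", "basic"],
    "churned":      [0, 1, 0],  # hypothetical label
})

# Join the fact to the dimension, then aggregate to one row per customer:
# one observation per row, one variable per column.
tidy = (fact_txn
        .groupby("customer_key", as_index=False)
        .agg(txn_count=("amount", "size"), total_spent=("amount", "sum"))
        .merge(dim_customer, on="customer_key"))

X = pd.get_dummies(tidy[["txn_count", "total_spent", "tier"]])
y = tidy["churned"]
model = LogisticRegression().fit(X, y)
print(model.predict(X))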
Cleaned, organized data ensures that data scientists can focus on actual data science, rather than on engineering tasks.
There are many approaches to data modelling. We focus on Kimball. Ralph Kimball introduced the data warehouse/business intelligence industry to dimensional modelling.
But we should note that there are other commonly mentioned approaches to data modeling. One is known as Inmon data modeling, named after data warehouse pioneer Bill Inmon; it focuses on normalized schemas, instead of Kimball's more denormalized approach.
A third data modeling approach, named Data Vault, was released in the early 2000s; it aims to handle change more gracefully.
https://www.kimballgroup.com/wp-content/uploads/2013/08/2013.09-Kimball-Dimensional-Modeling-Techniques11.pdf
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/technical-dw-bi-system-architecture/
The Kimball technical system architecture separates the data and processes comprising the DW system into the backroom extract, transformation and load (ETL) environment and the front room presentation area, as illustrated in the following diagram.
https://www.kimballgroup.com/2004/03/differences-of-opinion/
https://www.kimballgroup.com/2004/01/data-warehouse-dining-experience/
Data warehouses should have an area that focuses exclusively on data staging and extract, transform, and load (ETL) activities. A separate layer of the warehouse environment should be optimized for presentation of the data to the business constituencies and application developers.
This division is underscored if you consider the similarities between a data warehouse and a restaurant.
The kitchen of a fine restaurant is a world unto itself. It's where the magic happens. Talented chefs take raw materials and transform them into appetizing, delicious multi-course meals for the restaurant's diners.
The layout must be highly efficient
Quality must be high (delicious food)
Food must also be of high integrity (nobody likes poison)
Procured products must meet quality standards
Given the dangerous surroundings, the kitchen is off-limits to patrons. Likewise, the data warehouse's staging area should be off-limits to business users and reporting/delivery application developers.
The data warehouse’s staging area is very similar to the restaurant’s kitchen. The staging area is where source data is magically transformed into meaningful, presentable information. Like the kitchen, the staging area is designed to ensure throughput. It must transform raw source data into the target model efficiently, minimizing unnecessary movement if possible.
The Dining Room
Food – quality, presentation
Menu – easy to access
Service – prompt, good support
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/four-4-step-design-process/
The answers to these questions are determined by considering the needs of the business along with the realities of the underlying source data during collaborative modeling sessions. Following the business process, grain, dimension, and fact declarations, the design team determines the table and column names, sample domain values, and business rules.
It eases and tackles data ingestion jobs, data transformation jobs, and data publishing jobs, while at the same time allowing data users to benefit from good DevOps and DataOps practices.
Versatile Data Kit supports data jobs written in SQL, Python, or both. It comes with a Data SDK, which is used to develop data jobs locally. VDK provides the main building blocks to ingest from any source and transform data using Python or SQL.
For example, for transformations, VDK provides support for creating Kimball's dimensional model using templates that create facts and dimensions with SQL only. The VDK Data SDK also provides native DB connections.
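For instance, a transformation step might invoke one of these templates. This is a hedged sketch: execute_template is part of the VDK job API, but the template name and argument keys below vary by database plugin and are assumptions here, so check the VDK documentation.

# 20_load_dim_customer.py — hypothetical VDK transformation step
from vdk.api.job_input import IJobInput

def run(job_input: IJobInput):
    # Load a dimension with a Kimball-style template (name and args assumed)
    job_input.execute_template(
        template_name="scd1",
        template_args={
            "source_schema": "staging",
            "source_view": "vw_customer",
            "target_schema": "dw",
            "target_table": "dim_customer",
        },
    )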
By using the VDK Data SDK, data users can choose to develop their jobs locally only, or use the Versatile Data Kit Control Service, which provides them with a production setup.
The Data SDK comes with data lineage and quality features and is entirely usable on its own.
The VDK Control Service manages the whole data job lifecycle. It allows data users to productionize Versatile Data Kit data jobs by deploying them. The Control Service comes with out-of-the-box deployment, versioning, monitoring, alerting, notifications, and more.
TODO:
Showcase that send_object for ingestion works the same way regardless of the infrastructure.
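A minimal sketch of such an ingestion step follows, based on VDK's documented convention of a run(job_input) entry point per step; the payload and destination table are invented, and parameter names should be checked against the VDK docs.

# 10_ingest_example.py — one step of a VDK data job (hypothetical example)
from vdk.api.job_input import IJobInput

def run(job_input: IJobInput):
    # send_object_for_ingestion queues a payload for ingestion; VDK routes it to
    # the configured target the same way locally and in production.
    payload = {"customer_id": "C42", "city": "Sofia", "nps_score": 9}
    job_input.send_object_for_ingestion(
        payload=payload,
        destination_table="raw_customer_feedback",  # assumed table name
    )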
We would be very happy if you would like to contribute, raise an issue, file a product request, etc.
We are actively looking for partners who wish to collaborate with us, participate in requirements gathering, discuss common problems, and jointly solve them.
We need a consolidated view of how the service is performing. That view includes information regarding customer count, overall consumption, customer sentiment (e.g. NPS Score), customer onboarding metrics, SLA metrics, etc.