DBT ELT Approach for Advanced Analytics
- 3. CONFIDENTIAL. Copyright © 3
Agenda
01
02
03
04
05
06
Motivation
DBT Approach
How to work with DBT
Demo
Key takeaways
Discussion
Data Analysts' (DA) daily job
How do we prepare the master table?
• Drag and drop in a visualization tool?
• Model on the fly?
• Write complex queries?
Multiple data sources, multiple tables, data modeling
Source: link
BIG DATA!?
Volume: 10 GB, 5 years of data.
Variety: multiple data sources.
Velocity: real-time analytics.
We moved to a data warehouse
Lead time: at least 2 weeks
DAs don't understand what DEs did, and vice versa
Extract → Transform → Load (ETL) into the data warehouse
DE challenges
Readability
• How to read and understand this query?
• Where to start?
Accessibility
• How to verify the output?
• Can we break the script into smaller pieces for testing?
Collaboration
• How to reuse this query for other analyses?
• How to onboard new members?
• How to explain the model when there are 100 tables?
Scripting
• How to manage model versions?
Customer segmentation: Segmentation is a technique used to divide
customers into groups based on certain characteristics or behaviors. This can
help businesses understand their customers better and tailor their
marketing efforts to specific groups. SQL can be used to create customer
segments by grouping customers based on demographic information (age,
gender, location) or transactional data (purchase history, frequency,
monetary value).
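As a minimal sketch of SQL-based segmentation, the query below buckets customers by total spend. The table and column names (`customers`, `orders`, `amount`) and the dollar thresholds are hypothetical, not from the demo dataset:

```sql
-- Hypothetical schema: customers(customer_id), orders(customer_id, amount).
-- Assign each customer to a spend-based segment.
select
    c.customer_id,
    case
        when sum(o.amount) >= 1000 then 'high_value'
        when sum(o.amount) >= 100  then 'mid_value'
        else 'low_value'
    end as segment
from customers c
join orders o
  on o.customer_id = c.customer_id
group by c.customer_id
```

The same pattern extends to demographic segments by swapping the aggregate for `age`, `gender`, or `location` columns.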
Cohort Analysis: DBT can be used to perform cohort analysis by
transforming raw data into a format suitable for analysis. By using DBT to
transform the data, analysts can quickly identify patterns and trends in user
behavior and track the performance of different customer segments over
time.
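A cohort transformation of this kind might look like the sketch below, which groups customers by the month of their first order and counts how many remain active in later months. The `orders` schema is assumed for illustration:

```sql
-- Hypothetical schema: orders(customer_id, order_date).
with first_order as (
    -- Each customer's cohort is the month of their first purchase.
    select
        customer_id,
        date_trunc('month', min(order_date)) as cohort_month
    from orders
    group by customer_id
)
select
    f.cohort_month,
    date_trunc('month', o.order_date) as activity_month,
    count(distinct o.customer_id)     as active_customers
from orders o
join first_order f using (customer_id)
group by 1, 2
```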
Marketing Attribution: DBT can be used to perform marketing attribution
analysis by transforming raw data into a format suitable for analysis. By
using DBT to transform the data, marketers can better understand which
channels and campaigns are driving the most conversions and optimize their
marketing spend accordingly.
Financial Reporting: DBT can be used to transform financial data into a
format suitable for reporting and analysis. By using DBT to transform the
data, financial analysts can quickly generate accurate and consistent reports
that provide insights into company performance, revenue, expenses, and
other key financial metrics.
Demand forecasting: DBT can be used to create a series of transformations
on raw transactional data to prepare it for predictive modeling. For example,
it can be used to aggregate transactional data by time periods (e.g., days,
weeks, or months) and join it with other relevant data sources such as
weather data, holidays, or other events that can affect demand.
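A sketch of such a feature-preparation model is shown below: weekly demand per product, enriched with assumed `daily_weather` and `holidays` sources (all table and column names here are hypothetical):

```sql
-- Hypothetical sources: orders(order_date, product_id, quantity),
-- daily_weather(weather_date, avg_temp_c), holidays(holiday_date).
select
    date_trunc('week', o.order_date) as order_week,
    o.product_id,
    sum(o.quantity)                  as units_sold,
    avg(w.avg_temp_c)                as avg_temp_c,
    -- Flag weeks that contain at least one holiday.
    max(case when h.holiday_date is not null then 1 else 0 end) as has_holiday
from orders o
left join daily_weather w on w.weather_date = o.order_date
left join holidays h      on h.holiday_date = o.order_date
group by 1, 2
```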
Recommendation engines: Recommendation engines are used to suggest
products or services to customers based on their past behavior or
preferences. SQL can be used to create recommendation engines by
analyzing customer purchase history and identifying patterns or similarities
between customers. This can be used to suggest similar products or to
identify cross-selling opportunities.
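One simple SQL pattern for this is a co-purchase count: products that frequently appear in the same order are candidates for cross-selling. The `order_items` schema below is assumed for illustration:

```sql
-- Hypothetical schema: order_items(order_id, product_id).
-- Count how often two products appear in the same order.
select
    a.product_id as product_a,
    b.product_id as product_b,
    count(distinct a.order_id) as times_bought_together
from order_items a
join order_items b
  on a.order_id = b.order_id
 and a.product_id < b.product_id   -- avoid duplicate and self pairs
group by 1, 2
order by times_bought_together desc
```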
USE CASES: ADVANCED ANALYTICS
How do we move fast with a data-driven culture and advanced analytics?
Migration from imperative to declarative
LEADING INSURANCE COMPANY
• Front end: say goodbye to spaghetti code and complex DOM manipulation with ReactJS
• DevOps: infrastructure as code (IaC) with Terraform
• Cluster orchestration: managing containerized applications at scale has never been easier with Kubernetes (K8s)
• Data jobs/ops: more accurate and efficient analytics with DBT
DBT philosophy
DDL- and DML-free
Just write SELECT statements instead of managing DDL (creating tables and views), DML, transactions, schemas, Pandas DataFrames, etc.
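In practice a dbt model is just a SELECT in a `.sql` file; dbt wraps it in whatever DDL the warehouse needs. The model and source names below are hypothetical:

```sql
-- models/customer_orders.sql (hypothetical model)
-- No CREATE TABLE, INSERT, or transaction handling is written by hand:
-- dbt generates the DDL to materialize this SELECT as a table or view.
select
    customer_id,
    count(*) as order_count
from {{ ref('stg_orders') }}
group by customer_id
```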
DRY (Don't Repeat Yourself)
Modularize the data model and reuse it in many places (macros, hooks, package management) instead of rewriting it from scratch for each new analysis.
Avoid copying and pasting SQL scripts into many places: copies are not reusable and easily drift into errors when the original data model needs to be edited.
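A dbt macro is one way to keep logic in a single place. This is a minimal sketch; the macro name, column, and referenced model are invented for illustration:

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100.0)
{% endmacro %}

-- Any model can then reuse it instead of copy/pasting the formula:
-- select {{ cents_to_dollars('amount_cents') }} as amount
-- from {{ ref('stg_payments') }}
```

When the formula changes, it is edited once in the macro rather than in every model that uses it.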
Model versioning
Data models are versioned, making it easier to trace how business logic was built over time and to collaborate with team members (branching, pull requests, code reviews, documentation).
Data quality control
Writing tests for data models is quick and convenient. Analysis errors often occur in corner cases; guarding against those cases makes the model more reliable later on.
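For example, a singular test in dbt is just a SELECT that returns the failing rows. The file, model, and column names below are hypothetical:

```sql
-- tests/assert_positive_order_amounts.sql (hypothetical singular test)
-- dbt treats any returned row as a failure, so this test passes
-- only when no order has a non-positive amount.
select order_id, amount
from {{ ref('stg_orders') }}
where amount <= 0
```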
dbt and the modern BI stack
Source: link
dbt (data build tool) is a command line tool that enables data analysts and engineers to transform data in their
warehouses more effectively. Today, dbt has ~850 companies using it in production, including companies like Casper,
Seatgeek, and Wistia.
Extract → Load → Transform (ELT)
Step 1: Develop models — write business logic in a simple SQL file.
Step 2: Compile the project — dbt infers the dependencies between data models and builds the DAG (directed acyclic graph) for us.
Step 3: Build tables and views — when dbt runs, the business logic is materialized as tables or views in the data warehouse.
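The dependency inference comes from `{{ ref() }}`: referencing another model by name is what places this model downstream of it in the DAG. A minimal sketch, with invented model names:

```sql
-- models/monthly_revenue.sql (hypothetical)
-- The ref() call tells dbt this model depends on stg_orders,
-- so dbt builds stg_orders first, then this model.
select
    date_trunc('month', order_date) as order_month,
    sum(amount)                     as revenue
from {{ ref('stg_orders') }}
group by 1
```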
Demo
Goal: calculate monthly sales values by category
Tech stack: DBT, Databricks, Azure Blob Storage
Data: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
Github: https://github.com/ongxuanhong/de05-dbt-databricks
Youtube: https://youtube.com/playlist?list=PLR0bWeb09-BxoexgE1JD-CUC7TAtNVeyO
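The demo's goal could be sketched as a single dbt model like the one below. Column names follow the public Olist dataset, but the staging model names are assumed; see the linked GitHub repo for the actual project code:

```sql
-- Monthly sales value by product category (illustrative sketch).
select
    date_trunc('month', o.order_purchase_timestamp) as order_month,
    p.product_category_name,
    sum(oi.price)                                   as sales_value
from {{ ref('stg_order_items') }} oi
join {{ ref('stg_orders') }} o   on o.order_id   = oi.order_id
join {{ ref('stg_products') }} p on p.product_id = oi.product_id
group by 1, 2
```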
Data Lakehouse: ingested data (JSON, Parquet, Avro, Delta)
Brazilian E-Commerce Public Dataset by Olist | Kaggle
• Enables seamless data transformation: DBT
automates the transformation of raw data into
a format that is useful for analytics. This allows
data analysts and engineers to focus on
insights and analysis rather than spending
time on data preparation.
• Provides a modular approach to data
transformation: DBT’s modular approach
makes it easy to break down complex
transformations into smaller, more
manageable steps. This allows teams to work
collaboratively on specific parts of a project
and to easily modify and test those parts
without affecting the entire project.
• Promotes data consistency and quality: DBT
enforces strict data testing and documentation
requirements, ensuring that data is accurate,
consistent, and reliable. This enables analysts
and engineers to have confidence in the data
they are working with, leading to better
insights and more informed decision-making.
Benefits
DATA BUILD TOOL (DBT)
Growing at ~10% every month (GitHub)
• Requires SQL knowledge: while dbt makes it
easier to work with SQL, using it effectively
still requires a certain level of SQL
proficiency. Without prior experience,
expect to invest time in learning SQL first.
• Performance overhead: Depending on the
complexity of your dbt models and the size
of your data, there may be a performance
overhead associated with using dbt.
• Limited scope: While dbt can help automate
some aspects of data modeling, it doesn't
solve all data-related problems. It's
important to understand the limitations of
dbt and when other tools or approaches
might be more appropriate.
Be aware of
DATA BUILD TOOL (DBT)
Source (link)
Discussion
What is analytics engineering?
dbt: Model contract v1.5
dbt + Machine Learning: What makes a great baton pass?
dbt Cloud integrations (Snowflake, Airflow, Monte Carlo)
References
• What is analytics engineering?
• What is dbt?
• Quickstart for dbt Core
• Tristan Handy — The Work Behind the Data Work
Editor's Notes
How to move faster?
How to ensure data quality?