CONFIDENTIAL. Copyright © 1
DBT (DATA BUILD TOOL): AN ELT APPROACH FOR ADVANCED ANALYTICS
8+ years swimming in data @
A Researcher, Engineer and Blogger
Agenda
01 Motivation
02 DBT Approach
03 How to work with DBT
04 Demo
05 Key takeaways
06 Discussion
Motivation
We start with Excel files
Data Analytics (DA) daily job
How to prepare master table?
• Drag and drop to visualization tool?
• Modeling on the fly?
• Write complex queries?
Multiple data sources Multiple tables Data modeling
Source: link
BIG DATA!?
• Volume: 10 GB, 5 years of data.
• Variety: multiple data sources.
• Velocity: real-time analytics.
We moved to a data warehouse
Lead time of at least 2 weeks
DAs don’t understand what DEs did,
and vice versa
Extract, Transform, Load (ETL) into the data warehouse
DE challenges
Readability
• How to read and understand this query?
• Where to start?
Accessibility
• How to verify the output?
• Can we break the script into smaller pieces for testing?
Collaboration
• How to reuse this query for other analyses?
• How to onboard new members?
• How to explain if there are 100 tables?
Scripting
• How to reuse this query for other analyses?
• How to manage model versions?
Customer segmentation: Segmentation is a technique used to divide
customers into groups based on certain characteristics or behaviors. This can
help businesses understand their customers better and tailor their
marketing efforts to specific groups. SQL can be used to create customer
segments by grouping customers based on demographic information (age,
gender, location) or transactional data (purchase history, frequency,
monetary value).
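The grouping described above can be sketched in a few lines. This is a minimal Python illustration, not the deck's implementation: the transaction layout, the thresholds (`recent_days`, `high_value`) and the segment names are all assumptions made for the example. In a DBT project the same logic would normally live in a SQL model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 1, 5), 120.0),
    ("c1", date(2023, 3, 20), 50.0),
    ("c2", date(2022, 11, 2), 15.0),
    ("c3", date(2023, 3, 28), 300.0),
    ("c3", date(2023, 3, 30), 50.0),
    ("c4", date(2022, 12, 1), 500.0),
]

def segment_customers(rows, today, recent_days=30, high_value=200.0):
    """Group customers by recency and monetary value (illustrative thresholds)."""
    stats = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer_id, purchased_on, amount in rows:
        s = stats[customer_id]
        s["last"] = purchased_on if s["last"] is None else max(s["last"], purchased_on)
        s["frequency"] += 1
        s["monetary"] += amount

    segments = {}
    for customer_id, s in stats.items():
        recent = (today - s["last"]).days <= recent_days
        valuable = s["monetary"] >= high_value
        if recent and valuable:
            segments[customer_id] = "champion"
        elif recent:
            segments[customer_id] = "active"
        elif valuable:
            segments[customer_id] = "at_risk_high_value"
        else:
            segments[customer_id] = "dormant"
    return segments

segments = segment_customers(transactions, today=date(2023, 4, 1))
```

The purchase frequency is collected as well, so a fuller recency/frequency/monetary scheme could be built on the same statistics.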
Cohort Analysis: DBT can be used to perform cohort analysis by
transforming raw data into a format suitable for analysis. By using DBT to
transform the data, analysts can quickly identify patterns and trends in user
behavior and track the performance of different customer segments over
time.
Marketing Attribution: DBT can be used to perform marketing attribution
analysis by transforming raw data into a format suitable for analysis. By
using DBT to transform the data, marketers can better understand which
channels and campaigns are driving the most conversions and optimize their
marketing spend accordingly.
Financial Reporting: DBT can be used to transform financial data into a
format suitable for reporting and analysis. By using DBT to transform the
data, financial analysts can quickly generate accurate and consistent reports
that provide insights into company performance, revenue, expenses, and
other key financial metrics.
Demand forecasting: DBT can be used to create a series of transformations
on raw transactional data to prepare it for predictive modeling. For example,
it can be used to aggregate transactional data by time periods (e.g., days,
weeks, or months) and join it with other relevant data sources such as
weather data, holidays, or other events that can affect demand.
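As a rough sketch of that preparation step, the Python below aggregates transactions by month and joins the result with a holiday calendar. The data, the field names and the `monthly_features` helper are hypothetical; a DBT model would express the same aggregation and join in SQL.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (order_date, quantity).
orders = [
    (date(2023, 1, 3), 4),
    (date(2023, 1, 20), 6),
    (date(2023, 2, 14), 10),
    (date(2023, 3, 1), 2),
]

# Hypothetical calendar of holidays or events that may affect demand.
holidays = [date(2023, 1, 1), date(2023, 1, 22), date(2023, 2, 14)]

def monthly_features(order_rows, holiday_dates):
    """Aggregate quantity per month and left-join the number of holidays in that month."""
    demand = defaultdict(int)
    for order_date, quantity in order_rows:
        demand[(order_date.year, order_date.month)] += quantity

    holiday_count = defaultdict(int)
    for day in holiday_dates:
        holiday_count[(day.year, day.month)] += 1

    # Left join: keep every month that has demand, default to 0 holidays.
    return [
        {"year": y, "month": m, "quantity": q, "holidays": holiday_count.get((y, m), 0)}
        for (y, m), q in sorted(demand.items())
    ]

features = monthly_features(orders, holidays)
```

Each resulting row is one training example candidate for a forecasting model: a time period, the observed demand and one external signal.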
Recommendation engines: Recommendation engines are used to suggest
products or services to customers based on their past behavior or
preferences. SQL can be used to create recommendation engines by
analyzing customer purchase history and identifying patterns or similarities
between customers. This can be used to suggest similar products or to
identify cross-selling opportunities.
USE CASES
ADVANCED ANALYTICS
How can we move fast toward a data-driven
culture and advanced analytics?
DBT Approach
DBT (Data Build Tool) an ELT approach for Advanced Analytics
Migration from Imperative to Declarative
LEADING INSURANCE COMPANY
• Front end: Say goodbye to spaghetti code and complex DOM manipulations with ReactJS
• DevOps: Infrastructure as code (IaC) with Terraform
• Cluster orchestration: Managing containerized applications at scale has never been easier with K8s
• Data job/op: More accurate and efficient analytics with DBT
DBT philosophy
DDL, DML-free
Just write a SELECT statement (e.g. SELECT * FROM table)
and let DBT build it, instead of managing
DDL (tables, views), DML (CRUD),
transactions, schemas, Pandas
DataFrames, etc. yourself.
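The idea can be illustrated with a small Python sketch using the standard-library `sqlite3` module. The `materialize` helper and the table names are hypothetical, and the DDL that DBT actually generates depends on the warehouse adapter; the point is only that the model author writes a SELECT and the tool wraps it.

```python
import sqlite3

def materialize(conn, name, select_sql, kind="view"):
    """Wrap a SELECT statement in the DDL needed to build it as a view or table."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    conn.execute(f"DROP {kind.upper()} IF EXISTS {name}")
    conn.execute(f"CREATE {kind.upper()} {name} AS {select_sql}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 4.5)])

# The "model" is just a SELECT; the helper generates the surrounding DDL.
model_sql = "SELECT order_id, amount FROM raw_orders WHERE amount >= 10"
materialize(conn, "big_orders", model_sql, kind="view")

rows = conn.execute("SELECT order_id FROM big_orders ORDER BY order_id").fetchall()
```

Switching `kind` from `"view"` to `"table"` changes how the result is stored without touching the SELECT itself.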
DRY (Don’t Repeat Yourself)
Modularize the data model and reuse it in
many places instead of rewriting it
from scratch for each new
analysis (macros, hooks, package
management).
Avoid copying and pasting SQL scripts in
many places: copies are not reusable and
easily introduce errors when the original data
model needs to be edited.
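A loose analogy in Python: DBT macros are written in Jinja, but the reuse principle is the same as a function that generates a repeated SQL fragment once. The `pivot_columns` helper, its parameters and the `monthly_sales` table are assumptions for illustration only.

```python
def pivot_columns(column, values, agg="sum", then_value="1", else_value="0"):
    """Generate one aggregated CASE expression per value, like a pivot macro would."""
    expressions = []
    for value in values:
        alias = str(value).lower().replace(" ", "_")
        expressions.append(
            f"{agg}(case when {column} = '{value}' then {then_value} "
            f"else {else_value} end) as {alias}"
        )
    return ",\n  ".join(expressions)

# Reuse the same helper in any query instead of copy-pasting CASE expressions.
sql = (
    "select\n  order_month,\n  "
    + pivot_columns("category", ["Toys", "Home Decor"], then_value="sales")
    + "\nfrom monthly_sales\ngroup by order_month"
)
```

When the pivot logic needs to change, it changes in one place and every query that calls the helper picks up the fix.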
Model versioning
Data models are version-controlled, making it
easier to follow how the business logic
was built over time and to collaborate
with team members (branching, pull
requests, code reviews,
documentation).
Data quality control
Writing tests for data models is quick
and convenient. Analysis errors often
occur in corner cases; testing for
these cases makes the
model more reliable later on.
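To make the corner cases concrete, here is a minimal Python sketch of two checks that mirror DBT's built-in `not_null` and `unique` tests. The helper functions and the sample rows are hypothetical; in DBT these tests are declared in the project and run as SQL against the warehouse, and a test passes when it finds no failing rows.

```python
def not_null_failures(rows, column):
    """Return rows where the column is missing (None)."""
    return [row for row in rows if row.get(column) is None]

def unique_failures(rows, column):
    """Return values of the column that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)

orders = [
    {"order_id": 1, "category": "toys"},
    {"order_id": 2, "category": None},
    {"order_id": 2, "category": "garden"},
]

null_rows = not_null_failures(orders, "category")
duplicate_ids = unique_failures(orders, "order_id")
```

Here both checks fail: one order has no category and `order_id` 2 appears twice, which is exactly the kind of issue that would otherwise surface later as a wrong total.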
dbt and the modern BI stack
Source: link
dbt (data build tool) is a command line tool that enables data analysts and engineers to transform data in their
warehouses more effectively. Today, dbt has ~850 companies using it in production, including companies like Casper,
Seatgeek, and Wistia.
Extract, Load, Transform (ELT)
How to work with DBT
Step 1: Develop models
Write business logic in a simple SQL file.
Step 2: Compile project
DBT infers the dependencies between the data models and
builds the DAG (directed acyclic graph) for us.
Step 3: Build tables + views
When running dbt, the business logic is built as
tables or views in the data warehouse.
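How dependencies can be inferred from the SQL files is sketched below in Python. Models reference each other through dbt's `ref()` function; the regular expression, the model names and the `build_order` helper are assumptions for this toy example and are not how dbt itself parses projects. `graphlib` requires Python 3.9 or later.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text using ref() to point at other models.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_products": "select * from raw.products",
    "sales_by_category": (
        "select p.category, sum(o.amount) as total_sales "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_products') }} p on o.product_id = p.product_id "
        "group by p.category"
    ),
    "monthly_report": "select * from {{ ref('sales_by_category') }}",
}

REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

def build_order(model_sql):
    """Infer dependencies from ref() calls and return a valid build order."""
    # Map each model to the set of models it depends on (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in model_sql.items()}
    return list(TopologicalSorter(graph).static_order())

order = build_order(models)
```

Every model appears after the models it references, so building in this order never reads from a table or view that does not exist yet.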
Demo
Goal: calculate monthly sales values by category
Tech stack: DBT, Databricks, Azure Blob
Data: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
Github: https://github.com/ongxuanhong/de05-dbt-databricks
Youtube: https://youtube.com/playlist?list=PLR0bWeb09-BxoexgE1JD-CUC7TAtNVeyO
Calculate monthly sales values by category
values_per_bills = total_sales / total_bills
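The metric can be worked through on a handful of rows. This Python sketch assumes a hypothetical order-line layout (bill id, date, category, sales value) rather than the actual Olist schema, and counts each bill once per month and category; the demo itself computes this in DBT models.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: (bill_id, order_date, category, sales_value).
order_lines = [
    ("b1", date(2018, 1, 5), "toys", 100.0),
    ("b1", date(2018, 1, 5), "toys", 50.0),
    ("b2", date(2018, 1, 9), "toys", 30.0),
    ("b3", date(2018, 1, 9), "garden", 80.0),
    ("b4", date(2018, 2, 2), "toys", 60.0),
]

def monthly_sales_by_category(rows):
    """Compute total_sales, total_bills and values_per_bills per (month, category)."""
    sales = defaultdict(float)
    bills = defaultdict(set)
    for bill_id, order_date, category, value in rows:
        key = (order_date.strftime("%Y-%m"), category)
        sales[key] += value
        bills[key].add(bill_id)  # count each bill once

    report = {}
    for key in sorted(sales):
        total_sales = sales[key]
        total_bills = len(bills[key])
        report[key] = {
            "total_sales": total_sales,
            "total_bills": total_bills,
            "values_per_bills": total_sales / total_bills,
        }
    return report

report = monthly_sales_by_category(order_lines)
```

For January 2018 toys, two bills total 180.0 in sales, so `values_per_bills` is 90.0.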
DBT on Databricks Data Lakehouse with Brazilian Ecommerce dataset
Source: link
Data Lakehouse: ingested data (JSON, Parquet, Avro, Delta)
Brazilian E-Commerce Public Dataset by Olist | Kaggle
Data Lakehouse: create external tables (JSON, Parquet, Avro, Delta)
Brazilian Ecommerce tables
Initialize project
DBT run
Full pipeline
Full pipeline
Macros
Pivot table
DBT packages
https://hub.getdbt.com/
DBT data lineage and output reports
Key takeaways
• Enables seamless data transformation: DBT
automates the transformation of raw data into
a format that is useful for analytics. This allows
data analysts and engineers to focus on
insights and analysis rather than spending
time on data preparation.
• Provides a modular approach to data
transformation: DBT’s modular approach
makes it easy to break down complex
transformations into smaller, more
manageable steps. This allows teams to work
collaboratively on specific parts of a project
and to easily modify and test those parts
without affecting the entire project.
• Promotes data consistency and quality: DBT
makes data tests and documentation part of
the project, helping keep data accurate,
consistent, and reliable. This enables analysts
and engineers to have confidence in the data
they are working with, leading to better
insights and more informed decision-making.
Benefits
DATA BUILD TOOL (DBT)
Growing at 10% every single month (GitHub)
• Requires SQL knowledge: While dbt makes
it easier to work with SQL, it still requires
a certain level of SQL knowledge to
use effectively. If you don't have experience
with SQL, you may need to invest time in
learning it in order to use dbt effectively.
• Performance overhead: Depending on the
complexity of your dbt models and the size
of your data, there may be a performance
overhead associated with using dbt.
• Limited scope: While dbt can help automate
some aspects of data modeling, it doesn't
solve all data-related problems. It's
important to understand the limitations of
dbt and when other tools or approaches
might be more appropriate.
Be aware of
DATA BUILD TOOL (DBT)
Source (link)
Discussion
What is analytics engineering?
dbt: Model contract v1.5
dbt + Machine Learning: What makes a great baton pass?
dbt Cloud integrations (Snowflake, Airflow, Monte Carlo)
References
• What is analytics engineering?
• What is dbt?
• Quickstart for dbt Core
• Tristan Handy — The Work Behind the Data Work

More Related Content

What's hot

What's hot (20)

Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!Snowflake: The most cost-effective agile and scalable data warehouse ever!
Snowflake: The most cost-effective agile and scalable data warehouse ever!
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 

Similar to DBT ELT approach for Advanced Analytics.pptx

KeyAchivementsMimecast
KeyAchivementsMimecastKeyAchivementsMimecast
KeyAchivementsMimecast
Vera Ekimenko
 
What Is New In 2008 R2 Public
What Is New In 2008 R2 PublicWhat Is New In 2008 R2 Public
What Is New In 2008 R2 Public
sqlserver.co.il
 

Similar to DBT ELT approach for Advanced Analytics.pptx (20)

Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10Nw2008 tips tricks_edw_v10
Nw2008 tips tricks_edw_v10
 
Traditional data word
Traditional data wordTraditional data word
Traditional data word
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Amit_Kumar_CV
Amit_Kumar_CVAmit_Kumar_CV
Amit_Kumar_CV
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Webinar on MongoDB BI Connectors
Webinar on MongoDB BI ConnectorsWebinar on MongoDB BI Connectors
Webinar on MongoDB BI Connectors
 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
 
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data ModelerDAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
DAS Slides: Data Architect vs. Data Engineer vs. Data Modeler
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Open Source Ecosystem Future of Enterprise IT
Open Source Ecosystem Future of Enterprise ITOpen Source Ecosystem Future of Enterprise IT
Open Source Ecosystem Future of Enterprise IT
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingBest Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
 
KeyAchivementsMimecast
KeyAchivementsMimecastKeyAchivementsMimecast
KeyAchivementsMimecast
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
What Is New In 2008 R2 Public
What Is New In 2008 R2 PublicWhat Is New In 2008 R2 Public
What Is New In 2008 R2 Public
 

More from Hong Ong

More from Hong Ong (8)

Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...Feast Feature Store - An In-depth Overview Experimentation and Application in...
Feast Feature Store - An In-depth Overview Experimentation and Application in...
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
 
Data Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdfData Products for Mobile Commerce in Real-time and Real-life.pdf
Data Products for Mobile Commerce in Real-time and Real-life.pdf
 
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
VWS2017: Bắt đầu Big Data từ đâu và như thế nào?
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
 
Nền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big DataNền tảng thuật toán của AI, Machine Learning, Big Data
Nền tảng thuật toán của AI, Machine Learning, Big Data
 
Bắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big DataBắt đầu nghiên cứu Big Data
Bắt đầu nghiên cứu Big Data
 
Bắt đầu học data science
Bắt đầu học data scienceBắt đầu học data science
Bắt đầu học data science
 

Recently uploaded

Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 

Recently uploaded (20)

Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 

DBT ELT approach for Advanced Analytics.pptx

  • 1. CONFIDENTIAL. Copyright © 1 1 DBT (DATA BUILD TOOL) AN ELT APPROACH FOR ADVANCED ANALYTICS
  • 2. CONFIDENTIAL. Copyright © 2 8+ years swimming in data @ A Researcher, Engineer and Blogger
  • 3. CONFIDENTIAL. Copyright © 3 Agenda 01 02 03 04 05 06 Motivation DBT Approach How to work with DBT Demo Key take away Discussion
  • 5. CONFIDENTIAL. Copyright © 5 We start with Excel files
  • 6. CONFIDENTIAL. Copyright © 6 Data Analytics (DA) daily job How to prepare master table? • Drag and drop to visualization tool? • Modeling on the fly? • Write complex queries? Multiple data sources Multiple tables Data modeling Source: link BIG DATA!? Volume: 10GB 5 years of Data. Variety: multiple data sources. Velocity: real-time analytics.
  • 7. CONFIDENTIAL. Copyright © 7 We moved to Datawarehouse Lead time at least 2 weeks DA don’t understand what DE did And vise versa Data warehouse Transform Load Extract
  • 8. CONFIDENTIAL. Copyright © 8 DE challenges Readability • How to read and understand this query? • Where to start? Accessibility • How to verify the output? • Can we break the script into smaller pieces for testing? Collaboration • How to reuse this query for other analysis? • How to onboard new members? • How to explain if there’re 100 tables? Scripting • How to reuse this query for other analysis? • How to manage model versions?
  • 9. CONFIDENTIAL. Copyright © 9 CONFIDENTIAL. Copyright © 9 Customer segmentation: Segmentation is a technique used to divide customers into groups based on certain characteristics or behaviors. This can help businesses understand their customers better and tailor their marketing efforts to specific groups. SQL can be used to create customer segments by grouping customers based on demographic information (age, gender, location) or transactional data (purchase history, frequency, monetary value). Cohort Analysis: DBT can be used to perform cohort analysis by transforming raw data into a format suitable for analysis. By using DBT to transform the data, analysts can quickly identify patterns and trends in user behavior and track the performance of different customer segments over time. Marketing Attribution: DBT can be used to perform marketing attribution analysis by transforming raw data into a format suitable for analysis. By using DBT to transform the data, marketers can better understand which channels and campaigns are driving the most conversions and optimize their marketing spend accordingly. Financial Reporting: DBT can be used to transform financial data into a format suitable for reporting and analysis. By using DBT to transform the data, financial analysts can quickly generate accurate and consistent reports that provide insights into company performance, revenue, expenses, and other key financial metrics. Demand forecasting: DBT can be used to create a series of transformations on raw transactional data to prepare it for predictive modeling. For example, it can be used to aggregate transactional data by time periods (e.g., days, weeks, or months) and join it with other relevant data sources such as weather data, holidays, or other events that can affect demand. Recommendation engines: Recommendation engines are used to suggest products or services to customers based on their past behavior or preferences. 
SQL can be used to create recommendation engines by analyzing customer purchase history and identifying patterns or similarities between customers. This can be used to suggest similar products or to identify cross-selling opportunities. USE CASES ADVANCED ANALYTICS How to go fast with Data-driven culture and Advanced Analytics?
  • 10. CONFIDENTIAL. Copyright © 10 DBT Approach DBT (Data Build Tool) an ELT approach for Advanced Analytics
  • 11. CONFIDENTIAL. Copyright © 11 Migration from Imperative to Declarative LEADING INSURANCE COMPANY Say goodbye to spaghetti code and complex DOM manipulations with ReactJS Infrastructure as code (IaC) with Terraform Managing containerized applications at scale has never been easier with K8s More accurate and efficient analytics with DBT Front end Cluster orchestration Dev Ops Data job/op
  • 12. DBT philosophy
    DDL- and DML-free: Just write SELECT * FROM table instead of managing DDL (creating tables and views), DML (CRUD statements), transactions, schemas, Pandas DataFrames, etc.
    DRY (Don’t Repeat Yourself): Modularize the data model and reuse it in many places instead of rewriting it from scratch for each new analysis (macros, hooks, package management). Avoid copying and pasting SQL scripts: copies are not reusable and easily introduce errors when the original data model needs to be edited.
    Model versioning: Data models are versioned, which makes it easier to follow how the business logic was built over time and to collaborate with team members (branching, pull requests, code reviews, documentation).
    Data quality control: Writing tests for data models is quick and convenient. Analysis errors often occur in corner cases; catching these cases with tests makes the model more reliable later on.
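To make the "DDL-free" and "data quality control" points on slide 12 concrete, here is a minimal sketch of the idea, not dbt's actual implementation. The helpers `materialize` and `failing_rows` are hypothetical names, SQLite stands in for the warehouse, and the `raw_orders` data is invented. The analyst supplies only a SELECT; the tool wraps it in the DDL. A test is just another SELECT that returns the rows violating an expectation, so zero rows means the test passes.

```python
import sqlite3


def materialize(conn, name, select_sql, kind="view"):
    """Wrap a bare SELECT in the DDL needed to build it (hypothetical helper)."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    conn.execute(f"DROP {kind.upper()} IF EXISTS {name}")
    conn.execute(f"CREATE {kind.upper()} {name} AS {select_sql}")


def failing_rows(conn, test_sql):
    """Run a test query; any row it returns is a failing record."""
    return conn.execute(test_sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO raw_orders VALUES
  (1, 10, 25.0), (2, 11, 40.0), (2, 11, 40.0), (3, NULL, 15.0);
""")

# The "model" is only a SELECT; no CREATE/INSERT is written by the analyst.
materialize(
    conn,
    "stg_orders",
    "SELECT DISTINCT order_id, customer_id, amount FROM raw_orders",
    kind="table",
)

# Two corner-case checks: customer_id is never NULL, order_id is unique.
not_null_failures = failing_rows(
    conn, "SELECT * FROM stg_orders WHERE customer_id IS NULL"
)
unique_failures = failing_rows(
    conn,
    "SELECT order_id, COUNT(*) FROM stg_orders "
    "GROUP BY order_id HAVING COUNT(*) > 1",
)

print("not_null failures:", not_null_failures)
print("unique failures:", unique_failures)
```

Here the uniqueness check passes because the model deduplicates the raw rows, while the not-null check surfaces the order with a missing customer, which is exactly the kind of corner case the slide refers to.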
  • 13. dbt and the modern BI stack. Source: link
    dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. Today, dbt has ~850 companies using it in production, including companies like Casper, SeatGeek, and Wistia.
    (Diagram: Extract, Load, Transform)
  • 14. How to work with DBT
  • 15. Step 1: Develop models. Write the business logic in a simple SQL file.
    Step 2: Compile the project. DBT infers the dependencies between the data models and builds the DAG (directed acyclic graph) for us.
    Step 3: Build tables and views. When dbt runs, the business logic is built as tables or views in the data warehouse.
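The dependency inference in Step 2 can be illustrated with a small sketch. In dbt, a model refers to another model through the `ref()` function inside its SQL; the code below mimics that idea by scanning for `{{ ref('...') }}` calls with a regular expression and ordering the models with Python's standard `graphlib`. It is a simplification under stated assumptions, not how dbt is implemented: the model names and SQL are invented, and real dbt renders the Jinja templates rather than pattern-matching them.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} or {{ ref("model_name") }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical project: model name -> SQL file contents.
models = {
    "stg_orders": "SELECT * FROM raw.orders",
    "stg_products": "SELECT * FROM raw.products",
    "sales_by_category": (
        "SELECT p.category, o.price "
        "FROM {{ ref('stg_orders') }} AS o "
        "JOIN {{ ref('stg_products') }} AS p ON o.product_id = p.product_id"
    ),
    "monthly_sales": (
        "SELECT category, SUM(price) AS total_sales "
        "FROM {{ ref('sales_by_category') }} GROUP BY category"
    ),
}


def build_dag(models):
    """Map each model to the set of models it references (its parents)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


dag = build_dag(models)

# static_order() yields every model after all of its parents.
build_order = list(TopologicalSorter(dag).static_order())
print(build_order)
```

Because the order is derived from the SQL itself, nobody has to maintain a separate list saying which script runs first; adding a new `ref()` is enough to change the build order.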
  • 16. Demo
    Goal: calculate monthly sales values by category
    Tech stack: DBT, Databricks, Azure Blob
    Data: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
    GitHub: https://github.com/ongxuanhong/de05-dbt-databricks
    YouTube: https://youtube.com/playlist?list=PLR0bWeb09-BxoexgE1JD-CUC7TAtNVeyO
  • 17. Calculate monthly sales values by category: values_per_bills = total_sales / total_bills
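The formula on slide 17 can be worked through on a few rows. This sketch is not taken from the demo repository: the `order_items` table, its columns, and the sample values are invented, SQLite stands in for Databricks, and it assumes that a "bill" means one distinct order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (
  order_id TEXT, purchase_month TEXT, category TEXT, price REAL
);
INSERT INTO order_items VALUES
  ('o1', '2018-01', 'toys',  20.0),
  ('o1', '2018-01', 'toys',  10.0),
  ('o2', '2018-01', 'toys',  30.0),
  ('o3', '2018-01', 'books', 15.0),
  ('o4', '2018-02', 'toys',  40.0);
""")

# values_per_bills = total_sales / total_bills, per month and category.
rows = conn.execute("""
SELECT purchase_month,
       category,
       SUM(price)                            AS total_sales,
       COUNT(DISTINCT order_id)              AS total_bills,
       SUM(price) / COUNT(DISTINCT order_id) AS values_per_bills
FROM order_items
GROUP BY purchase_month, category
ORDER BY purchase_month, category
""").fetchall()

for row in rows:
    print(row)
```

For toys in 2018-01, two orders (o1 and o2) total 60.0 in sales, so values_per_bills is 30.0. Counting distinct orders matters here, because order o1 contains two items and would otherwise be counted as two bills.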
  • 18. DBT on Databricks Data Lakehouse with Brazilian Ecommerce dataset Source: link
  • 19. Data Lakehouse: ingested data (JSON, Parquet, Avro, Delta) Brazilian E-Commerce Public Dataset by Olist | Kaggle
  • 20. Data Lakehouse: create external tables (JSON, Parquet, Avro, Delta)
  • 21. Brazilian Ecommerce tables
  • 22. Initialize project
  • 24. Full pipeline
  • 25. Full pipeline
  • 27. Pivot table
  • 28. DBT packages https://hub.getdbt.com/
  • 29. DBT data lineage and output reports
  • 30. Key takeaways
  • 31. Benefits of Data Build Tool (DBT). Grown at 10% every single month (GitHub).
    • Enables seamless data transformation: DBT automates the transformation of raw data into a format that is useful for analytics. This allows data analysts and engineers to focus on insights and analysis rather than spending time on data preparation.
    • Provides a modular approach to data transformation: DBT’s modular approach makes it easy to break down complex transformations into smaller, more manageable steps. Teams can work collaboratively on specific parts of a project and modify and test those parts without affecting the entire project.
    • Promotes data consistency and quality: DBT builds data testing and documentation into the workflow, helping to keep data accurate, consistent, and reliable. Analysts and engineers can then have confidence in the data they work with, leading to better insights and more informed decision-making.
  • 32. Be aware of: Data Build Tool (DBT). Source (link)
    • Requires SQL knowledge: While dbt makes it easier to work with SQL, it still requires a certain level of SQL knowledge. If you don't have experience with SQL, you may need to invest time in learning it in order to use dbt effectively.
    • Performance overhead: Depending on the complexity of your dbt models and the size of your data, there may be a performance overhead associated with using dbt.
    • Limited scope: While dbt can help automate some aspects of data modeling, it doesn't solve all data-related problems. It's important to understand the limitations of dbt and when other tools or approaches might be more appropriate.
  • 33. Discussion
    • What is analytics engineering?
    • dbt: Model contract v1.5
    • dbt + Machine Learning: What makes a great baton pass?
    • dbt Cloud integrations (Snowflake, Airflow, Monte Carlo)
  • 34. References
    • What is analytics engineering?
    • What is dbt?
    • Quickstart for dbt Core
    • Tristan Handy, The Work Behind the Data Work

Editor's Notes

  1. How to move faster? How to ensure data quality?