SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
Data Tech Talk
ft. external speaker
Chris Riccomini, Author, Engineer & Manager, …
1. Talk: Data Warehousing Trends
2. Open Dialogue: Q & A
Data
Warehousing
Trends 2021/12/09 · Chris Riccomini
Hi, I’m Chris
● Engineer & Manager
WePay, LinkedIn, PayPal
● Open Source
Apache Samza, Apache Airflow, Debezium
● Author
The Missing README
● Investor & Advisor
Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ...
@criccomini
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
The Trends
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
https://twitter.com/criccomini/status/1451557884769169412
https://preset.io/blog/reshaping-data-engineering/
You are here
Realtime DWHs
Batch ETL
Pubsub
Realtime DWH
Realtime DWH
Why Realtime DWHs?
● Debugging
○ Investigate application errors
○ Audit log shows how things changed
● Operational
○ Monitoring
○ Scripts that pull from DWH
● Security/Compliance
○ Audit log
● Customer data products
○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns)
○ Data clean rooms
Realtime DWHs Technical Advantages
● Handles hard deletes
● No schema requirements (timestamps)
● Replay from Kafka
● Data integration
Realtime DWH Drawbacks
● Operationally complex
● Depends on source DB support (for CDC)
● Inline transformation is harder
● Fixing bad data is harder
Data Mesh
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
Data Mesh
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
“A data mesh is a type of data platform architecture that embraces the ubiquity of
data in the enterprise by leveraging a domain-oriented, self-serve design.”
● Domain-oriented decentralized data ownership and architecture
● Data as a product
● Self-serve data infrastructure as a platform
● Federated computational governance
Data Mesh
https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
wat.
“A product is any item or service you sell to serve a customer's need or want.”
Data is a Product
https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
● Customers
○ Data scientists
○ Business analysts
○ Finance
○ Sales
○ Product managers
○ Engineers
○ External customers
● Products
○ Recommender systems
○ Billing
○ Fraud
○ Reports
○ Dashboards
Data is a Product
https://cnr.sh/essays/what-the-heck-data-mesh
● Versioned
● Compatible
● Documented
● Monitored
● Self-serve
● Secure (AuthN, AuthZ)
Treat Data Models like APIs
https://cnr.sh/essays/what-the-heck-data-mesh
● Microservice
● DevOps
We’ve Done This Before
https://cnr.sh/essays/what-the-heck-data-mesh
Headless BI
Metrics then
● BI tools to create and visualize metrics
○ Looker
○ Mode
○ Tableau
○ Data Studio
● Answer internal business questions
○ How is a product's health?
○ What does revenue look like?
Metrics now
● Metrics matter for external business workflows
○ Predicting when a customer might churn
○ Notifying users when they reach their capacity limit
○ DS wants to create models to optimize certain metrics
○ Computing customer bills
● BI tools aren’t meant for this
○ Walled garden
○ Have to re-implement the same metrics in different systems
“For data consumption, we heard complaints from decision makers that different
teams reported different numbers for very simple business questions, and
there was no easy way to know which number was correct.”
"...the teams that own metrics would be able to define them once, in a way
that’s consistent across dashboards, automation tools, sales reporting, and so
on. Let’s call this ‘Headless BI’."
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
https://basecase.vc/blog/headless-bi
Headless BI
● Programmatically manage business metrics
○ Automated
○ Centralized
○ Documented
○ Validated
○ Metadata/lineage
○ Backfills
○ Cost
○ Privacy
○ Access
○ Deprecation
○ Retention
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Headless BI
https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
Q&A
● Realtime DWHs
● Analytics Engineering
● Data Mesh
● Data Catalogs
● Reverse ETL
● Headless BI
● Data Integrity
● Data Lakehouses
● DataOps
● White-label Data Viz
Appendix
Analytics Engineering
Analytics engineers provide clean data sets to end users, modeling data in a way
that empowers end users to answer their own questions.
While a data analyst spends their time analyzing data, an analytics engineer
spends their time transforming, testing, deploying, and documenting data.
Analytics engineers apply software engineering best practices like version
control and continuous integration to the analytics code base.
https://www.getdbt.com/what-is-analytics-engineering
Analytics Engineering
● Job
○ Building
○ Testing
○ Cataloging
● Tools
○ DBT
○ Airflow
● Customers
○ Data science
○ Data analysts
○ BI
○ Reporting
Data Catalogs
● Flavor of the month
○ Amundsen
○ DataHub
○ Metaphor
○ Marquez
○ Atlan
○ Collibra
○ Alation
● Use cases
○ Discoverability
○ Operations
○ Governance
“Reverse ETL syncs data from a system of records like a warehouse to a system
of actions like CRM, MAP, and other SaaS apps to operationalize data.”
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
Reverse ETL
https://blog.getcensus.com/what-is-reverse-etl/
● https://unsplash.com/photos/Lbvi0GGJWY4
● https://unsplash.com/photos/8vTAAFYhFfQ
● https://www.flickr.com/photos/jurgenappelo/5201275209
●
Photos
Realtime DWHs

Contenu connexe

Tendances

Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Databricks
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...IDERA Software
 
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019Evoluindo a Plataforma de Dados do Nubank TDC SP 2019
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019André de Lannoy Tavares
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Why I quit Amazon and Build the Next-gen Streaming System
Why I quit Amazon and Build the Next-gen Streaming SystemWhy I quit Amazon and Build the Next-gen Streaming System
Why I quit Amazon and Build the Next-gen Streaming SystemYingjun Wu
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDatabricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)Ryan Blue
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 

Tendances (20)

Big query
Big queryBig query
Big query
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
 
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019Evoluindo a Plataforma de Dados do Nubank TDC SP 2019
Evoluindo a Plataforma de Dados do Nubank TDC SP 2019
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Why I quit Amazon and Build the Next-gen Streaming System
Why I quit Amazon and Build the Next-gen Streaming SystemWhy I quit Amazon and Build the Next-gen Streaming System
Why I quit Amazon and Build the Next-gen Streaming System
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 

Similaire à Data Warehousing Trends

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like ProductsVMware Tanzu
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.ZaraaTitima1
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopMammoth Data
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationInside Analysis
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsLooker
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesAnjani Phuyal
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution Sirinporn Setworaya
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service DesignNVISIA
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBMongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science OrientationDuc Lai Trung Minh
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4jNeo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365Brian Culver
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with MicrosoftCaserta
 

Similaire à Data Warehousing Trends (20)

Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
How To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with HadoopHow To Run A Successful BI Project with Hadoop
How To Run A Successful BI Project with Hadoop
 
The Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with VirtualizationThe Agile Analyst: Solving the Data Problem with Virtualization
The Agile Analyst: Solving the Data Problem with Virtualization
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven DecisionsPower to the People: A Stack to Empower Every User to Make Data-Driven Decisions
Power to the People: A Stack to Empower Every User to Make Data-Driven Decisions
 
Introduction to Big Data using AWS Services
Introduction to Big Data using AWS ServicesIntroduction to Big Data using AWS Services
Introduction to Big Data using AWS Services
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution  Company Profile - NPC with TIBCO Spotfire solution
Company Profile - NPC with TIBCO Spotfire solution
 
Digital Operations Service Design
Digital Operations Service DesignDigital Operations Service Design
Digital Operations Service Design
 
No Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDBNo Compromises SQL Connectivity for MongoDB
No Compromises SQL Connectivity for MongoDB
 
20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation20180701 - 1st Meeting - Data Science Orientation
20180701 - 1st Meeting - Data Science Orientation
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365SPT 104 Unlock your big data with analytics and BI on Office 365
SPT 104 Unlock your big data with analytics and BI on Office 365
 
Paving The Way To Data Driven
Paving The Way To Data DrivenPaving The Way To Data Driven
Paving The Way To Data Driven
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 

Plus de Chris Riccomini

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)Chris Riccomini
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFChris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARNChris Riccomini
 

Plus de Chris Riccomini (6)

What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)What Your Tech Lead Thinks You Know (But Didn't Teach You)
What Your Tech Lead Thinks You Know (But Didn't Teach You)
 
The Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSFThe Future of Data Engineering - 2019 InfoQ QConSF
The Future of Data Engineering - 2019 InfoQ QConSF
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Building Applications on YARN
Building Applications on YARNBuilding Applications on YARN
Building Applications on YARN
 

Dernier

result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 

Dernier (20)

Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 

Data Warehousing Trends

  • 1. Data Tech Talk ft. external speaker Chris Riccomini, Author, Engineer & Manager, … 1. Talk: Data Warehousing Trends 2. Open Dialogue: Q & A
  • 3. Hi, I’m Chris ● Engineer & Manager WePay, LinkedIn, PayPal ● Open Source Apache Samza, Apache Airflow, Debezium ● Author The Missing README ● Investor & Advisor Prefect, Meroxa, StarTree, Amundsen, Anomalo, TopCoat, ... @criccomini
  • 4.
  • 5. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  • 6. The Trends ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz https://twitter.com/criccomini/status/1451557884769169412 https://preset.io/blog/reshaping-data-engineering/
  • 13. Why Realtime DWHs? ● Debugging ○ Investigate application errors ○ Audit log shows how things changed ● Operational ○ Monitoring ○ Scripts that pull from DWH ● Security/Compliance ○ Audit log ● Customer data products ○ Ad hoc customer reports (e.g. Stripe Sigma, WePay txns) ○ Data clean rooms
  • 14. Realtime DWHs Technical Advantages ● Handles hard deletes ● No schema requirements (timestamps) ● Replay from Kafka ● Data integration
  • 15. Realtime DWH Drawbacks ● Operationally complex ● Depends on source DB support (for CDC) ● Inline transformation is harder ● Fixing bad data is harder
  • 17. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0
  • 18. “A data mesh is a type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.” ● Domain-oriented decentralized data ownership and architecture ● Data as a product ● Self-serve data infrastructure as a platform ● Federated computational governance Data Mesh https://towardsdatascience.com/what-is-a-data-mesh-and-how-not-to-mesh-it-up-210710bb41e0 wat.
  • 19. “A product is any item or service you sell to serve a customer's need or want.” Data is a Product https://www.aha.io/roadmapping/guide/product-management/what-is-a-product
  • 20. ● Customers ○ Data scientists ○ Business analysts ○ Finance ○ Sales ○ Product managers ○ Engineers ○ External customers ● Products ○ Recommender systems ○ Billing ○ Fraud ○ Reports ○ Dashboards Data is a Product https://cnr.sh/essays/what-the-heck-data-mesh
  • 21. ● Versioned ● Compatible ● Documented ● Monitored ● Self-serve ● Secure (AuthN, AuthZ) Treat Data Models like APIs https://cnr.sh/essays/what-the-heck-data-mesh
  • 22. ● Microservice ● DevOps We’ve Done This Before https://cnr.sh/essays/what-the-heck-data-mesh
  • 24. Metrics then ● BI tools to create and visualize metrics ○ Looker ○ Mode ○ Tableau ○ Data Studio ● Answer internal business questions ○ How is a product's health? ○ What does revenue look like?
  • 25. Metrics now ● Metrics matter for external business workflows ○ Predicting when a customer might churn ○ Notifying users when they reach their capacity limit ○ DS wants to create models to optimize certain metrics ○ Computing customer bills ● BI tools aren’t meant for this ○ Walled garden ○ Have to re-implement the same metrics in different systems
  • 26. “For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct.” "...the teams that own metrics would be able to define them once, in a way that’s consistent across dashboards, automation tools, sales reporting, and so on. Let’s call this ‘Headless BI’." Headless BI https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70 https://basecase.vc/blog/headless-bi
  • 27. Headless BI ● Programmatically manage business metrics ○ Automated ○ Centralized ○ Documented ○ Validated ○ Metadata/lineage ○ Backfills ○ Cost ○ Privacy ○ Access ○ Deprecation ○ Retention https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70
  • 29. Q&A ● Realtime DWHs ● Analytics Engineering ● Data Mesh ● Data Catalogs ● Reverse ETL ● Headless BI ● Data Integrity ● Data Lakehouses ● DataOps ● White-label Data Viz
  • 31. Analytics Engineering Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base. https://www.getdbt.com/what-is-analytics-engineering
  • 32. Analytics Engineering ● Job ○ Building ○ Testing ○ Cataloging ● Tools ○ DBT ○ Airflow ● Customers ○ Data science ○ Data analysts ○ BI ○ Reporting
  • 33. Data Catalogs ● Flavor of the month ○ Amundsen ○ DataHub ○ Metaphor ○ Marquez ○ Atlan ○ Collibra ○ Alation ● Use cases ○ Discoverability ○ Operations ○ Governance
  • 34. “Reverse ETL syncs data from a system of records like a warehouse to a system of actions like CRM, MAP, and other SaaS apps to operationalize data.” Reverse ETL https://blog.getcensus.com/what-is-reverse-etl/
  • 36. ● https://unsplash.com/photos/Lbvi0GGJWY4 ● https://unsplash.com/photos/8vTAAFYhFfQ ● https://www.flickr.com/photos/jurgenappelo/5201275209 ● Photos