Ganesh Chand, Databricks
Ravi Gawai, Databricks
Agenda
• Delta Lake - What and Why?
• Common Delta Lake use cases
• Data as a Service (DaaS)
• Our Approach
• Use Cases
• Demo
• Q&A
What’s a Data Lake?
A data lake is a centralized repository that allows you to store all your
structured and unstructured data at any scale.
“If you think of a datamart as a store of bottled water – cleansed and
packaged and structured for easy consumption – the data lake is a
large body of water in a more natural state. The contents of the data
lake stream in from a source to fill the lake, and various users of the
lake can come to examine, dive in, or take samples.” - James Dixon
Why a Data Lake?
[Diagram labels: lakes, streams, warehouses, NoSQL, and CSV, JSON, TXT… files]
Challenges with Data Warehouse
• Big Data problem
• Expensive (build, store and process)
• Proprietary technology (processing and storage)
• Vendor lock-in
• Lack of ML capabilities
Data Lake: Aspiration
Real-time Streaming,
Data Science and ML
• Recommendation Engines
• Risk, Fraud, & Intrusion Detection
• Customer Analytics
• IoT & Predictive Maintenance
• Genomics & DNA Sequencing
Use AI and Machine Learning to outperform your competition,
retain your customers, and boost your productivity with lower TCO,
using a variety of data sources
Data Lake: Reality
Real-time Streaming,
Data Science and ML
• Recommendation Engines
• Risk, Fraud, & Intrusion Detection
• Customer Analytics
• IoT & Predictive Maintenance
• Genomics & DNA Sequencing
The majority of these projects are failing!
• Unreliable, low-quality data
• Slow performance
Why?
Strengths of Data Warehouse
• Full ACID Transaction
• Insert, Delete, Update with SCD-II
• Indexing for faster query response
• Schema-On-Write
Strengths of Data Lake
• Open Source, Open Standards
• Powered By Apache Spark
• Scale
• Unified platform for data & AI
And
• Unification of Batch & Streaming workloads
• Incrementally improve the quality of your data until it is ready for consumption (Multi-hop pipelines)
• Dramatically reduces legacy Spark/Hive operational burdens
• Scalable Metadata Handling
What’s a Delta Lake?
A Data Lake Powered By Delta
[Diagram labels: lakes, streams, warehouses, NoSQL, and CSV, JSON, TXT… files feeding Delta Lake]
• Bronze: raw ingestion
• Silver: filtered, cleaned, augmented
• Gold: business-level aggregates
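The Bronze, Silver, and Gold stages can be sketched in plain Python. This is only an illustration of the multi-hop idea, not Delta Lake or Spark code: lists of dicts stand in for tables, and the `customer` and `amount` fields, the sample rows, and the function names `to_silver` and `to_gold` are all hypothetical.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Bronze -> Silver: drop malformed rows and normalise the rest."""
    silver = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric amount: filtered out
        customer = str(row.get("customer", "")).strip().lower()
        if not customer:
            continue  # no usable customer key: filtered out
        silver.append({"customer": customer, "amount": amount})
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: business-level aggregate (total amount per customer)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


# Hypothetical raw (Bronze) records, kept exactly as they arrived.
bronze = [
    {"customer": " Alice ", "amount": "10.5"},
    {"customer": "bob", "amount": "oops"},
    {"customer": "alice", "amount": 4.5},
    {"amount": "3"},
]

silver = to_silver(bronze)
gold = to_gold(silver)
```

Each hop reads only the output of the previous one, so data quality improves step by step while the raw Bronze records stay available for reprocessing.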
Common Delta Lake Use Cases
• Interactive Queries
• BI reporting and dashboards
• Train and Build Machine Learning Models
• Create Data Warehouse
• Create / Monetize Data Products
• Sell or share curated data to partners, vendors and internal customers
• Feed data back to source systems, web applications, mobile apps
Serving Data From Delta Lake
[Diagram labels: web app, mobile app, ERP, storage; data product, data enrichment, data integration, data export]
Serving Data From Delta Lake
[Architecture diagram]
• Layers: Storage (S3, ADLS, HDFS), Catalog, Compute, Serving, Consumers
• Components: API, Access Management, Data Service, Metadata Service
Serving Data From Delta Lake
Data-as-a-Service (DaaS)
• REST APIs
• Read-Only
• Data Format
• Delivery mechanism
Challenges
• Security
• Latency
• Throughput
• SLA
• Data licensing, ownership and monetization model
• Managing evolving requirements
• Minimizing Information Silos
Use Cases for Demo App
• MVP features for the demo app
• End-to-end ETL pipeline writing into Delta Lake
• DaaS REST endpoint to export data
• Front-end app to consume data and build a dashboard
• UI to interact with Delta Lake
• Export classified and aggregated data out of Delta Lake to be consumed by a client app
Our implementation
• Storage: S3
• Compute: Databricks Jobs API
• Serving: REST
• Consumers
Routes:
/listSchemas
/listTables
/exportData
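A minimal sketch of how a serving layer might dispatch the three routes, using only the Python standard library. The slides do not show the implementation, so everything beyond the route names is an assumption: the in-memory `CATALOG`, the `schema` query parameter, the status codes, and the response shapes are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory catalog standing in for the real metadata store.
CATALOG = {"sales": ["orders", "customers"], "web": ["clicks"]}


def handle(path_with_query):
    """Map a request path to a (status, body) pair for the three routes."""
    url = urlparse(path_with_query)
    params = parse_qs(url.query)
    if url.path == "/listSchemas":
        return 200, {"schemas": sorted(CATALOG)}
    if url.path == "/listTables":
        # "schema" is an assumed parameter name, not taken from the slides.
        schema = params.get("schema", [""])[0]
        if schema not in CATALOG:
            return 404, {"error": f"unknown schema: {schema}"}
        return 200, {"tables": CATALOG[schema]}
    if url.path == "/exportData":
        # A real service would submit a compute job here; the sketch only
        # acknowledges the request.
        return 202, {"status": "accepted"}
    return 404, {"error": "unknown route"}


class DaasHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle(); serves GET requests as JSON."""

    def do_GET(self):
        status, body = handle(self.path)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, one could run `HTTPServer(("127.0.0.1", 8080), DaasHandler).serve_forever()` after importing `HTTPServer` from `http.server`. Keeping the routing in a plain function makes it testable without opening a socket.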
DaaS APIs
GET delta-meta-service/getDbDetails
GET delta-meta-service/previewTable?table=db.tablename
POST delta-sql-service/exportSqlData -d
{
"inputSql": "select * from db.table where condition",
"outputPath": "/path/",
"format": "json"
}
GET delta-sql-service/getRunStatus?run_id=id
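A client for the export and status calls above might look like the following sketch. The endpoint paths, the payload keys, and the `run_id` parameter come from the slide; the host `daas.example.com`, the `state` field in the status response, and the terminal state names are assumptions made for illustration, since the slides do not show response bodies.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://daas.example.com"  # hypothetical host, not from the slides
TERMINAL_STATES = {"SUCCESS", "FAILED"}  # assumed state names


def build_export_request(input_sql, output_path, fmt="json"):
    """Build (but do not send) the POST for delta-sql-service/exportSqlData."""
    payload = {"inputSql": input_sql, "outputPath": output_path, "format": fmt}
    return Request(
        f"{BASE_URL}/delta-sql-service/exportSqlData",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_status_url(run_id):
    """URL for GET delta-sql-service/getRunStatus?run_id=..."""
    query = urlencode({"run_id": run_id})
    return f"{BASE_URL}/delta-sql-service/getRunStatus?{query}"


def fetch_json(url):
    """Perform a GET and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def wait_for_run(run_id, fetch=fetch_json, interval=5.0, max_polls=60):
    """Poll the status endpoint until the run reaches an assumed terminal state."""
    for _ in range(max_polls):
        status = fetch(run_status_url(run_id))
        if status.get("state") in TERMINAL_STATES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Passing `fetch` as a parameter lets the polling loop be exercised with a stub instead of a live service. Sending the export itself would be `urlopen(build_export_request(...))`.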
Demo
• Delta ETL pipeline
• Front-End
Thank You
ganesh@databricks.com ravi@databricks.com
Building Data Intensive Analytic Application on Top of Delta Lakes
