Modern DW Architecture
About Me
• Associate Director, IQVIA, DW & DS
• More than 10 years in data warehouse technology
• Worked as Developer, Lead, Consultant, and Solutions Architect on DW & MDM projects
• Ex-Microsoft: worked as Technical Consultant
• Presenter at various technical forums; 2000+ answers on MSDN
• Certifications: Microsoft Certified IT Professional, Amazon certified Architect
• LinkedIn: https://www.linkedin.com/in/rakeshjayaram
Agenda
Solution Models
•Traditional IT vs SaaS
•Why cloud?
•Migrate to Cloud
Modern DW Architecture
•Storage
•Prep
•Serve
Data Lakes vs Data Warehouse
•Debunk myths
Demo
•Data Lake
•Databricks
•Synapse
Cloud optimized DW Solutions
•Compute
•Storage
Q&A
•Thank you!
Solution Models
• Azure Databricks
• Azure database
• Azure Synapse
• SQL Server on VM
• SQL Server on Prem
Solution Models
WHY CLOUD?
Elasticity
• Increase/decrease resources on demand
• Auto-scaling options
• Unlimited resource availability
Pay as you go
• No upfront capital cost
• No resources needed for maintenance
Secure
• Encrypted at rest and in transit
• Azure AD integration
• ACLs & security groups
• IP address black/white listing
• Virtual network integration
Fast time to market
• Integration with Azure services
• Support for a familiar language (T-SQL)
• Continuous build & integration
Solution Models
HOW TO MOVE TO CLOUD?
• Lift/Shift: SQL Server on Azure VM
• Remodel: Azure database
• Build for Cloud: Databricks & Synapse
Modern DW Architecture
[Diagram: INGEST, STORE, PREP, SERVE]
• Sources: Text, Parquet, JSON, CRM
• INGEST: Azure Data Factory
• STORE (storage): Azure Blob, Azure Data Lake Gen1, Azure Data Lake Gen2
• PREP (compute): Azure Databricks, Azure HDInsight
• SERVE (compute + storage): Azure SQL Synapse, SQL Server on VM
• Consumers: BI + Reporting, Downstream apps, Advanced Analytics
Storage
Blob Storage
Purpose: General-purpose object store for a wide variety of storage scenarios, including big data analytics
Use cases: Any type of text or binary data, such as application back ends, backup data, media storage for streaming, and general-purpose data
Encryption (at rest): Transparent, server-side
  • With service-managed keys
  • With customer-managed keys in Azure Key Vault (preview)
  • Client-side encryption
Lifecycle management: Yes
Authentication: Access keys
Store type: Object store with flat namespace
Redundancy: Locally redundant (LRS), zone-redundant (ZRS), geo-redundant (GRS)
Limits: 2 PB per account in the US
Storage
ADLS Gen1
Purpose: Optimized storage for big data analytics workloads
Use cases: Batch, interactive, and streaming analytics and machine learning data such as log files, IoT data, click streams, and large datasets
Encryption (at rest): Transparent, server-side
  • With service-managed keys
  • With customer-managed keys in Azure Key Vault (preview)
Lifecycle management: No
Authentication: Azure AD
Store type: Hierarchical file system
Redundancy: Locally redundant (LRS)
Limits:
  • No support for Hot/Cold storage
  • No support for redundancy storage
  • Don't use for new projects; upgrade to ADLS Gen2
Storage
ADLS Gen2
Purpose: Optimized storage for big data analytics workloads
Use cases: Batch, interactive, and streaming analytics and machine learning data such as log files, IoT data, click streams, and large datasets
Encryption (at rest): Transparent, server-side
  • With service-managed keys
  • With customer-managed keys in Azure Key Vault (preview)
Lifecycle management: Yes (in preview)
Authentication: Azure AD
Store type: Hierarchical file system
Redundancy: Locally redundant (LRS); ZRS & GRS (preview)
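As a small illustration of the hierarchical file system, the sketch below builds an ADLS Gen2 path in the `abfss://<container>@<account>.dfs.core.windows.net/<path>` form that Spark-based tools such as Databricks typically use. The account name, container name, and the zone/source/load-id folder layout are hypothetical and are not taken from the deck.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    clean = path.strip("/")
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    return f"{base}/{clean}" if clean else f"{base}/"


def load_folder(zone: str, source: str, load_id: str) -> str:
    """Compose a folder path; this zone/source/load_id layout is an assumption."""
    return "/".join([zone, source, f"load_id={load_id}"])


# Hypothetical account ("examplelake") and container ("raw")
uri = abfss_uri("examplelake", "raw", load_folder("landing", "crm", "20191205"))
print(uri)
```

Because Gen2 folders are real directories rather than name prefixes, a layout like this can be secured and renamed at the folder level.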
Storage
Migration from Blob to ADLS Gen2 New features on ADLS Gen2
Compute - databricks
Prep
Azure Databricks
Purpose: Platform/tool for massive data processing
Use cases:
  • Integration with ADLS
  • Spark & notebooks
  • Auto-scaling & auto-termination
Limits: Steep learning curve for data engineers
Prep
Azure HDInsight
Highlights:
  • Complete open source platform for Hadoop clusters on Azure
  • Preferred for Pig, Kafka, Hive, etc.
  • Hortonworks merged with Cloudera
  • No auto-termination or auto-scale
Power BI Dataflow
Highlights:
  • Self-service BI
  • Good for smaller workloads
  • Primarily for data analysts and business analysts
Azure Data Lake Analytics
Highlights:
  • Query as a service
  • Pay for the processing
  • U-SQL (similar to SQL / C#)
Compute - Synapse
Serve
Azure Synapse
Purpose: Petabyte-scale, cloud-based DW
Use cases:
  • Massively parallel processing (MPP) technology
  • Decoupled storage & compute architecture
  • Stop/resume options for cost efficiency
  • Indexing, distribution, and partition options for faster query performance
Limits: Requires a reference to ADLS to fetch data using PolyBase
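To make the distribution option concrete, here is a minimal Python sketch of why the choice of hash-distribution key matters. It assumes the 60 distributions used by a Synapse dedicated SQL pool; the MD5-based hash is only a stand-in (not Synapse's internal hash function), and the key names are invented for illustration.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def hash_distribution(key: str, n: int = DISTRIBUTIONS) -> int:
    """Map a distribution-key value to a bucket (stand-in hash, not Synapse's own)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n


def skew(keys, n: int = DISTRIBUTIONS) -> float:
    """Fullest distribution divided by the ideal even share (1.0 means perfectly even)."""
    counts = Counter(hash_distribution(k, n) for k in keys)
    ideal = len(keys) / n
    return max(counts.values()) / ideal


# High-cardinality key: rows spread over all distributions
customer_ids = [f"cust-{i}" for i in range(60_000)]
# Low-cardinality key: 3 distinct values, so at most 3 distributions hold data
regions = ["EU", "US", "APAC"] * 20_000

print("customer_id skew:", round(skew(customer_ids), 2))
print("region skew:", round(skew(regions), 2))
```

A skew close to 1 means every distribution does a similar share of the work; a large value means a few distributions hold most of the rows and the rest sit idle during a query.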
Data Lake vs Data Warehouse
Data format
  Data Lake: Raw data, unstructured, semi-structured
  Data Warehouse: Structured, cleansed, processed
Purpose
  Data Lake: Any purpose (ML, AI, data warehouse)
  Data Warehouse: Mostly reporting & BI
Sources
  Data Lake: Native raw form (logs, data files)
  Data Warehouse: Historical & relational form
Volume
  Data Lake: PB scale
  Data Warehouse: Less than the data lake
Ingestion
  Data Lake: Stored with minimal validation & transformation
  Data Warehouse: Data must be cleansed, validated, refined
Users
  Data Lake: Data engineers, data analysts (advanced)
  Data Warehouse: Business analysts
Use case
  Data Lake: Batch & stream processing
  Data Warehouse: Batch processing
Data pipelines using Databricks
Load
•Flat, Parquet
•Zip files
•Control/Trigger
•Source & loadId driven
Validate
•Control file
•Column name & count
•Data type, null, duplicate
•Phone, zip, e-mail pattern validation
•Pattern check (datetime, regex, etc.)
•Error level configuration
Transform
•Transform from SQL file
•Azure Tables
•Raise error on condition
•SCD processor
•Fact processor
Unload
•Azure Tables
•CSV file
Audit logs & Report Generation
•Reports
•Application logs
•Tracking identifier (UID)
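The Validate stage above can be sketched in plain Python. This is only an illustration under stated assumptions: the deck runs its pipelines on Databricks and does not show code, so the column names, the regular expressions, and the mapping of checks to "error" or "warning" levels are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns and control-file content, for illustration only
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ZIP_US = re.compile(r"^\d{5}(-\d{4})?$")
EXPECTED_COLUMNS = ["customer_id", "email", "zip"]


def validate(header, rows):
    """Return (level, message) findings for one file."""
    findings = []
    # Column name & count check against the control file
    if header != EXPECTED_COLUMNS:
        findings.append(("error", f"column mismatch: {header}"))
        return findings  # later checks rely on the column layout
    id_counts = Counter(row[0] for row in rows if row)
    for line_no, row in enumerate(rows, start=1):
        if len(row) != len(header):
            findings.append(("error", f"row {line_no}: expected {len(header)} fields, got {len(row)}"))
            continue
        customer_id, email, zip_code = row
        # Null and duplicate checks on the key column
        if not customer_id:
            findings.append(("error", f"row {line_no}: null customer_id"))
        elif id_counts[customer_id] > 1:
            findings.append(("error", f"row {line_no}: duplicate customer_id {customer_id}"))
        # Pattern checks, configured here as warnings
        if not EMAIL.match(email):
            findings.append(("warning", f"row {line_no}: bad email {email!r}"))
        if not ZIP_US.match(zip_code):
            findings.append(("warning", f"row {line_no}: bad zip {zip_code!r}"))
    return findings


sample_rows = [
    ["1", "a@example.com", "12345"],
    ["1", "b@example.com", "1234"],
    ["", "not-an-email", "12345-6789"],
]
for level, message in validate(EXPECTED_COLUMNS, sample_rows):
    print(level, message)
```

Returning findings with a level, rather than raising on the first problem, lets a pipeline decide per level whether to reject the file or only log it, which is one way to read "error level configuration".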
Demo
Production: Power Platform World Tour
Scene: Azure Data Platform Demo
Take: Hopefully only 1!
Actors: Azure Data Lake, Azure Databricks, Azure Synapse
Date: 05-12-2019
Optimize for Cost & Performance
[Chart: Traffic vs. server capacity (TDU), with the Synapse shutdown marked]
[Chart: Cluster size depending on source volume, for the sources Metadata, Workday, CRM, ERP, National Sales, Zip Sales]
• Synapse is a decoupled architecture
• Shut down when the production run-cycle is complete
• Choose the right distribution strategy
• Partition fact files where required
• Choose the right cluster size according to the volume of the source to avoid resource under-utilization
• Auto-scale if required
• Auto-terminate on completion
[Chart annotations: Shutdown feature in Synapse; use Spark for validations & source-specific jobs; cluster sizes Small, Medium, Large]
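The effect of shutting Synapse down after the run-cycle can be estimated with simple arithmetic. The sketch below uses a made-up hourly rate and a made-up 4-hour daily run window; neither figure comes from the deck or from Azure pricing. It covers compute only, since storage is billed separately in a decoupled architecture.

```python
def monthly_compute_cost(active_hours_per_day: float, rate_per_hour: float, days: int = 30) -> float:
    """Compute-only cost for a month, given how many hours per day the pool is running."""
    return active_hours_per_day * rate_per_hour * days


RATE = 10.0  # hypothetical currency units per compute hour

always_on = monthly_compute_cost(24, RATE)  # pool never paused
paused = monthly_compute_cost(4, RATE)      # pool resumed only for a 4-hour run-cycle
saving = 1 - paused / always_on

print(f"always on: {always_on:.0f}, paused after run: {paused:.0f}, saving: {saving:.0%}")
```

The same function can be reused for a Databricks cluster with auto-termination by passing the hours the cluster is actually up.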
Optimize for Cost & Performance
Enable ADLS Gen2 Lifecycle Management Policy
Relative cost by access tier (for X volume):
Operation   Hot     Cool     Archive
Store       $100    ~$55     ~$5.5
Write       $100    ~$200    ~$200
Read        $100    ~$250    ~$120,000
In development (Hot, Cool, Archive)
References
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
https://azure.microsoft.com/en-us/blog/multi-protocol-access-on-data-lake-storage-now-generally-available/
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
https://azure.microsoft.com/en-us/services/data-lake-analytics/
https://medium.com/@cprosenjit/restricting-access-to-your-big-data-system-on-azure-d887845c42ab
https://cloudarchitected.com/2019/03/data-level-security-in-azure-databricks/
https://cloudarchitected.com/2019/02/network-isolation-for-azure-databricks/
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-continuous-integration-and-deployment
Q & A
THANK YOU

Modern data warehouse

  • 2. About Me • Associate Director, IQVIA, DW & DS • In Data warehouse technology for more than 10 years • Worked as Developer, Lead, Consultant, Solutions Architect on DW & MDM Projects • Ex-Microsoft – worked as Technical Consultant • Presenter at various technical forums , 2000+ answers on MSDN • Certifications : Microsoft Certified IT Professional , Amazon certified Architect • Linked in : https://www.linkedin.com/in/rakeshjayaram
  • 3. Agenda •Traditional IT Vs SaaS •Why cloud? •Migrate to Cloud Solution Models •Storage •Prep •Serve Modern DW Architecture •Debunk myths Data Lakes vs Data Warehouse •Data Lake •databricks •Synapse Demo •Compute •Storage Cloud optimized DW Solutions •Thank you!Q&A
  • 4. Solution Models Azure databricks Azure database Azure Synapse SQL Server on VM SQL Server on Prem
  • 5. Solution Models WHY CLOUD? Elasticity Pay as you go Secure Increase/Decrease resources on demand Auto-scaling options Unlimited resources availability No upfront capital/cost Maintenance Resources elimination Encrypted at-rest(/transit Azure AD integration ACL’s & Security groups IP Address black/white listing Virtual network integration Fast time to Market Integration with Azure Services Support to familiar language(T-sql) Continuous build & Integration
  • 6. Solution Models: How to move to cloud?
      Lift/Shift: SQL Server on Azure VM
      Remodel: Azure database
      Build for cloud: Databricks & Synapse
  • 7. Modern DW Architecture
      Sources: Text, Parquet, JSON, CRM
      INGEST: Azure Data Factory
      STORE: Azure Data Lake Gen1, Azure Blob, Azure Data Lake Gen2
      PREP: Azure Databricks, Azure HDInsight
      SERVE: Azure SQL Synapse, SQL Server on VM
      Consumers: BI + Reporting, Downstream apps, Advanced Analytics
      Layer labels on the diagram: Storage, Compute, Compute + Storage
  • 8. Storage (repeats the Modern DW Architecture diagram: INGEST, STORE, PREP, SERVE)
  • 9. Storage: Blob Storage
      Purpose: General-purpose object store for a wide variety of storage scenarios, including big data analytics
      Use cases: Any type of text or binary data, such as application back end, backup data, media storage for streaming, and general-purpose data
      Encryption (at rest): Transparent, server side, with service-managed keys or with customer-managed keys in Azure Key Vault (preview); client-side encryption
      Lifecycle management: Yes
      Authentication: Access keys
      Store type: Object store with flat namespace
      Redundancy: Locally redundant (LRS), zone redundant (ZRS), geo-redundant (GRS)
      Limits: 2 PB/account in US
  • 10. Storage: ADLS Gen1
      Purpose: Optimized storage for big data analytics workloads
      Use cases: Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets
      Encryption (at rest): Transparent, server side, with service-managed keys or with customer-managed keys in Azure Key Vault (preview)
      Lifecycle management: No
      Authentication: Azure AD
      Store type: Hierarchical file system
      Redundancy: Locally redundant (LRS)
      Limits: No support for hot/cold storage; no support for redundancy storage
      Don't use for new projects; upgrade to ADLS Gen2
  • 11. Storage: ADLS Gen2
      Purpose: Optimized storage for big data analytics workloads
      Use cases: Batch, interactive, streaming analytics and machine learning data such as log files, IoT data, click streams, large datasets
      Encryption (at rest): Transparent, server side, with service-managed keys or with customer-managed keys in Azure Key Vault (preview)
      Lifecycle management: Yes (in preview)
      Authentication: Azure AD
      Store type: Hierarchical file system
      Redundancy: Locally redundant (LRS); ZRS & GRS (preview)
  • 12. Storage: Migration from Blob to ADLS Gen2; new features on ADLS Gen2
  • 13. Modern Data Architecture: Compute - Databricks (repeats the architecture diagram: INGEST, STORE, PREP, SERVE)
  • 14. Prep: Azure Databricks
      Purpose: Platform/tool for massive data processing
      Use cases: Integration with ADLS; Spark & notebooks; auto-scaling & auto-termination
      Limits: Steep learning curve for data engineers
  • 15. Prep: other options
      Azure HDInsight highlights: Complete open-source platform for Hadoop clusters on Azure; preferred for Pig/Kafka/Hive, etc.; Hortonworks merger with Cloudera!; no termination & auto-scale
      Power BI Dataflow highlights: Self-service BI; good for smaller workloads; primarily for data analysts/business analysts
      Azure Data Lake Analytics highlights: Query as a service, pay for the processing; U-SQL (~ SQL/C#)
  • 16. Modern Data Architecture: Compute - Synapse (repeats the architecture diagram: INGEST, STORE, PREP, SERVE)
  • 17. Serve: Azure Synapse
      Purpose: Petabyte-scale cloud-based DW
      Use cases: Massively parallel processing (MPP) technology; decoupled storage & compute architecture; stop/resume options for cost efficiency; indexing, distribution, and partition options for faster query performance
      Limits: Requires a reference to ADLS to fetch data using PolyBase
  • 18. Data Lake vs Data Warehouse
      Data format: Data Lake = raw data, unstructured, semi-structured; Data Warehouse = structured, cleansed, processed
      Purpose: Data Lake = any purpose (ML, AI, data warehouse); Data Warehouse = mostly reporting & BI
      Sources: Data Lake = native raw form (logs, data files); Data Warehouse = historical & relational form
      Volume: Data Lake = PB scale; Data Warehouse = less than the data lake
      Ingestion: Data Lake = stored with minimal validation & transformation; Data Warehouse = data must be cleansed, validated, refined
      Users: Data Lake = data engineers, data analysts (advanced); Data Warehouse = business analysts
      Use case: Data Lake = batch & stream processing; Data Warehouse = batch processing
  • 19. Data pipelines using Databricks
      Load: Flat, Parquet; zip files; control/trigger; source & loadId driven
      Validate: Control file; column name & count; data type, null, duplicate; phone, zip, e-mail pattern validation; pattern check (datetime, regex, etc.); error-level configuration
      Transform: Transform from SQL file; Azure Tables; raise error on condition; SCD processor; fact processor
      Unload: Azure Tables; CSV file
      Audit logs & report generation: Reports; application logs; tracking identifier (UID)
  • 20. Demo
      Production: Power platform world tour
      Scene: Azure Data Platform Demo
      Take: Hopefully only 1!
      Actors: Azure Data Lake, Azure Databricks, Azure Synapse
      Date: 05-12-2019
  • 21. Optimize for Cost & Performance
      [Chart: Synapse shutdown; series: Traffic, Server capacity (TDU)]
      [Chart: Cluster size (Small, Medium, Large) depending on source volume; sources: Metadata, Workday, CRM, ERP, National Sales, Zip Sales]
      Shutdown feature in Synapse: Synapse is a decoupled architecture; shut down when the production run cycle is complete; choose the right distribution strategy; partition fact files where required
      Use Spark for validations & source-specific jobs: choose the right cluster size according to the volume of the source to avoid resource under-utilization; auto-scale if required; auto-terminate on completion
  • 22. Optimize for Cost & Performance: Enable ADLS Gen2 Lifecycle Management Policy
      Operation (for X volume): Hot / Cool / Archive
      Store: $100 / ~$55 / ~$5.5
      Write: $100 / ~$200 / ~$200
      Read: $100 / ~$250 / ~$120,000
      In development (Hot → Cool → Archive)
  • 24. Q & A THANK YOU
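
A minimal sketch of the Validate stage listed on slide 19 (column name and count, null, duplicate, and pattern checks), written in plain Python rather than Spark for brevity. The function name `validate`, the regular expressions, and the sample column names are assumptions made for illustration; the deck does not show the actual implementation.

```python
import re
from collections import Counter

# Hypothetical patterns; a real pipeline would load these from its configuration.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "zip": re.compile(r"^\d{5}(-\d{4})?$"),
}


def validate(rows, expected_columns, key_column):
    """Return a list of (row_index, message) for every failed check."""
    errors = []
    # Count key values once so duplicates can be flagged per row.
    key_counts = Counter(row.get(key_column) for row in rows)
    for i, row in enumerate(rows):
        # Column name and count check: skip further checks on a malformed row.
        if list(row.keys()) != expected_columns:
            errors.append((i, "column mismatch"))
            continue
        # Null check: treat None and empty strings as missing.
        for col in expected_columns:
            if row[col] in (None, ""):
                errors.append((i, f"null value in {col}"))
        # Duplicate check on the key column.
        if key_counts[row[key_column]] > 1:
            errors.append((i, f"duplicate {key_column}"))
        # Pattern check for columns that have a configured regex.
        for col, pattern in PATTERNS.items():
            value = row.get(col)
            if value and not pattern.match(value):
                errors.append((i, f"bad {col} pattern"))
    return errors
```

Returning every failure, instead of stopping at the first one, is one way to feed the "error-level configuration" item on the slide: the caller can then decide which messages are fatal for a given source.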

Editor's notes

  1. Time: 9:05. Data has become the strategic asset used to transform businesses and uncover new insights. IDC projects that this explosion of data will result in a 40-zettabyte digital universe by 2020. To drive the business forward, the enterprise needs to integrate and adapt its enterprise data warehouse so that it evolves into <pause> a modern data warehouse.
  2. Time : 9:05
  3. Time: 9:05. Solution Models: 10 min; Modern Data Architecture: 5 min; Storage: 5 min; Prep: 5 min; Serve: 5 min; Data Lakes vs Data Warehouse: 5 min; Demo: 15 min; Cloud Optimized Solutions: 5 min; Q&A: 5 min
  4. Section End Time (1/3): 9:10
      Traditional IT: The traditional IT data warehouse was designed specifically to be a central repository for all data in a company. Disparate data from transactional systems, ERP, CRM, and LOB applications is cleansed (that is, extracted, transformed, and loaded, or ETL) into the warehouse within an overall relational schema. The predictable data structure and quality optimized processing for operational reporting. However, preparing queries was largely IT-supported and based on scheduled batch processing. The traditional data warehouse was built on symmetric multi-processing (SMP) technology. With SMP, adding more capacity involved procuring larger, more powerful hardware and then forklifting the prior data warehouse into it. This was necessary because, as the warehouse approached capacity, its architecture experienced performance issues at a scale where there was no way to add incremental processor power or enable synchronization of the cache between processors.
      SaaS: The cloud has quickly become an integral part of many IT organizations. Recent research from cloud solutions provider RightScale shows 93% of businesses using cloud technology. Forrester recently did a study in which 47% of organizations were increasing their cloud deployments for big data specifically. This makes sense: the cloud not only enables cost efficiencies, it also gives you the scale to meet demand and SLAs when processing any amount of data, now and in the future.
  5. Section End Time (2/3): 9:10
      A defining characteristic of cloud computing is elasticity: the ability to rapidly provision and release resources to match what a workload requires, so that a user pays no more and no less than they need to for the task at hand. Such just-in-time provisioning can save customers enormous amounts of money when their workloads are intermittent and heavily spiked. In the modern enterprise, few workloads need such elastic capabilities as badly as data warehousing and big data. Traditionally built on-premises with very expensive hardware and software, most enterprise data warehouse (DW) systems have very low utilization except during peak periods of data loading, transformation, and report generation. The Microsoft Modern Data Warehouse offers the most comprehensive options to deploy data warehousing and big data directly to the cloud with the elastic scalability of Azure.
      Security (Data Lake): encrypted at rest (Azure- or client-managed keys); encrypted in transit (HTTPS); Azure Active Directory integration; ACLs & security groups; IP address black/white listing; virtual network integration
  6. Section End Time (3/3): 9:10. Solution Models. As seen in the diagram, each offering can be characterized by the level of administration you have over the infrastructure and by its degree of cost efficiency. Factors that can influence the choice between the data offerings:
      Cost: With Lift/Shift (IaaS) you need to invest additional time and resources to manage your database, but you can shut down your resources while you are not using them to decrease the cost. With Remodel (PaaS) the service is always running unless you drop and re-create your resources when they are needed.
      Administration: Lift/Shift supports CLR. With Remodel (PaaS), PaaS options reduce the amount of time that you need to invest in administering the database.
      SLA: Both IaaS and PaaS provide a high, industry-standard SLA. The PaaS option guarantees a 99.99% SLA, while IaaS guarantees a 99.95% SLA for infrastructure.
      Time to move to Azure: SQL Server in an Azure VM is an exact match of your environment, so migration from on-premises to an Azure SQL VM is no different from moving the databases from one on-premises server to another. An Azure database instance also enables extremely easy migration; however, there might be some changes that you need to apply before you migrate to a managed instance.
  7. Section End Time (1/1): 9:10. Increasing volume; real-time performance; integration of public and business application data sources (public: Facebook, Twitter, LinkedIn; business applications: CRM, Sales, Supply Chain, Workday); capital cost for infrastructure; ad hoc analytics / client self-service analytics
  8. Section End Time (1/5) – 9 : 20 Azure Data Lake Store is a single repository to build cloud-based data lakes to capture and access any type of data for high-performance processing and analytics and low latency workloads with enterprise-grade security. This lets you store data in a single place and use any type of analytics to process it such as Azure HDInsight (Hadoop and Spark), R Server, Hortonworks, Cloudera, and Azure SQL Data Warehouse
  9. Section End Time (2/5): 9:20. Azure Storage offers different access tiers, which allow you to store blob object data in the most cost-effective manner. The available access tiers are:
      Hot: optimized for storing data that is accessed frequently.
      Cool: optimized for storing data that is infrequently accessed and stored for at least 30 days.
      Archive: optimized for storing data that is rarely accessed and stored for at least 180 days, with flexible latency requirements (on the order of hours).
      Lifecycle management policies: You can now set policies to tier or delete data in Data Lake Storage. To learn more, see the documentation “Manage the Azure Blob storage lifecycle.”
  10. Section End Time (3/5): 9:20. No client-side encryption. The hierarchical file system performs file operations (rename, move, copy, delete) faster than Blob storage. No support for zone-redundant (ZRS) or geo-redundant (GRS) storage. Migrating data via Azure Data Factory is currently the easiest way to do a one-time data migration, as no migration tool is currently available. Any files in ADLS Gen1 larger than 5 TB need to be split into multiple files before migration.
  11. Section End Time (4/5): 9:20. No client-side encryption. The hierarchical file system performs file operations (rename, move, copy, delete) faster than Blob storage. No support for zone-redundant (ZRS) or geo-redundant (GRS) storage.
      Query performance: When a query retrieves only a subset of the data, a hierarchical file system like ADLS Gen2 makes it possible to leverage partition scans for data pruning (predicate pushdown). This can improve query performance dramatically for compute engines that know how to take advantage of partition scans.
      Data load performance: Sometimes it is necessary to rename files or relocate files from one directory to another.
      Granular security at the directory and file level: The hierarchical file system of ADLS Gen2 (and Gen1) is POSIX-compliant. Access control lists (ACLs) can be defined at the directory and file level, which offers much-needed flexibility for controlling data-level security.
      Cost: Object storage, such as Azure Blob storage, is known for being highly economical. With respect to direct storage cost, Microsoft has released ADLS Gen2 at the same price as Azure Blob storage (i.e., block blob pricing). You pay only for the storage that you use; there is no concept of reserving a specific size. However, transaction costs are somewhat higher for storage accounts that have the hierarchical namespace enabled. Transaction costs are usually measured in batches of 10,000.
  12. Section End Time (5/5): 9:20. Multi-protocol access to the same data, via the Azure Blob storage API and the Azure Data Lake Storage API, allows you to leverage existing object storage capabilities on Data Lake Storage accounts, which are hierarchical-namespace-enabled storage accounts built on top of Blob storage. Blob storage only resembles a file-system directory hierarchy: it relies on a naming convention in which blob names contain slashes (/). This was inefficient because applications had to iterate through potentially millions of individual blob objects to achieve directory-level tasks. For example, deleting a directory with several million objects in Blob storage requires as many delete operations as there are objects in that directory. In contrast, with ADLS Gen2, deleting a directory is a single operation regardless of the number of files in the directory.
  13. Section End Time (1/3) – 9 : 25
  14. Section End Time (2/3): 9:25. Data engineers from a SQL background need to learn new languages and frameworks: Spark/Scala, Flink, Beam!
  15. Section End Time (3/3) – 9 : 25 Considering that U-SQL within Azure Data Lake Analytics (ADLA) is not one of the initial services to be supported by the optimized ABFS driver, that says something about where we should be placing our bets. Microsoft has not announced the future roadmap for ADLA, but we are observing that open source technologies such as Spark appeal to a wider customer base vs. proprietary tools and languages.
  16. Section End Time (1/2) – 9 : 30
  17. Section End Time (1/2): 9:30. In the cloud, Azure SQL Data Warehouse leverages the same MPP architecture as the Analytics Platform System, letting you combine the scaling power of this architecture with the elasticity of the cloud. A defining characteristic of cloud computing is elasticity: the ability to rapidly provision and release resources to match what a workload requires, so that a user pays no more and no less than they need to for the task at hand. Such just-in-time provisioning can save customers enormous amounts of money when their workloads are intermittent and heavily spiked. Azure SQL Data Warehouse is a fully managed DW as a service that you can provision in minutes and scale up to 60 times larger in seconds. With a few clicks in the Azure Portal, you can launch a data warehouse and start analyzing or querying data at the scale of hundreds of terabytes. Its architecture separates compute and storage so that you can scale them independently. A unique pause feature allows you to suspend compute in seconds and resume when needed, while your data remains intact in Azure storage.
  18. Section End Time (1/1): 9:35
      Myth: You need a data lake OR a data warehouse. A data lake and a data warehouse serve different purposes. They are not mutually exclusive; in fact, they work in conjunction for optimal results and outcomes.
      Myth: Data warehouses are easy to build, while data lakes are difficult. It is true that data lakes require the specific skills of data engineers and data scientists (or experts with similar skill sets) to sort and make use of the data stored within. The unstructured nature of the data makes it less readily accessible to those without a full understanding of how the data lake works. However, once data scientists and data engineers build data models or pipelines, business users can often leverage integrations (custom or pre-built) with popular business tools to explore the data. Likewise, most business users access data stored in data warehouses through connected business intelligence (BI) tools like Tableau and Looker. With the help of third-party BI tools, business users should be able to access and analyze data whether it is stored in a data warehouse or a data lake.
  19. Section End Time (1/1) – 9 : 35
  20. Demo start – 9 :35 Demo end – 9 : 50
  21. Section End Time (1/2): 9:55. Run multiple Databricks Spark clusters to meet the SLA if required.
      Example (Synapse): Largest volume: 100 GB. Resource purchased: 100 DTU. Source with the largest volume (100 GB): resource completely utilized. Source with the smallest volume (10 MB): resource under-utilized.
      Example (Databricks): Largest volume: 100 GB. Resource purchased: large cluster DTU. Source with the largest volume (100 GB): resource completely utilized. Smallest volume: 10 MB. Resource purchased: smaller cluster DTU. Source with the smallest volume (10 MB): resource completely utilized.
  22. Section End Time (2/2): 9:55
      Rehydrate an archived blob to an online tier: rehydrate an archive blob to hot or cool by changing its tier using the Set Blob Tier operation.
      Copy an archived blob to an online tier: create a new copy of an archive blob by using the Copy Blob operation. Specify a different blob name and a destination tier of hot or cool.
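
A rough sketch of the hot/cool/archive trade-off behind slide 22, using the slide's relative figures (hot = 100 for each operation on the same volume X). It assumes, for illustration only, that these relative costs add linearly per storage period, write, and read; the names `RELATIVE_COST`, `total_cost`, and `cheapest_tier` are hypothetical, and real Azure pricing has more dimensions (early-deletion charges, rehydration latency, per-transaction fees).

```python
# Relative costs from slide 22; these are indicative ratios, not actual prices.
RELATIVE_COST = {
    "hot":     {"store": 100.0, "write": 100.0, "read": 100.0},
    "cool":    {"store": 55.0,  "write": 200.0, "read": 250.0},
    "archive": {"store": 5.5,   "write": 200.0, "read": 120_000.0},
}


def total_cost(tier, storage_periods, writes=1, reads=0):
    """Relative cost of keeping volume X in `tier` (assumed linear model)."""
    cost = RELATIVE_COST[tier]
    return (cost["store"] * storage_periods
            + cost["write"] * writes
            + cost["read"] * reads)


def cheapest_tier(storage_periods, writes=1, reads=0):
    """Pick the tier with the lowest relative cost for the given access pattern."""
    return min(RELATIVE_COST,
               key=lambda tier: total_cost(tier, storage_periods, writes, reads))


if __name__ == "__main__":
    for reads in (0, 1, 5):
        print(f"12 periods, {reads} read(s):", cheapest_tier(12, reads=reads))
```

Under these assumptions a single read from archive outweighs many periods of storage savings, which is consistent with the rehydration guidance in note 22: move data to archive only when it is very unlikely to be read again.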